
Lee Passey wrote:
Joshua Hutchinson wrote:
While this is true, our TEI files are specifically meant as a master document and NOT as a viewing document. They will NOT parse in any browser "out of the box". As you've seen, you can jury-rig things to the point where it is usable, but that is not our intention. We provide the HTML files directly for people who want to browse the file in IE or Firefox.
I understand that creating a file format which could be viewed without further processing was not your intention, but now that we have some evidence that suggests that it is a real possibility, is there any reason _not_ to pursue that possibility, especially if it only requires adding three lines to the source (and making sure that all the DTDs are accessible)?
Well, my investigation into PG-TEI and TEI-P4X (thank heavens for TEI Pizza Chef to flatten the otherwise unreadable TEI-P4 DTD!) shows it is also a real possibility. But I believe, subject to change as I learn more from the experts here and the TEI-L folk, that in order to make PG-TEI+CSS2 render in web standards browsers (limited now to Firefox and maybe Opera 8) we also have to appropriately constrain/subset the PG-TEI vocabulary (allowed elements/attributes/attr-values) and content models (what results may be somewhat like TEI-Lite, but not exactly the same -- we can certainly add our own tags as needs require.) We may also have to give up a couple of things.
[Note: Even if CSS2 rendering is not of interest, I think PG-TEI, when released as version 1.0, needs to be appropriately constrained to make life a whole lot easier for everyone using it -- subject of a future message if this topic comes up.]
Assuming appropriate constraints, here are the five items needing further investigation to see how to get them to render properly using CSS2 (there may be other TEI constructs which don't fit well into the XHTML model):
1) The TEI <note> tag. If placed directly inline (not indirectly referenced), it is possible in CSS2 to declare it block and move it outside of the main flow, which is a reasonable way to present it (even if not the best.) I've actually experimented with this, but my test files are inexplicably long-lost <fuming class="mad"/>. This won't work in IE6, but then IE6 sucks when it comes to web standards support. (I assume that with XSLT more advanced moving around of the content within notes is possible, such as dumping it into another document or placing it in a notes section.)
2) Hypertext links. CSS2 'display' provides no mapping for anchors. XLink will work, but then that's outside of TEI. (XLink for hypertext linking is recognized in Mozilla/Firefox, but not in Opera 7 -- don't know about Opera 8 yet.) Try the following test: http://www.windspun.com/demoxml/demolink.xml
3) Tables. I think the basic TEI table model will map to the XHTML model (there are quite a few table-related CSS2 'display' values.) However, if PG-TEI will optionally allow other table models to be used, such as CALS, all bets are off. I'm not sure that even XSLT will be able to properly map any CALS table to XHTML (it may require something outside of XSLT to do the transformation.)
4) Lists. I think that TEI lists can be made to render properly with CSS2 'display', but I'm not sure. It needs experimentation.
5) Images. CSS2 'display' has no mapping for images and objects. XLink provides the ability to embed objects, but no web browser appears to support this functionality of XLink yet, and anyway XLink will not be used to specify images in PG-TEI documents. (Hmmm, I think here it may be possible with CSS2 to pull out the name of the image and then use that name as a string to embed the image back in -- CSS2 is capable of image embedding. Need to experiment with it. It might work in IE6, too.)
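To make the CSS2 'display' idea in the list above a little more concrete, here is a minimal Python sketch that writes out one 'display' rule per element. The element-to-display table is purely my own illustrative guess (it is not a PG-TEI specification, and nobody in this thread has proposed these exact values); the function name build_css is likewise hypothetical.

```python
# Illustrative mapping only: which TEI element gets which CSS2 'display'
# value is an assumption made for this sketch, not a PG-TEI decision.
TEI_DISPLAY = {
    "teiHeader": "none",     # hide the header when viewing directly
    "p": "block",
    "head": "block",
    "note": "block",         # inline notes pulled out of the main flow
    "list": "block",
    "item": "list-item",
    "table": "table",
    "row": "table-row",
    "cell": "table-cell",
    "hi": "inline",
}


def build_css(mapping):
    """Return a CSS2 stylesheet with one 'display' rule per element."""
    rules = []
    for element, display in mapping.items():
        rules.append(f"{element} {{ display: {display}; }}")
    return "\n".join(rules) + "\n"


if __name__ == "__main__":
    print(build_css(TEI_DISPLAY), end="")
```

Links and images are deliberately absent from the table, since (as noted in items 2 and 5) CSS2 'display' has no value that maps to them.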
I personally prefer numeric entities too, but for the more common ones, the conversion process will support named entities in the .tei file. Most of them appear as Unicode in the HTML, so it typically isn't an issue in the final product.
You are correct; so long as you are relying on conversion to HTML (or some other file format) before the file is used, there should be no problem (so long as the conversion utility can get to the correct .ent files). Use of named entities is only a problem if you are attempting to display the TEI-XML directly.
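For anyone who wants to sidestep the .ent lookup entirely, a rough Python sketch of rewriting named entities as numeric character references follows. It assumes the HTML 4 entity table shipped in Python's html.entities module as a stand-in for the real TEI/ISO .ent files, which is only an approximation; the function name named_to_numeric is hypothetical. Names it does not recognize are left untouched rather than guessed at.

```python
import re
from html.entities import name2codepoint

# The entities predefined by XML itself never need a DTD, so keep them as-is.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text):
    """Rewrite known named entities as hexadecimal character references."""

    def replace(match):
        name = match.group(1)
        # Assumption: the HTML 4 table approximates the entity sets in use.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return f"&#x{name2codepoint[name]:X};"

    return ENTITY_RE.sub(replace, text)


if __name__ == "__main__":
    print(named_to_numeric("caf&eacute; &amp; th&eacute; &mdash; &unknown;"))
```

A real converter would load the entity names from the same .ent files the DTD references, so the two can never disagree.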
Yes, definitely! Of course, those named character entities which are defined in HTML/XHTML will be renderable in web standards browsers. But I think it best, in whatever DP exports as PG-TEI, to use numeric character entities. For primarily "ASCII" documents, a manifest of non-ASCII characters used in the document can be placed in a comment somewhere in the header. This allows someone to know what &#x1234; found in the text is (here it is an Ethiopic character, ሴ), without having to refer to the Unicode docs. I build a non-ASCII character manifest for many of the XHTML documents I author.
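A manifest like that is easy to generate mechanically. The sketch below is one possible way to do it in Python, not the procedure I actually use: it collects every character above U+007F and emits an XML comment giving the numeric reference and the Unicode character name for each. The function name and the comment layout are hypothetical.

```python
import unicodedata


def non_ascii_manifest(text):
    """Return an XML comment listing each non-ASCII character in text."""
    chars = sorted({ch for ch in text if ord(ch) > 127})
    lines = ["<!-- Non-ASCII characters used in this document:"]
    for ch in chars:
        # Fall back to a placeholder for code points without a Unicode name.
        name = unicodedata.name(ch, "UNKNOWN")
        lines.append(f"  &#x{ord(ch):04X}; {name}")
    lines.append("-->")
    return "\n".join(lines)


if __name__ == "__main__":
    print(non_ascii_manifest("caf\u00e9 and \u1234"))
```

Because the references sit inside a comment, a parser never expands them, so the manifest stays readable as plain ASCII.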
In any case, it doesn't matter which encoding is used, so long as it is not misrepresented in the <?xml ...> declaration.
Yes. To reply to Marcello's comment in another message, the PG-TEI documentation should make it clear, and provide an example, of using either ISO-8859-1 or UTF-8 in the XML declaration. If I had my druthers, only UTF-8 would be used, but a compromise where ISO-8859-1 can also be used is acceptable. But no others for mostly Latin documents! And I'd work at a future time to re-encode documents in ISO-8859-1 into UTF-8.
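Both points (don't misrepresent the encoding, and eventually move ISO-8859-1 files to UTF-8) can be checked and automated. Here is a hedged Python sketch; the function names are hypothetical and the regular expressions are a simplification of the XML declaration grammar, not a full parser. One caveat worth stating loudly: every byte sequence decodes as ISO-8859-1, so the consistency check can only catch a file that claims UTF-8 but is not, never the reverse.

```python
import re

# Simplified match for the encoding pseudo-attribute in the XML declaration.
DECLARATION_RE = re.compile(
    rb'^<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']'
)


def declared_encoding(data):
    """Return the encoding named in the XML declaration, else UTF-8."""
    match = DECLARATION_RE.match(data)
    return match.group(1).decode("ascii") if match else "UTF-8"


def encoding_is_consistent(data):
    """True if the bytes actually decode under the declared encoding."""
    try:
        data.decode(declared_encoding(data))
    except (UnicodeDecodeError, LookupError):
        return False
    return True


def latin1_to_utf8(data):
    """Re-encode an ISO-8859-1 document as UTF-8 and fix its declaration."""
    text = data.decode("iso-8859-1")
    if text.startswith("<?xml") and "?>" in text:
        end = text.index("?>")
        # Only touch the declaration itself, never the document body.
        head = re.sub(r'(encoding=["\'])[^"\']+', r"\g<1>UTF-8", text[:end])
        text = head + text[end:]
    return text.encode("utf-8")
```

For example, a file whose bytes are ISO-8859-1 but whose declaration says UTF-8 fails encoding_is_consistent as soon as it contains an accented letter, which is exactly the misrepresentation to avoid.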
We aren't too interested in CSS directly for the TEI file (the css file sitting beside the TEI file right now is a mistake ... that should be changed later today). However, once I have a few more documents posted and people seem fairly satisfied with the results, I want to get alternate CSS files submitted by other people for the HTML documents.
Well, I might do it anyway for my own edification and enjoyment (and because I think you _will_ be interested at some point in the future ;-).)
<laugh> Careful Lee, you almost sound like Bowerbird on that one (but not quite.) I think it is an excellent exercise to explore how to properly render XML-conforming TEI documents using only CSS2 in web standards browsers. It may indicate how to constrain TEI so it is renderable, which may be useful for the set of criteria to build the constrained PG-TEI subset of TEI. It is also useful for the proposed TEI support in OpenReader.
Some months ago I put together a couple of tables showing how HTML could be mapped to TEI-lite, and vice-versa. The goal was to create a mapping that could be used for round-tripping via XSLT; that is, a TEI-lite document could be used to create an HTML document which could then be transformed back into TEI without loss of markup. I will probably start from those tables in creating a tei.css file. They may also be useful to you in creating XSLT scripts (aka XSL style sheets). If you're interested they can be found at www.passkeysoft.com/~lee/xhtml2tei.html and www.passkeysoft.com/~lee/tei2xhtml.html.
Well, round-tripping using XSLT and direct rendering of TEI using CSS2 are two different things. I believe XSLT has more power, but CSS2 is not bad, and CSS3 adds some new stuff (but mostly not supported in Firefox and Opera.)
Also, if any industrious programmers out there know TEI conversions and would like to tackle the job of preparing a conversion process for other end formats (such as Palm files, Plucker, MS Reader, etc) please let me and/or Marcello know. The conversion must run on Linux (our server OS) and be open source (for future compatibility).
To my knowledge there are no LIT compilers that run on Linux (thus making them ineligible by your requirements). This is not really a big deal because most MSReader users who are familiar with Project Gutenberg are comfortable making .lit files from HTML themselves, so if you can serve good HTML they will be happy.
My view on LIT production is to go from PG-TEI to well-structured XHTML 1.1 (which is probably what Lee means by "HTML".) Then from there build OEBPS 1.0.1 (LIT optimized) and OEBPS 1.2. Then let end-users convert the OEBPS 1.0.1 to LIT using the simple litconvertdemo in MS Reader's SDK (I have a "non-demo" version of the same). This approach takes full advantage of what LIT provides, while ReaderWorks does not (RW is buggy plus does not support a couple of the Reader/LIT features.) That is, to produce the highest quality LIT having available the full range of Reader/LIT features, it is much better to start with OEBPS 1.0.1 than to use ReaderWorks, which assembles HTML fragments.
Jon