
Joshua wrote:
Lee Passey wrote:
As Mr. Noring is always quick to point out, XML files can be viewed natively in both Firefox and IE6 when accompanied by appropriate style sheets, so I attempted to open this file directly in both of these browsers.
While this is true, our TEI files are specifically meant as a master document and NOT as a viewing document. They will NOT parse in any browser "out of the box". As you've seen, you can jury-rig things to the point where it is usable, but that is not our intention. We provide the HTML files directly for people who want to browse the file in IE or Firefox.
One value in the direct viewing of PG-TEI documents is for checking the markup -- to make sure the content is properly marked up. (Lee later brought up a specific example of incorrectly applied markup in the particular PG-TEI document under discussion.) For example, one could put together a "silly.css", using a variety of text colors, font styles, font weights, etc., to highlight certain structures and text semantics. Another knotty issue is that TEI includes structural/semantic markup that current HTML-based browsers don't know how to handle or interpret natively (that is, without CSS), and even with the right CSS some substandard browsers like IE6 can't be forced to handle it properly. This includes the inline note tag -- HTML has never had an inline note tag where it is assumed that, even without CSS, the browser will pull the note out of the main flow and present it separately (such as in a popup window). [HTML *should* have had this feature from the start, but that's water under the bridge -- XHTML 2.0 plans to include functionality to allow this, so future browsers will have to be able, without CSS, to extract certain inline content and render it outside the main flow, such as in a popup window, to the side, or by other means. My kudos to the XHTML working group for implementing this!]
Also, we have had some backchannel discussion about how the web server should serve the .tei files. I think Marcello is going to change the server to tell your browser that the .tei files have a plain-text MIME type, so that they will display like a .txt file would. This will help prevent people from being confused when their browser tries to display the file directly and fails miserably.
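To make the idea concrete, here is a minimal sketch of mapping the .tei extension to a plain-text type. It uses Python's standard mimetypes module purely for illustration; the thread does not say how Marcello's server is actually configured, and the helper name content_type_for is hypothetical.

```python
import mimetypes

# Assumption: ".tei" should be announced as text/plain. The thread only says
# the files should display "like a .txt file would".
mimetypes.add_type("text/plain", ".tei")


def content_type_for(filename: str) -> str:
    """Return the Content-Type a simple server might send for filename."""
    guessed, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    return guessed or "application/octet-stream"
```

With this mapping in place, a browser receiving a .tei file would show the raw markup as text rather than attempting (and failing) to render it as XML.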
Good point! Another way around the issue is to simply zip up the TEI document for download, and include a separate "readthisfirst.txt" file describing what it is and how to directly render it if that is of interest to the end-user.
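A rough sketch of that packaging step, using Python's standard zipfile module. The function name bundle_tei and the wording of the readme are illustrative assumptions; only the "readthisfirst.txt" file name comes from the suggestion above.

```python
import io
import zipfile


def bundle_tei(tei_name: str, tei_bytes: bytes, readme_text: str) -> bytes:
    """Return a zip archive holding the TEI master plus a readthisfirst.txt."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Put the readme first so it is the first entry a user sees.
        archive.writestr("readthisfirst.txt", readme_text)
        # Store the TEI bytes untouched so the master is not re-encoded.
        archive.writestr(tei_name, tei_bytes)
    return buffer.getvalue()
```

Building the archive in memory keeps the sketch self-contained; a real script would more likely write straight to a file on disk.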
Firefox does not have this problem, but Firefox also breaks when it encounters named entities, even when the entities are referenced in .ent files included from the DTDs, leading me to believe that Firefox avoids the problems associated with "roaming DTDs" by simply not parsing them in the first place.
This is interesting; I didn't know this. I don't think Firefox has concentrated on general XML rendering. Interestingly, FF does support a subset of XLink, so it is possible, using XLink, to create hypertext links in non-XHTML documents (with the full XLink it is possible to do other things, such as embed images, equivalent to the HTML <img> and <object> tags). I'll have to repeat this experiment with Opera 8 to see if they've enabled some XLink stuff (Opera 7 did not).
It appears that the file is latin-1 encoded, despite the fact that the DTD claims that it is utf-8 encoded. This caused Firefox some grief as it tried to utf-8-decode some latin-1 accented vowels.
I may be wrong here (Marcello is my Unicode guru), but I thought UTF-8 was a superset of Latin-1? Anyway, I know that in this particular file there are quite a few UTF-8 encoded characters (and a couple more that should be, which we found yesterday backchannel).
If what Lee refers to as "Latin-1" is ISO-8859-1, then Lee is right: it is NOT correct to specify the document encoding as UTF-8, since the two are incompatible. It is my personal view that ISO-8859 should never be used for the PG masters -- UTF-8 should be used instead. That "7-bit" ASCII conforms to UTF-8 is a nice bonus. (But ISO-8859-x text, sometimes loosely called "8-bit ASCII", of which ISO-8859-1 is "Latin-1", is not valid UTF-8 once it uses any character above 0x7F.)
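The incompatibility is easy to demonstrate. The following is only an illustrative sketch (the sample strings and the helper name decodes_as_utf8 are made up for the example): pure ASCII bytes are already valid UTF-8, while a Latin-1 accented vowel is a single byte that a UTF-8 decoder rejects.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Report whether the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


ascii_bytes = "plain text".encode("ascii")   # every byte is below 0x80
latin1_bytes = "café".encode("iso-8859-1")   # é becomes the single byte 0xE9
utf8_bytes = "café".encode("utf-8")          # é becomes the two bytes 0xC3 0xA9
```

Here decodes_as_utf8(ascii_bytes) and decodes_as_utf8(utf8_bytes) succeed, while decodes_as_utf8(latin1_bytes) fails, which is the kind of grief Lee saw in Firefox.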
If you're interested, I'll start putting together a generic CSS file for TEI.
We aren't too interested in CSS directly for the TEI file (the CSS file sitting beside the TEI file right now is a mistake ... that should be changed later today). However, once I have a few more documents posted and people seem fairly satisfied with the results, I want to get alternate CSS files submitted by other people for the HTML documents.
As noted above, I think a generic CSS file for PG-TEI would be a great idea! It allows direct viewing of the master for errors, and the CSS can be tweaked for direct viewing by end-users (probably restricted to Firefox and Opera in order to handle inline notes, where the CSS has to move the inline notes and similar stuff to a box outside of the flow of the text, maybe highlighted in some way -- as noted above, IE6 chokes on this CSS2 stuff). Another area of incompatibility, where CSS may break down, is that the table model in TEI differs in some ways from the HTML table model. I'm not sure if this can be fixed with the CSS 'display' property. Does PG-TEI include support for TEI tables? (I would assume it does.)
Also, if any industrious programmers out there know TEI conversions and would like to tackle the job of preparing a conversion process for other end formats (such as Palm files, Plucker, MS Reader, etc.), please let me and/or Marcello know. The conversion must run on Linux (our server OS) and be open source (for future compatibility).
For MS Reader, unless one wants to build an unapproved and possibly illegal converter (since the LIT format has been cracked, this is now possible), one has to use Microsoft's litgen.dll to produce LIT files, thus restricting the converter to MS Windows (litgen.dll requires, in turn, MSXML for XML document parsing and validation). Litgen takes as input an OEBPS 1.0.1 Publication. Now, I do think it worthwhile to produce OEBPS as one of the output formats. PG/DP can generate both OEBPS 1.0.1 (optimized for conversion into LIT so others may do so automatically) and OEBPS 1.2 (which is the current OEBPS standard and is preferable). Essentially, the process works as follows:

PGTEI --> XHTML 1.1 (or XHTML 1.0 Strict) --> OEBPS 1.x Document(s)
OEBPS 1.x document(s) + OEBPS Package --> OEBPS 1.x Publication

Inline notes would be handled by inserting an anchor link where the note was, and pulling the note into a separate XHTML/OEBPS document. The notes can either be aggregated into one document, or each be kept in its own document. The OEBPS 1.x framework will easily handle multiple documents that comprise one publication (it's very cool, really, in how it works).

Jon

(p.s., Lee, did you experiment with Opera 8? They have a full-featured free version -- just have to put up with the ads in the free version.)
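For anyone considering the conversion work, the inline-note handling described above (an anchor link where the note was, with the note text pulled out for a separate document) might be sketched as follows. This is only an illustration under stated assumptions: it uses Python's standard xml.etree.ElementTree, assumes un-namespaced <note> elements that are not nested and are not the root, and the function name extract_notes, the "notes.html" target, and the "[n]" link text are all hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_notes(root: ET.Element, notes_href: str = "notes.html") -> list:
    """Replace each inline <note> with an <a> link; return the note texts."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    notes = []
    # list(...) freezes the document-order traversal before the tree is edited.
    for number, note in enumerate(list(root.iter("note")), start=1):
        notes.append("".join(note.itertext()))
        link = ET.Element("a", {"href": f"{notes_href}#n{number}"})
        link.text = f"[{number}]"
        link.tail = note.tail  # keep the running text that followed the note
        parent = parent_of[note]
        parent[list(parent).index(note)] = link
    return notes
```

The returned list could then be written out as one aggregated notes document or as one document per note, matching the two options mentioned above.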