Re: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!

23 Aug 2005

      Joshua wrote:
...
Lee Passey wrote:
...
...
As Mr. Noring is always quick to point out, XML files can be viewed
natively in both Firefox and IE6 when accompanied by appropriate style
sheets, so I attempted to open this file directly in both of these 
browsers.
...
While this is true, our tei files are specifically meant as a master
document and NOT as a viewing document.  They will NOT parse in any 
browser "out of the box".  As you've seen, you can jury-rig things to
the point where it is usable, but that is not our intention.  We 
provide the HTML files directly for people who want to browse the file
in IE or Firefox.
One benefit of viewing PG-TEI documents directly is checking the
markup, to make sure the content is properly marked up. (Lee later
gave a specific example of incorrectly applied markup in the
particular PG-TEI document under discussion.)

For example, one could put together a "silly.css", using a variety
of text colors, font-styles, font-weights, etc., to highlight
certain structures and text semantics.
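A minimal sketch of that idea in Python: a made-up "silly.css" giving a few common TEI elements loud, distinct styles, plus a helper that attaches it to a copy of the master with an xml-stylesheet processing instruction. The element names (hi, foreign, q, note) assume PG-TEI uses the usual TEI P4 names; the colors and the attach_stylesheet helper are purely illustrative. As Lee found, a browser may still refuse the file for other reasons, such as unresolved entities.

```python
# Hypothetical proofing stylesheet: each TEI element gets a loud, distinct look
# so misapplied markup stands out when the master is viewed directly.
SILLY_CSS = """\
hi      { color: red;   font-weight: bold; }
foreign { color: green; font-style: italic; }
q       { background: yellow; }
note    { color: blue;  border: 1px dashed blue; }
"""


def attach_stylesheet(xml_text, href="silly.css"):
    """Insert an xml-stylesheet processing instruction into the document prolog."""
    pi = f'<?xml-stylesheet type="text/css" href="{href}"?>'
    if xml_text.startswith("<?xml "):
        end = xml_text.index("?>") + 2  # just after the XML declaration
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    return pi + "\n" + xml_text  # no declaration: the PI can go first
```

Write SILLY_CSS to silly.css next to a copy of the master and open the result of attach_stylesheet in the browser; the original master stays untouched.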

Another knotty issue is that TEI includes structural/semantic markup
that current HTML-based browsers don't know how to handle or interpret
natively (without CSS). Even with the right CSS, some substandard
browsers like IE6 can't be made to handle it properly.

This includes the inline note tag. HTML has never had an inline note
tag that the browser, even without CSS, is expected to pull out of the
main flow and present separately (such as in a popup window.) [HTML
*should* have had this feature from the start, but that's water under
the bridge. XHTML 2.0 plans to include functionality for this, so
future browsers will have to be able, without CSS, to extract certain
inline content and render it outside the main flow, such as in a popup
window, off to the side, or by other means. My kudos to the XHTML
working group for including this!]
...
Also, we have had some backchannel discussion about how the web server
should serve the .tei files.  I think Marcello is going to change the
server to tell your browser that .tei files are plain text (MIME type
text/plain), so that they display like a .txt file would.  This will
help prevent confusion when a browser tries to render the file
directly and fails miserably.
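For illustration, here is how that extension-to-type mapping looks with Python's built-in http.server. I don't know what software Marcello's server runs, so treat this as a sketch of the idea (map .tei to text/plain), not the actual PG configuration; TeiAsTextHandler and serve are hypothetical names.

```python
import mimetypes
from http.server import HTTPServer, SimpleHTTPRequestHandler

# Tell Python's own MIME table that .tei files are plain text.
mimetypes.add_type("text/plain", ".tei")


class TeiAsTextHandler(SimpleHTTPRequestHandler):
    # guess_type() checks extensions_map first, so adding .tei here makes
    # the Content-Type header for .tei files text/plain.
    extensions_map = {**SimpleHTTPRequestHandler.extensions_map, ".tei": "text/plain"}


def serve(port=8000):
    """Serve the current directory; .tei files go out as text/plain."""
    HTTPServer(("", port), TeiAsTextHandler).serve_forever()
```

Calling serve() and requesting a .tei file should then show it as text in the browser rather than as a failed XML parse.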
Good point! Another way around the issue is to simply zip up the TEI
document for download, and include a separate "readthisfirst.txt"
file describing what it is and how to render it directly, if that is
of interest to the end-user.
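A minimal sketch of that packaging step, using only Python's standard zipfile module. The function name make_tei_zip and the README wording are hypothetical; the TEI text is written as UTF-8.

```python
import io
import zipfile

# Hypothetical wording for the explanatory file.
README = """\
This archive holds a PG-TEI master document.
It is meant as a master for conversion, not for direct viewing;
use the HTML edition to read the book in a browser.
"""


def make_tei_zip(tei_name, tei_text, readme=README):
    """Return the bytes of a zip holding readthisfirst.txt plus the TEI file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("readthisfirst.txt", readme)  # written first in the archive
        zf.writestr(tei_name, tei_text.encode("utf-8"))
    return buffer.getvalue()
```

Saving the result is one line, e.g. open("book.zip", "wb").write(make_tei_zip("book.tei", text)).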
...
...
Firefox does not have this problem, but Firefox also breaks when it
encounters named entities, even when the entities are referenced in
.ent files included from the dtd's, leading me to believe that Firefox
avoids the problems associated with "roaming dtd's" by simply not 
parsing them in the first place.
This is interesting; I didn't know this. I don't think Firefox has
concentrated on general XML rendering. Interestingly, FF does support
a subset of XLink, so it is possible to use XLink to create hypertext
links in non-XHTML documents. (Full XLink can do more, such as
embedding images, equivalent to the HTML <img> and <object> tags.)
I'll have to repeat this experiment with Opera 8 to see if they've
enabled some XLink support (Opera 7 did not.)
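For anyone who wants to try this, a small Python sketch that builds a simple XLink on an arbitrary (non-XHTML) element. The "ref" element name and the simple_link helper are hypothetical; xlink:type="simple" and xlink:href come from XLink 1.0, but I haven't verified exactly which attributes each browser honors.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)  # serialize with the usual xlink: prefix


def simple_link(tag, href, text):
    """Return an element carrying a simple XLink to href."""
    element = ET.Element(tag, {
        f"{{{XLINK_NS}}}type": "simple",
        f"{{{XLINK_NS}}}href": href,
    })
    element.text = text
    return element
```

ET.tostring(simple_link("ref", "http://www.gutenberg.org/", "Project Gutenberg"), encoding="unicode") yields a ref element declaring the xlink namespace and carrying both attributes.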
...
...
It appears that the file is latin-1 encoded, despite the fact that the
DTD claims that it is utf-8 encoded. This caused Firefox some grief as
it tried to utf-8-decode some latin-1 accented vowels.
...
I may be wrong here (Marcello is my unicode guru), but I thought UTF-8
was a superset of Latin1?  Anyway, I know in this particular file there
are quite a few UTF-8 encoded characters (and a couple more that should
be, which we found yesterday in backchannel discussion).
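If what Lee refers to as "Latin-1" is ISO-8859-1, then Lee is right: it
is NOT correct to specify the document encoding as UTF-8, since the two
are incompatible. Unicode does include every Latin-1 character, but
UTF-8 encodes each one above the 7-bit ASCII range as two bytes, while
ISO-8859-1 uses a single byte.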
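To make the incompatibility concrete, a short Python check: the same accented character is one byte in ISO-8859-1 but two bytes in UTF-8, and a lone Latin-1 byte above 0x7F is not valid UTF-8. The is_valid_utf8 helper is just for illustration.

```python
text = "café"
latin1 = text.encode("iso-8859-1")  # b'caf\xe9': é is the single byte 0xE9
utf8 = text.encode("utf-8")         # b'caf\xc3\xa9': é is two bytes


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Plain 7-bit ASCII text encodes identically either way.
ascii_same = "plain ASCII".encode("iso-8859-1") == "plain ASCII".encode("utf-8")
```

This is the grief Lee saw in Firefox: told the file was UTF-8, it hit 0xE9 and could not decode it.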

It is my personal view that ISO-8859 should never be used for the PG
masters; UTF-8 should be used instead. That "7-bit" ASCII conforms
to UTF-8 is a nice bonus. (But the ISO-8859-x encodings, including
ISO-8859-1, a.k.a. "Latin-1" and sometimes loosely called "8-bit
ASCII", do not conform to UTF-8.)
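A sketch of converting such a master, assuming the file really is ISO-8859-1 (not Windows-1252) and, if it has an XML declaration, uses the usual form: decode, fix the encoding pseudo-attribute, re-encode as UTF-8. The function to_utf8_master is hypothetical, and the "already valid UTF-8, leave it alone" test is a heuristic.

```python
import re

# Matches the encoding="..." (or '...') part of an XML declaration.
XML_ENCODING = re.compile(r"""(<\?xml[^>]*\bencoding\s*=\s*)(["'])[^"']*\2""")


def to_utf8_master(raw: bytes) -> bytes:
    """Re-encode an ISO-8859-1 master as UTF-8 and update its XML declaration."""
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8: leave it alone
    except UnicodeDecodeError:
        pass
    # Every byte value is a valid ISO-8859-1 character, so this cannot fail.
    text = raw.decode("iso-8859-1")
    text = XML_ENCODING.sub(r"\g<1>\g<2>UTF-8\g<2>", text, count=1)
    return text.encode("utf-8")
```

Running it twice is harmless: the second call sees valid UTF-8 and returns the bytes unchanged.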
...
...
If you're interested, I'll start putting together a generic CSS file
for TEI.
...
We aren't too interested in CSS directly for the TEI file (the css file
sitting beside the TEI file right now is a mistake ... that should be
changed later today).  However, once I have a few more documents posted
and people seem fairly satisfied with the results, I want to get 
alternate CSS files submitted by other people for the HTML documents.
As noted above, I think a generic CSS file for PG-TEI would be a great
idea! It allows direct viewing of the master to check for errors, and
the CSS can be tweaked for direct viewing by end-users. That would
probably be restricted to Firefox and Opera, because of inline notes:
the CSS has to move inline notes and similar content into a box outside
the flow of the text, perhaps highlighted in some way, and as noted
above, IE6 chokes on this CSS2 stuff.

Another issue of incompatibility, where CSS may break down, is that
the table model in TEI is different in some ways from the HTML table
model. Not sure if this can be fixed with CSS 'display'. Does PG-TEI
include support for TEI tables? (I would assume it does.)
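As a starting point for the generic CSS, a Python sketch that writes out a handful of rules: teiHeader hidden, inline notes floated into a box (CSS2, so expect IE6 trouble), and TEI table, row, and cell mapped to CSS table display values. The choice of elements is my guess at common PG-TEI usage, and build_css is a hypothetical helper. One concrete place CSS breaks down: TEI's rows/cols spanning attributes on cell have no CSS equivalent, since an element with display: table-cell cannot span columns.

```python
# Hypothetical rules for a generic PG-TEI stylesheet (TEI P4-style, no namespace).
TEI_RULES = {
    "text, body, div, p, head, lg, l": "display: block;",
    "teiHeader": "display: none;",
    "head": "font-weight: bold; margin: 1em 0 0.5em;",
    # Pull inline notes out of the text flow into a boxed sidebar (CSS2 float).
    "note": "float: right; width: 30%; margin: 0 0 0.5em 1em; "
            "border: 1px solid gray; font-size: smaller;",
    # Map the TEI table model onto CSS table display values.
    "table": "display: table; border-collapse: collapse;",
    "row": "display: table-row;",
    "cell": "display: table-cell; border: 1px solid gray; padding: 0.2em;",
    'cell[role="label"]': "font-weight: bold;",
}


def build_css(rules):
    """Render a selector -> declarations mapping as CSS text, one rule per line."""
    return "\n".join(f"{selector} {{ {decl} }}" for selector, decl in rules.items())
```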
...
Also, if any industrious programmers out there know TEI conversions and
would like to tackle the job of preparing a conversion process for other
end formats (such as Palm files, Plucker, MS Reader, etc) please let me
and/or Marcello know.  The conversion must run on Linux (our server OS)
and be open source (for future compatibility).
For MS Reader, unless one wants to build an unapproved and possibly
illegal converter (now possible, since the LIT format has been
cracked), one has to use Microsoft's litgen.dll to produce LIT files,
which restricts the converter to MS Windows (litgen.dll in turn
requires MSXML for XML document parsing and validation.) Litgen takes
as input an OEBPS 1.0.1 Publication.

Now I do think it worthwhile to produce OEBPS as one of the output
formats. PG/DP can generate both OEBPS 1.0.1 (optimized for conversion
into LIT, so others can do that conversion automatically) and OEBPS 1.2
(the current OEBPS standard, and preferable.) Essentially, the process
works as follows:

PGTEI --> XHTML 1.1 (or XHTML 1.0 Strict) --> OEBPS 1.x Document(s)

OEBPS 1.x document(s) + OEBPS Package --> OEBPS 1.x Publication
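A skeletal sketch of the package step in Python, assuming the OEBPS package's manifest (which files exist) and spine (reading order) structure. Namespaces and the required Dublin Core metadata are left out, and build_package and the default media type are my assumptions; check the OEBPS 1.0.1 or 1.2 specification before relying on any of it.

```python
import xml.etree.ElementTree as ET


def build_package(documents, media_type="application/xhtml+xml"):
    """Return a skeletal package element listing documents in reading order.

    documents: list of (item_id, href) pairs, e.g. the main text and a notes file.
    """
    # unique-identifier must name the id of a dc:Identifier in the metadata,
    # which (like the rest of the Dublin Core metadata) is omitted here.
    package = ET.Element("package", {"unique-identifier": "bookid"})
    ET.SubElement(package, "metadata")  # placeholder only
    manifest = ET.SubElement(package, "manifest")
    spine = ET.SubElement(package, "spine")
    for item_id, href in documents:
        # The manifest says which files exist and what type they are...
        ET.SubElement(manifest, "item",
                      {"id": item_id, "href": href, "media-type": media_type})
        # ...and the spine says in what order a reader presents them.
        ET.SubElement(spine, "itemref", {"idref": item_id})
    return package
```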

Inline notes would be handled by inserting an anchor link where the
note was, and pulling the note into a separate XHTML/OEBPS document.
The notes can either be aggregated into one document, or each be kept
in their own document. The OEBPS 1.x framework easily handles multiple
documents that make up one publication (the way it works is really
very cool.)
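A minimal sketch of that note-extraction step, using Python's ElementTree and assuming un-namespaced TEI P4-style note elements. The function names, the notes.xhtml file name, and the note1/ref1 id scheme are all hypothetical; this version aggregates every note into one document.

```python
import xml.etree.ElementTree as ET


def extract_notes(root, notes_href="notes.xhtml"):
    """Replace each inline <note> under root with a numbered anchor link.

    Returns a list of (number, note_element) pairs in document order.
    """
    # Map every element to its parent; ElementTree has no parent pointers.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    notes = []
    for note in list(root.iter("note")):  # snapshot: the tree is mutated below
        parent = parent_of.get(note)
        if parent is None:  # root itself is a note: nothing to replace
            continue
        n = len(notes) + 1
        link = ET.Element("a", {"href": f"{notes_href}#note{n}", "id": f"ref{n}"})
        link.text = f"[{n}]"
        link.tail, note.tail = note.tail, None  # text after the note stays in place
        parent[list(parent).index(note)] = link
        notes.append((n, note))
    return notes


def build_notes_document(notes):
    """Collect the extracted notes into one separate container element."""
    container = ET.Element("div")
    for n, note in notes:
        entry = ET.SubElement(container, "p", {"id": f"note{n}"})
        entry.text = f"[{n}] "
        note.tag = "span"  # the note body, now inline inside its own entry
        entry.append(note)
    return container
```

Keeping each note in its own document instead would just mean emitting one container per (number, note) pair.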

Jon

(P.S. Lee, did you experiment with Opera 8? They have a full-featured
free version; you just have to put up with the ads.)

Jon Noring