Re: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol 13, Issue 19)

24 Aug 2005

Lee Passey wrote:
...
Joshua Hutchinson wrote:
...
...
While this is true, our tei files are specifically meant as a master
document and NOT as a viewing document.  They will NOT parse in any
browser "out of the box".  As you've seen, you can jury-rig things to
the point where it is usable, but that is not our intention.  We 
provide the HTML files directly for people that want to browse the 
file in IE or Firefox.
...
I understand that creating a file format which could be viewed without
further processing was not your intention, but now that we have some
evidence that suggests that it is a real possibility, is there any reason
_not_ to pursue that possibility, especially if it only requires adding
three lines to the source (and making sure that all the DTDs are 
accessible)?
Well, my investigation into PG-TEI and TEI-P4X (thank heavens for TEI
Pizza Chef to flatten the otherwise unreadable TEI-P4 DTD!) shows it is
also a real possibility. But I believe, subject to change as I learn
more from the experts here and the TEI-L folk, that in order to make
PG-TEI+CSS2 render in web standards browsers (limited now to Firefox
and maybe Opera 8) we also have to appropriately constrain/subset the
PG-TEI vocabulary (allowed elements/attributes/attr-values) and
content models (what results may be somewhat like TEI-Lite, but not
exactly the same -- we can certainly add our own tags as needs
require.) We may also have to give up a couple of things.

[Note: Even if CSS2 rendering is not of interest, I think PG-TEI, when
released as version 1.0, needs to be appropriately constrained to make
life a whole lot easier for everyone using it -- subject of a future
message if this topic comes up.]

Assuming appropriate constraints, here are the five items needing
further investigation to see how to get them to render properly using
CSS2 (there may be other TEI constructs which don't fit well into the
XHTML model):

1) The TEI <note> tag. If placed directly inline (not indirectly
   referenced), it is possible in CSS2 to declare it block and move it
   outside of the main flow, which is a reasonable way to present it
   (even if not the best.) I've actually experimented with this, but
   my test files are inexplicably long-lost <fuming class="mad"/>.
   This won't work in IE6, but then IE6 sucks when it comes to web
   standards support. (I assume with XSLT that more advanced moving
   around of the content within notes is possible to do, such as
   dumping it into another document or placing it in a notes section.)

2) Hypertext links. CSS2 'display' provides no mapping for anchors.
   XLink will work, but then that's outside of TEI. (XLink for
   hypertext linking is recognized in Mozilla/Firefox, but not in
   Opera 7 -- don't know about Opera 8 yet.) Try the following test:

      http://www.windspun.com/demoxml/demolink.xml

3) Tables. I think the basic TEI table model will map to the XHTML
   model (there are quite a few table-related CSS2 'display' values.)

   However, if PG-TEI will optionally allow other table models to
   be used, such as CALS, all bets are off. I'm not sure that even
   XSLT will be able to properly map any CALS table to XHTML (may
   require something outside of XSLT to do the transformation.)

4) Lists. I think that TEI lists can be made to render properly with
   CSS2 'display', but I'm not sure. It needs experimentation.

5) Images. CSS2 'display' has no mapping for images and objects. XLink
   provides the ability to embed objects, but no web browser appears to
   support this functionality of XLink yet, and anyway XLink will not
   be used to specify images in PG-TEI documents.

   (Hmmm, I think here it may be possible with CSS2 to pull out the
   name of the image and then use that name as a string to embed the
   image back in -- CSS2 is capable of image embedding. Need to
   experiment with it. It might work in IE6, too.)
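As a rough illustration of the 'display' mapping idea for notes, tables
and lists (items 1, 3 and 4), here is a minimal Python sketch that writes
out a small CSS2 style sheet. The element names (p, head, note, list,
item, table, row, cell) are TEI names, but the particular choice of
'display' values and the helper name build_tei_css are my assumptions,
not a tested tei.css. Links and images (items 2 and 5) are left out
because 'display' has no mapping for them.

```python
# Hypothetical mapping from TEI element names to CSS2 'display' values.
# The pairing is an assumption for illustration, not a tested style sheet.
TEI_DISPLAY = {
    "p": "block",
    "head": "block",
    "note": "block",
    "list": "block",
    "item": "list-item",
    "table": "table",
    "row": "table-row",
    "cell": "table-cell",
}


def build_tei_css(mapping):
    """Return a CSS2 style sheet with one 'display' rule per element."""
    rules = [
        f"{element} {{ display: {display}; }}"
        for element, display in mapping.items()
    ]
    return "\n".join(rules) + "\n"


if __name__ == "__main__":
    print(build_tei_css(TEI_DISPLAY), end="")
```

Keeping the mapping in one table makes it easy to see which TEI
elements still have no CSS2 counterpart as the vocabulary is constrained.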
...
...
I personally prefer numeric entities, as well, but for the more common
ones, the conversion process will support named entities in the .tei
file.  Most of them appear as unicode in the HTML, so it typically 
isn't an issue in the final product.
...
You are correct; so long as you are relying on conversion to HTML (or
some other file format) before the file is used, there should be no 
problem (so long as the conversion utility can get to the correct .ent
files). Use of named entities is only a problem if you are attempting to
display the TEI-XML directly.
Yes, definitely! Of course, those named character entities which are
defined in HTML/XHTML will be renderable in web standards browsers.

But I think it best, in whatever DP exports as PG-TEI, to use numeric
character entities. For primarily "ASCII" documents, a manifest of
non-ASCII characters used in the document can be placed in a comment
somewhere in the header. This allows someone to know what &#x1234;
found in the text is (here it is an Ethiopic character), without
having to refer to the Unicode docs. I build a non-ASCII character
manifest for many of the XHTML documents I author.
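A minimal Python sketch of both steps, under my own assumptions: first
rewrite the named entities that HTML/XHTML defines as numeric references,
then build a manifest comment for the header. The function names
named_to_numeric and char_manifest and the layout of the comment are
hypothetical; the entity table is the HTML one from Python's standard
library, so TEI-specific entity sets are not covered.

```python
import re
import unicodedata
from html.entities import name2codepoint

# The five entities predefined by XML itself are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
NUMERIC_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def named_to_numeric(text):
    """Rewrite HTML-defined named entities as hexadecimal numeric references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # unknown or XML-predefined: keep as-is
        return f"&#x{name2codepoint[name]:04X};"
    return NAMED_ENTITY.sub(replace, text)


def char_manifest(text):
    """Return an XML comment listing each non-ASCII character used in text."""
    codepoints = set()
    # Characters written as numeric references.
    for match in NUMERIC_REF.finditer(text):
        ref = match.group(1)
        codepoints.add(int(ref[1:], 16) if ref[0] == "x" else int(ref))
    # Characters present literally in the text.
    codepoints.update(ord(ch) for ch in text)
    lines = ["<!-- Non-ASCII characters used in this document:"]
    for cp in sorted(cp for cp in codepoints if cp > 127):
        name = unicodedata.name(chr(cp), "UNNAMED")
        lines.append(f"  &#x{cp:04X}; {name}")
    lines.append("-->")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = named_to_numeric("caf&eacute; &amp; &#x1234;")
    print(sample)
    print(char_manifest(sample))
```

The manifest lists each character once, by code point, with its Unicode
name, which is all a reader needs to identify a reference in the text.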
...
In any case, it doesn't matter which encoding is used, so long as it is
not misrepresented in the <?xml ...> declaration.
Yes. To reply to Marcello's comment in another message, the PG-TEI
documentation should make it clear, and provide an example, of using
either ISO-8859-1 or UTF-8 in the XML declaration.
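To show what "not misrepresented" could mean in practice, here is a
small Python sketch that reads the encoding named in the XML declaration
and checks it against the bytes of the file. The function names and the
two-encoding whitelist are my assumptions, and UTF-16 files are not
handled. Note the limit of the check: every byte sequence decodes as
ISO-8859-1, so only a false UTF-8 claim can be caught this way.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the start
# of the file.
XML_DECL_ENCODING = re.compile(
    rb"^<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)

# Assumed policy: only these two encodings are accepted.
ALLOWED = {"utf-8", "iso-8859-1"}


def declared_encoding(data):
    """Return the declared encoding in lower case, or 'utf-8' if none is given."""
    match = XML_DECL_ENCODING.match(data)
    return match.group(1).decode("ascii").lower() if match else "utf-8"


def check_encoding(data):
    """Return a list of problems; an empty list means none were found."""
    encoding = declared_encoding(data)
    if encoding not in ALLOWED:
        return [f"encoding {encoding!r} is not UTF-8 or ISO-8859-1"]
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        return [f"bytes are not valid {encoding}: {err}"]
    return []


if __name__ == "__main__":
    doc = '<?xml version="1.0" encoding="UTF-8"?>\n<p>caf\u00e9</p>'
    print(check_encoding(doc.encode("utf-8")))
    print(check_encoding(doc.encode("iso-8859-1")))
```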

If I had my druthers, only UTF-8 would be used, but a compromise
where ISO-8859-1 can also be used is acceptable. But no other encodings
for mostly Latin documents! And I'd work at a future time to re-encode
documents in ISO-8859-1 into UTF-8.
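The re-encoding itself is mechanical. A minimal Python sketch, assuming
the file starts with an XML declaration that names ISO-8859-1 (the
function name latin1_to_utf8 is hypothetical): decode the body as
ISO-8859-1, encode it as UTF-8, and update the declaration so it does
not end up misrepresenting the new encoding.

```python
import re

# Matches the encoding pseudo-attribute when it names ISO-8859-1.
ENCODING_ATTR = re.compile(
    rb"(encoding\s*=\s*[\"'])ISO-8859-1([\"'])", re.IGNORECASE
)


def latin1_to_utf8(data):
    """Re-encode ISO-8859-1 XML bytes as UTF-8 and update the declaration."""
    head, sep, rest = data.partition(b"?>")
    if not sep or not head.startswith(b"<?xml"):
        raise ValueError("no XML declaration found")
    new_head, count = ENCODING_ATTR.subn(rb"\g<1>UTF-8\g<2>", head)
    if count != 1:
        raise ValueError("declaration does not name ISO-8859-1")
    # The declaration is pure ASCII, so only the body needs transcoding.
    body = rest.decode("iso-8859-1").encode("utf-8")
    return new_head + sep + body


if __name__ == "__main__":
    src = '<?xml version="1.0" encoding="ISO-8859-1"?>\n<p>caf\u00e9</p>'
    print(latin1_to_utf8(src.encode("iso-8859-1")).decode("utf-8"))
```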
...
...
We aren't too interested in CSS directly for the TEI file (the css 
file sitting beside the TEI file right now is a mistake ... that 
should be changed later today).  However, once I have a few more 
documents posted and people seem fairly satisfied with the results, I
want to get alternate CSS files submitted by other people for the HTML
documents.
...
Well, I might do it anyway for my own edification and enjoyment (and
because I think you _will_ be interested at some point in the future ;-).)
<laugh> Careful Lee, you almost sound like Bowerbird on that one (but
not quite.)

I think it is an excellent exercise to explore how to properly render 
XML-conforming TEI documents using only CSS2 in web standards browsers.
It may indicate how to constrain TEI so it is renderable, which may
be useful for the set of criteria to build the constrained PG-TEI
subset of TEI.

It is also useful for the proposed TEI support in OpenReader.
...
Some months ago I put together a couple of tables showing how HTML could
be mapped to TEI-lite, and vice-versa. The goal was to create a mapping
that could be used for round-tripping via XSLT; that is, a TEI-lite 
document could be used to create an HTML document which could then be
transformed back into TEI without loss of markup. I will probably start
from those tables in creating a tei.css file. They may also be useful to
you in creating XSLT scripts (aka XSL style sheets). If you're 
interested they can be found at
www.passkeysoft.com/~lee/xhtml2tei.html 
and www.passkeysoft.com/~lee/tei2xhtml.html.
Well, round-tripping using XSLT and direct rendering of TEI using CSS2
are two different things. I believe XSLT has more power, but CSS2 is
not bad, and CSS3 adds some new stuff (but mostly not supported in
Firefox and Opera.)
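One property any lossless round trip needs, whatever tool does the
transforming, is that no two TEI elements land on the same XHTML target.
A small Python sketch of that check follows. The mapping shown is
hypothetical and is NOT taken from Lee's tables; each target is a
(tag, class) pair, and invert_mapping is a name I made up.

```python
# Hypothetical TEI-Lite -> XHTML mapping; each target is (tag, class or None).
TEI_TO_XHTML = {
    "p": ("p", None),
    "head": ("h2", None),
    "list": ("ul", None),
    "item": ("li", None),
    "emph": ("em", None),
    "lg": ("div", "lg"),
    "l": ("div", "l"),
}


def invert_mapping(mapping):
    """Return the reverse mapping, or raise if two sources share a target."""
    reverse = {}
    for source, target in mapping.items():
        if target in reverse:
            raise ValueError(
                f"{source!r} and {reverse[target]!r} both map to {target!r}"
            )
        reverse[target] = source
    return reverse


if __name__ == "__main__":
    for target, source in invert_mapping(TEI_TO_XHTML).items():
        print(target, "->", source)
```

If the inversion raises, the forward mapping loses markup, and the
XHTML side needs an extra class (or similar) to tell the sources apart.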
...
...
Also, if any industrious programmers out there know TEI conversions
and would like to tackle the job of preparing a conversion process for
other end formats (such as Palm files, Plucker, MS Reader, etc) please
let me and/or Marcello know.  The conversion must run on Linux (our
server OS) and be open source (for future compatibility).
...
To my knowledge there are no lit compilers that run on Linux (thus
making them ineligible by your requirements). This is not really a big
deal because most MSReader users who are familiar with Project Gutenberg
are comfortable making .lit files from HTML themselves, so if you can
serve good HTML they will be happy.
My view in LIT production is to go from PG-TEI to well-structured
XHTML 1.1 (which is probably what Lee means by "HTML".) Then from
there build OEBPS 1.0.1 (LIT optimized) and OEBPS 1.2. Then let
end-users convert the OEBPS 1.0.1 to LIT using the simple
litconvertdemo in MS Reader's SDK (I have a "non-demo" version of the
same). This approach takes full advantage of what LIT provides, while
ReaderWorks does not (RW is buggy plus does not support a couple of
the Reader/LIT features.) That is, to produce the highest quality LIT
having available the full range of Reader/LIT features, it is much
better to start with OEBPS 1.0.1 than to use ReaderWorks which
assembles HTML fragments.

Jon

Jon Noring