
Lee Passey wrote:
Congratulations on a worthwhile accomplishment.
Thanks!
I would like to point out, however, that this is _not_ Gutenberg's first XML posting; I believe there are hundreds of XHTML files currently available. You probably intended to say that this is Gutenberg's first TEI-XML posting. I know that this seems like picking at some pretty minor nits, but there are some people who believe that there is actually a text markup language called XML. XML is actually a syntax for creating markup languages, and there are many markup languages available which conform to the XML syntax, e.g. XHTML, TEI, and DocBook. For clarity's sake it is probably desirable to always refer to a specific XML vocabulary, except when discussing the XML syntax which applies to all XML vocabularies equally.
We've had some back channel discussion on just how to name this and we've decided to change the extension to .tei to give a better indication of what the file is.
Some specific, and very preliminary observations:
As Mr. Noring is always quick to point out, XML files can be viewed natively in both Firefox and IE6 when accompanied by appropriate style sheets, so I attempted to open this file directly in both of these browsers.
While this is true, our .tei files are specifically meant as master documents and NOT as viewing documents. They will NOT parse in any browser "out of the box". As you've seen, you can jury-rig things to the point where the file is usable, but that is not our intention. We provide the HTML files for people who want to read the text in IE or Firefox. We have also had some backchannel discussion about how the web server should serve the .tei files. I think Marcello is going to configure the server to send .tei files with a plain-text MIME type (text/plain), so that your browser displays them the way it would a .txt file. This should keep people from being confused when their browser tries to render the file directly and fails miserably.
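If the server happens to be Apache, this kind of change is usually a one-line `AddType text/plain .tei` directive; what Marcello will actually use isn't stated here. As a minimal, hypothetical sketch of the same idea in Python, the standard `mimetypes` module can register `.tei` as `text/plain`, and a small `http.server` handler (the class and helper names are made up for illustration) would then send that Content-Type:

```python
import mimetypes
from http.server import SimpleHTTPRequestHandler

# Assumption: .tei files should go out as plain text so browsers show them
# like .txt files instead of trying (and failing) to render them as XML.
mimetypes.add_type("text/plain", ".tei")


class TeiAsTextHandler(SimpleHTTPRequestHandler):
    """Hypothetical static-file handler that serves .tei files as text/plain."""

    # SimpleHTTPRequestHandler looks in extensions_map first, so put the
    # mapping there as well as in the mimetypes registry.
    extensions_map = {**SimpleHTTPRequestHandler.extensions_map, ".tei": "text/plain"}


def content_type_for(path: str) -> str:
    """Return the Content-Type this setup would send for a given file name."""
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or "application/octet-stream"


if __name__ == "__main__":
    print(content_type_for("16523.tei"))  # text/plain
```

Running `http.server` with this handler is only for local testing; a production server would carry the equivalent setting in its own configuration.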
Firefox does not have this problem, but Firefox also breaks when it encounters named entities, even when the entities are declared in .ent files included from the DTDs. This leads me to believe that Firefox avoids the problems associated with "roaming DTDs" by simply not parsing them in the first place. Numeric entities _are_ recognized and rendered appropriately, as are named entities whose definitions are contained in the XML file itself. I have no solution to this problem, except to suggest that named entities simply be avoided in favor of numeric entities, at least in the short term (I do note that etext 16523-x.xml does not contain any named entities).
I personally prefer numeric entities as well, but for the more common ones the conversion process will support named entities in the .tei file. Most of them come out as Unicode characters in the HTML, so this typically isn't an issue in the final product.
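Where a file does use named entities, converting them to numeric character references is mechanical. A minimal sketch in Python, assuming the entity names match the HTML 4 set that ships in the standard library's `html.entities.name2codepoint` (TEI entity files may define names outside that set; this sketch leaves those untouched for review). The function name is hypothetical:

```python
import re
from html.entities import name2codepoint

# XML's five predefined entities must stay as-is; every XML parser knows them.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text: str) -> str:
    """Replace named entities with decimal references, e.g. &aacute; -> &#225;.

    Assumption: names follow html.entities.name2codepoint (HTML 4, which
    includes the ISO Latin-1 names such as aacute). Unknown names are left
    alone so they can be checked by hand.
    """

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return f"&#{name2codepoint[name]};"

    return ENTITY_RE.sub(replace, text)


if __name__ == "__main__":
    print(named_to_numeric("Bah&aacute;&rsquo;u&rsquo;ll&aacute;h &amp; co."))
    # Bah&#225;&#8217;u&#8217;ll&#225;h &amp; co.
```

Decimal references were chosen here only because they are easy to read; hexadecimal (`&#xE1;`) is equally valid XML.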
One of my pet peeves is the use of the <p> (paragraph) tag as a generic block tag, rather than limiting its use to true paragraphs, and using the <div> tag for generic blocks of text. I am happy to say that the text is mostly correct in this regard. The byline <p>by Bahá’u’lláh</p> should be marked using the <byline> tag instead of <p>; there may be other similar problems I simply haven't encountered yet.
You are correct. That'll get fixed today.
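Finding stray bylines like this one can be partly automated. The following is a hypothetical heuristic in Python (standard `xml.etree.ElementTree` only): it renames any `<p>` whose text begins with "by " to `<byline>` and returns what it changed, keeping the element's namespace if it has one. It will also misfire on ordinary paragraphs that happen to start with "By", so its output is meant for a human to review:

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so this works with or without a namespace."""
    return tag.rsplit("}", 1)[-1]


def retag_bylines(root: ET.Element) -> List[str]:
    """Rename <p> elements whose text starts with 'by ' to <byline>.

    Heuristic assumption: a paragraph beginning with "by" is a byline.
    Returns the text of every element that was renamed.
    """
    changed = []
    for el in root.iter():
        if local_name(el.tag) != "p":
            continue
        text = "".join(el.itertext()).strip()
        if text.lower().startswith("by "):
            # Keep any "{namespace}" prefix; only the local name changes.
            prefix = el.tag[: len(el.tag) - len(local_name(el.tag))]
            el.tag = prefix + "byline"
            changed.append(text)
    return changed


if __name__ == "__main__":
    doc = ET.fromstring(
        "<TEI><text><front><titlePage>"
        "<p>by Bahá’u’lláh</p>"
        "</titlePage></front><body><p>A real paragraph.</p></body></text></TEI>"
    )
    print(retag_bylines(doc))  # ['by Bahá’u’lláh']
    print(ET.tostring(doc, encoding="unicode"))
```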
It appears that the file is latin-1 encoded, despite the fact that the DTD claims that it is utf-8 encoded. This caused Firefox some grief as it tried to utf-8-decode some latin-1 accented vowels.
I may be wrong here (Marcello is my Unicode guru), but I thought UTF-8 was a superset of Latin-1? Anyway, I know that this particular file contains quite a few UTF-8 encoded characters (and a couple more that should be, which we found yesterday in our backchannel discussion).
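For reference, UTF-8 is a superset of ASCII rather than of Latin-1: the first 256 Unicode code points correspond to Latin-1's characters, but UTF-8 stores every character above U+007F as two or more bytes. So a single Latin-1 byte such as 0xE1 (á) is not valid UTF-8, which is the kind of failure described above. A short Python sketch shows the difference, plus a hypothetical helper that checks whether a file's bytes actually decode with the encoding named in its XML declaration:

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration, e.g.
# <?xml version="1.0" encoding="utf-8"?>
DECL_RE = re.compile(rb"""^<\?xml[^>]*\bencoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")


def declared_encoding(data: bytes) -> str:
    """Return the encoding named in the XML declaration; XML defaults to UTF-8."""
    match = DECL_RE.match(data)
    return match.group(1).decode("ascii") if match else "utf-8"


def matches_declared_encoding(data: bytes) -> bool:
    """True if the bytes decode cleanly with the encoding the file claims.

    Note: every byte sequence is valid Latin-1, so a file that declares
    ISO-8859-1 always passes; this check only catches bad UTF-8.
    """
    try:
        data.decode(declared_encoding(data))
    except (UnicodeDecodeError, LookupError):
        return False
    return True


if __name__ == "__main__":
    print("á".encode("latin-1"))  # b'\xe1'
    print("á".encode("utf-8"))    # b'\xc3\xa1'
    sample = b'<?xml version="1.0" encoding="utf-8"?><p>Bah\xe1</p>'
    print(matches_declared_encoding(sample))  # False: Latin-1 bytes labelled UTF-8
```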
If you're interested, I'll start putting together a generic CSS file for TEI.
We aren't too interested in CSS for the TEI file itself (the CSS file sitting beside the TEI file right now is a mistake; that should be fixed later today). However, once I have a few more documents posted and people seem fairly satisfied with the results, I'd like other people to submit alternate CSS files for the HTML documents. Also, if any industrious programmers out there know TEI conversions and would like to tackle the job of preparing a conversion process for other end formats (such as Palm files, Plucker, MS Reader, etc.), please let me and/or Marcello know. The conversion must run on Linux (our server OS) and be open source (for future compatibility).
Josh