
Jim Tinsley wrote:
Nobody has an objection to valid TEI texts, but valid TEI texts alone _are not enough_. An XML file that cannot be read (by an actual human) is as useful as a lock with no key.
Not so. Having a TEI text posted would enable third-party developers to come up with their own converter solutions even if we didn't get very far with ours. There are a lot of people around who already convert the text files into other formats. Their jobs would get much easier.
I really no longer give any headroom at all to the approach "Post XML Now Because That Is The One True Way And We'll Figure Out How To Read It Later." If for no other reason, then because the most important part of the WW job is to check the texts before posting, and if we can't read it, we can't find the errors, and if we can't find the errors, we can't fix 'em.
A TEI text is basically a text file, so you can read it in any editor. If you use Emacs you can also validate the TEI file against the DTD without leaving the editor. A perfectly valid TEI file with no spelling errors should be good enough to post. What you expect from us TEI developers is that we produce the 150% perfect solution before you even consider starting to post files. That is not the way software development works. And this attitude is, in my opinion, the main reason why we have gotten nowhere with TEI in the last 3 years. Let's start now with a version 0.0.1 of the TEI process. Of course at some later time we'll have to do all the posted files over again. Probably more than once. But it's better than sitting here and playing with bowerbird because we are bored.
Next, we need a process, using open-source, cross-platform tools -- the standarder the better -- to convert that XML into, at a minimum, plain text and HTML. Other formats are welcome but optional. That process must work for _all_ teixlite files, not just ones that are specially cooked, using constraints not specified within the chosen DTD. Here's where we hit the rocks today.
TEI defines a standard way to extend the DTD. I used this standard way to extend the TEI DTD into what I called PGTEI. The result is still a perfectly valid TEI DTD according to the TEI specs.
I don't want to imply specific means from which this process is to be constructed. Obviously XSLT is one possible approach, but I certainly do not want to imply limitations on what that process should use. The only things we must have -- both for our own internal practical purposes and for the use of future readers -- is that it should work reliably on _all_ texts that conform to the XML DTD chosen, be open source, and be cross-platform. A reader needs to be able to tweak the transform and re-run on her own desktop.
You misunderstand what a DTD is. It only gives you syntactical correctness. I can cook up a perfectly valid XHTML file which is semantically bogus:
  <div><h6>1</h6>
    <div><h5>1.1</h5>
      <div><h4>1.1.1</h4> ... </div>
    </div>
  </div>
This is valid XHTML (I didn't bother to check) but will not render well. You cannot build a conversion tool that will produce good results on all syntactically valid TEI files, just as you cannot build a browser that will make sense of semantically bogus HTML files. Furthermore, TEI is geared towards marking up existing texts, so scholars can study the text without having to get the physical book. It is not so good as a master format for print processing. That's why I had to add some more tags and attributes to my DTD. (Which doesn't make any text that uses my DTD less standard, because TEI is expressly designed to be extensible. But I'm repeating myself.)
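To make the syntax-versus-semantics point concrete, here is a rough Python sketch of the kind of check a DTD cannot express: it parses the nested example above (with the "..." dropped so it parses) and reports every nested section whose heading outranks the heading of the section around it. The names SAMPLE, HEADINGS and inverted_headings are made up for this illustration, and the rule it enforces is only one assumed convention, not part of any TEI or PG tool.

```python
import xml.etree.ElementTree as ET

# The example from the message, minus the elided "..." so that it parses.
SAMPLE = """
<div><h6>1</h6>
  <div><h5>1.1</h5>
    <div><h4>1.1.1</h4></div>
  </div>
</div>
"""

# Heading rank: a smaller number means a more important heading.
HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


def inverted_headings(xml_text):
    """Return (outer, inner) pairs where a nested heading outranks its parent's.

    This is a semantic rule assumed for the illustration; a DTD only says
    which elements may appear where, so it cannot catch this.
    """
    problems = []

    def walk(elem, outer):
        # The first heading that is a direct child labels this section.
        own = next((c.tag for c in elem if c.tag in HEADINGS), None)
        if own and outer and HEADINGS[own] < HEADINGS[outer]:
            problems.append((outer, own))
        for child in elem:
            if child.tag not in HEADINGS:
                walk(child, own or outer)

    walk(ET.fromstring(xml_text.strip()), None)
    return problems


if __name__ == "__main__":
    for outer, inner in inverted_headings(SAMPLE):
        print(f"<{inner}> is nested inside a section headed by <{outer}>")
```

The parser accepts the sample without complaint, and a validator would care only about which elements are allowed where; the inverted hierarchy shows up only once a rule about meaning is written down separately.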
And just re-reading that last, when I say "must work reliably on ALL texts" I do not mean to imply that the same XSLT must be used for all texts, though obviously that would be of benefit, if we can manage it.
So why not start posting texts marked up in PGTEI, which will by definition work well in my conversion chain? And at the same time start posting Jeroen's texts, which will convert fine in his chain? This way we could both start putting up an automatic online conversion chain. (The guy who did this already in Java has somehow vanished, so I think we have to start over again.) To start with, I will act as interim Post-Processor for people wanting to post PGTEI and pass on to you only the perfectly good ones. You'll just have to stick in the etext number where I put 5 asterisks. I claim the .pgtei file extension; Jeroen can claim whatever extension he sees fit for his files. So we can have both an alice30.pgtei and an alice30.jtei.
--
Marcello Perathoner
webmaster@gutenberg.org