[gutvol-d] Posting TEI

20 Oct 2004

      Jim Tinsley wrote:
...
> Nobody has an objection to valid TEI texts, but valid TEI texts alone
> _are not enough_. An XML file that cannot be read (by an actual human)
> is as useful as a lock with no key.
Not so. Having a TEI text posted would enable third-party developers to 
come up with their own converter solutions even if we didn't get very far 
with ours. There are a lot of people around who already convert the text 
files into other formats. Their jobs would become much easier.
...
> I really no longer give any headroom at all to the approach "Post XML
> Now Because That Is The One True Way And We'll Figure Out How To Read
> It Later." If for no other reason, then because the most important
> part of the WW job is to check the texts before posting, and if we
> can't read it, we can't find the errors, and if we can't find the
> errors, we can't fix 'em.
A TEI text is basically a text file, so you can read it in any editor. 
If you use Emacs, you can also validate the TEI file against the DTD 
without leaving the editor.
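For a quick check from a script rather than an editor, here is a minimal Python sketch using only the standard library. It tests well-formedness only: `xml.etree.ElementTree` does not validate against a DTD, so checking conformance to the TEI DTD would still need a validating parser such as `xmllint --valid` or lxml's `etree.DTD`. The function name and sample markup are made up for illustration.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text: str) -> tuple[bool, str]:
    """Return (True, "") if xml_text parses as XML, else (False, error message).

    This checks well-formedness only; ElementTree does not validate
    against a DTD.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


# Hypothetical TEI-like fragments: the second one never closes <p>.
good = "<TEI.2><text><body><p>Alice was beginning</p></body></text></TEI.2>"
bad = "<TEI.2><text><body><p>Alice was beginning</body></text></TEI.2>"

print(check_well_formed(good))
print(check_well_formed(bad))
```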

A perfectly valid TEI file with no spelling errors should be good enough 
to post.

What you expect from us TEI developers is that we produce the 150% 
perfect solution before you even consider starting to post files. That 
is not the way software development works.

And this attitude is, in my opinion, the main reason we have gotten 
nowhere with TEI in the last three years.

Let's start now with version 0.0.1 of the TEI process. Of course, at 
some later time we'll have to redo all the posted files. Probably more 
than once. But it's better than sitting here and playing with bowerbird 
because we are bored.
...
> Next, we need a process, using open-source, cross-platform tools --
> the standarder the better -- to convert that XML into, at a minimum,
> plain text and HTML. Other formats are welcome but optional. That
> process must work for _all_ teixlite files, not just ones that are
> specially cooked, using constraints not specified within the chosen
> DTD. Here's where we hit the rocks today.
TEI defines a standard way to extend the DTD. I used this standard way 
to extend the TEI DTD into what I called PGTEI. This is still a 
perfectly valid TEI DTD according to the TEI specs.
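To illustrate what a single "XML to plain text" conversion step involves, here is a toy converter in Python. It is a hypothetical sketch, not the PGTEI chain or any other existing one: it assumes namespace-free TEI Lite-style markup, handles only `<head>` and `<p>` (assumed not to be nested in each other), and flattens inline markup such as `<hi>`. Covering the whole teixlite tag set is exactly the hard part under discussion.

```python
import textwrap
import xml.etree.ElementTree as ET


def tei_to_text(xml_text: str, width: int = 72) -> str:
    """Toy TEI-to-plain-text conversion handling only <head> and <p>.

    Text outside <head>/<p> is dropped; inline markup inside them is
    flattened to its text content.
    """
    root = ET.fromstring(xml_text)
    body = root.find(".//body")
    if body is None:
        raise ValueError("no <body> element found")
    blocks = []
    for elem in body.iter():  # walks elements in document order
        if elem.tag in ("head", "p"):
            # Collapse all whitespace, including line breaks in the source.
            text = " ".join("".join(elem.itertext()).split())
            # Headings are kept on one line; paragraphs are re-wrapped.
            blocks.append(text if elem.tag == "head" else textwrap.fill(text, width=width))
    return "\n\n".join(blocks) + "\n"


sample = """<TEI.2><text><body>
<div type="chapter"><head>Chapter I</head>
<p>Alice was beginning to get <hi rend="italic">very</hi> tired.</p>
</div></body></text></TEI.2>"""

print(tei_to_text(sample))
```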
...
> I don't want to imply specific means from which this process is to be
> constructed. Obviously XSLT is one possible approach, but I certainly
> do not want to imply limitations on what that process should use. The
> only things we must have -- both for our own internal practical
> purposes and for the use of future readers -- is that it should work
> reliably on _all_ texts that conform to the XML DTD chosen, be open
> source, and be cross-platform. A reader needs to be able to tweak the
> transform and re-run on her own desktop.
You misunderstand what a DTD is. It only guarantees syntactic 
correctness. I can cook up a perfectly valid XHTML file that is 
semantically bogus:

   <div><h6>1</h6>
     <div><h5>1.1</h5>
        <div><h4>1.1.1</h4>
          ...
        </div>
     </div>
   </div>

This is valid HTML (I didn't bother to check), but it will not render well.

You cannot build a conversion tool that will produce good results on all 
syntactically valid TEI files, just as you cannot build a browser that 
will make sense of semantically bogus HTML files.
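The heading example can be made concrete: the rule "a nested div gets a lower-ranked heading" is a semantic constraint that a DTD does not express, but a pre-posting check can test it in a few lines. A minimal Python sketch, assuming namespace-free markup whose root element is the outermost `<div>`; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def check_heading_nesting(xml_text: str) -> list[str]:
    """Flag nested <div>s whose heading does not rank below the parent's."""
    problems = []

    def walk(div, depth, parent_level):
        # The first h1..h6 child is taken as this div's heading.
        level = None
        for child in div:
            if child.tag in HEADINGS:
                level = int(child.tag[1])
                if parent_level is not None and level <= parent_level:
                    problems.append(
                        f"<{child.tag}>{(child.text or '').strip()}</{child.tag}> "
                        f"at depth {depth} does not rank below parent heading h{parent_level}"
                    )
                break
        # Recurse into nested divs; a div without a heading passes its parent's level on.
        for child in div:
            if child.tag == "div":
                walk(child, depth + 1, level if level is not None else parent_level)

    walk(ET.fromstring(xml_text), 1, None)
    return problems


bogus = """<div><h6>1</h6>
  <div><h5>1.1</h5>
    <div><h4>1.1.1</h4>
      ...
    </div>
  </div>
</div>"""

for problem in check_heading_nesting(bogus):
    print(problem)
```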

Furthermore, TEI is geared towards marking up existing texts, so scholars 
can study the text without having to get the physical book. It is not as 
well suited to serve as a master format for print processing. That's why 
I had to add some more tags and attributes to my DTD. (Which doesn't make 
any text that uses my DTD less standard, because TEI is expressly 
designed to be extensible. But I'm repeating myself.)
...
> And just re-reading that last, when I say "must work reliably on ALL
> texts" I do not mean to imply that the same XSLT must be used for all
> texts, though obviously that would be of benefit, if we can manage it.
So why not start posting texts marked up in PGTEI, which will by 
definition work well in my conversion chain?

And at the same time start posting Jeroen's texts, which will convert 
fine in his chain?

This way we could both start putting up an automatic online conversion 
chain. (The guy who already did this in Java has somehow vanished, so I 
think we have to start over again.)

To start, I will act as interim Post-Processor for people wanting to 
post PGTEI and pass on to you only the perfectly good ones. You'll just 
have to insert the etext number where I put five asterisks.
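Filling in the number is a plain text substitution. A minimal Python sketch; the function name is hypothetical, and the `<idno>` fragment is a made-up stand-in, since the post does not say where in the PGTEI header the asterisks sit.

```python
PLACEHOLDER = "*****"  # five asterisks marking where the etext number goes


def fill_etext_number(text: str, etext_no: int) -> str:
    """Replace every five-asterisk placeholder with the assigned etext number."""
    if PLACEHOLDER not in text:
        raise ValueError("no ***** placeholder found; was the number already filled in?")
    return text.replace(PLACEHOLDER, str(etext_no))


# Hypothetical header fragment; the real PGTEI header layout may differ.
header = '<idno type="etext-no">*****</idno>'
print(fill_etext_number(header, 12345))  # arbitrary example number
```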

I claim the .pgtei file extension; Jeroen can claim whatever extension he 
sees fit for his files. So we can have both an alice30.pgtei and an 
alice30.jtei.
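With distinct extensions, an automatic online conversion service could pick the chain per file. A hypothetical Python sketch: the two chain functions are placeholders that only report which chain would run, standing in for whatever each converter actually does.

```python
from pathlib import Path


# Hypothetical placeholders for the two conversion chains.
def pgtei_chain(path: Path) -> str:
    return f"PGTEI chain: {path.name}"


def jtei_chain(path: Path) -> str:
    return f"Jeroen chain: {path.name}"


# Map each claimed file extension to the chain that knows how to convert it.
CHAINS = {".pgtei": pgtei_chain, ".jtei": jtei_chain}


def convert(path_str: str) -> str:
    path = Path(path_str)
    try:
        chain = CHAINS[path.suffix]
    except KeyError:
        raise ValueError(f"no conversion chain registered for {path.suffix!r}") from None
    return chain(path)


print(convert("alice30.pgtei"))
print(convert("alice30.jtei"))
```

Keeping the mapping in one table means adding a third markup flavour later is a one-line change.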

-- 
Marcello Perathoner
webmaster@gutenberg.org