Re: XML won't eat your children (was Re: [gutvol-d] jeroen's even-handed analysis)

20 Oct 2004

On Wed, 20 Oct 2004 17:25:29 +0200, Marcello Perathoner <marcello@perathoner.de> wrote:
...
> I don't think we'll get PG to post texts in non-standard cooked-up
> formats. They are already making enough fuzz over perfectly valid TEI files.

That last is, if not inaccurate, at least misleading.

And I think you mean, by "PG" and "they" above, the WWs. So let's get
down to it.

Nobody has an objection to valid TEI texts, but valid TEI texts alone
_are not enough_. An XML file that cannot be read (by an actual human)
is as useful as a lock with no key.

We need the key as well as the lock.

I really no longer give any houseroom at all to the approach "Post XML
Now Because That Is The One True Way And We'll Figure Out How To Read
It Later." If for no other reason, then because the most important
part of the WW job is to check the texts before posting, and if we
can't read it, we can't find the errors, and if we can't find the
errors, we can't fix 'em.

We WWs would all LOVE to have only one format (XML) uploaded, and
generate all posting files from that. It would cut out an amazing
amount of work and uncertainty. Further down the line, we can get to
looking at posting just the XML, and generate other formats on the
fly, but let's take one step at a time. Considering that this step to
date has already taken three years or so, that's not overly cautious!

The first thing we need to do is get substantial agreement on a flavor
of XML -- not ruling out the addition of future flavors, you
understand, but we need to get at least one of them bedded down before
we attack others. Teixlite seems to be the majority choice among those
relatively few volunteers who are enthusiastic about XML, so let's
say, for the purpose of this discussion, that that's the one we're
working on.

Next, we need a process for adding the header and footer for PG texts
for the selected flavor. That shouldn't be a problem; if we can agree
how to tag them, we can automate that. (We don't actually _have_
agreement about tagging them, but I can't believe that could end up
being a problem, once we settle on the rest.)
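
To make the "we can automate that" point concrete, here is a minimal
sketch in Python, using only the standard library. The tagging is an
assumption made up for illustration: nothing has been agreed, so the
<div type="pgheader"> and <div type="pgfooter"> names, their placement
in <front> and <back>, the sample document, and the function names are
all hypothetical.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for a TEI Lite text; not a real PG file.
SAMPLE = """<TEI.2>
  <teiHeader><fileDesc><titleStmt><title>Example</title></titleStmt></fileDesc></teiHeader>
  <text>
    <body><div><head>Chapter 1</head><p>Some text.</p></div></body>
  </text>
</TEI.2>"""


def make_div(kind, wording):
    """Build <div type=kind><p>wording</p></div> (hypothetical tagging)."""
    div = ET.Element("div", {"type": kind})
    ET.SubElement(div, "p").text = wording
    return div


def add_pg_boilerplate(root, header_text, footer_text):
    """Insert the header div into <front> and the footer div into <back>."""
    text = root.find("text")
    if text is None:
        raise ValueError("no <text> element found")
    front = text.find("front")
    if front is None:
        # No front matter yet: create it as the first child of <text>.
        front = ET.Element("front")
        text.insert(0, front)
    back = text.find("back")
    if back is None:
        # No back matter yet: create it as the last child of <text>.
        back = ET.SubElement(text, "back")
    front.insert(0, make_div("pgheader", header_text))
    back.append(make_div("pgfooter", footer_text))
    return root


root = add_pg_boilerplate(
    ET.fromstring(SAMPLE), "PLACEHOLDER HEADER", "PLACEHOLDER FOOTER"
)
result = ET.tostring(root, encoding="unicode")
```

Whatever tagging is finally chosen, the step stays this mechanical: only
the element names and the wording passed in would change.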

Next, we need a process, using open-source, cross-platform tools --
the standarder the better -- to convert that XML into, at a minimum,
plain text and HTML. Other formats are welcome but optional. That
process must work for _all_ teixlite files, not just ones that are
specially cooked, using constraints not specified within the chosen
DTD. Here's where we hit the rocks today. 
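
As a rough illustration of what "must work for _all_ files" asks of a
converter, here is a small Python sketch using only the standard
library. It is not Marcello's or Jeroen's process, and it covers only a
handful of elements; the tag mapping, the sample text, and the function
names are assumptions for the example. The one design point it tries to
show is the fallback: an element the converter does not know about
still has its text passed through, rather than being dropped or
breaking the run.

```python
import textwrap
import xml.etree.ElementTree as ET
from html import escape

# Tiny stand-in for the <text> part of a TEI Lite file; not a real PG text.
SAMPLE = """<TEI.2><text><body>
<div><head>Chapter 1</head>
<p>It was a <hi rend="italic">dark</hi> night, &amp; stormy.</p>
<p>An <foreign lang="fr">inconnu</foreign> spoke.</p>
</div></body></text></TEI.2>"""

# Assumed mapping from a few TEI elements to HTML elements.
HTML_TAGS = {"div": "div", "head": "h2", "p": "p", "hi": "em"}


def to_html(elem):
    """Render elem recursively; unmapped elements keep their text only."""
    inner = escape(elem.text or "")
    for child in elem:
        inner += to_html(child) + escape(child.tail or "")
    tag = HTML_TAGS.get(elem.tag)
    return f"<{tag}>{inner}</{tag}>" if tag else inner


def to_text(root, width=72):
    """Emit each <head> and <p> as a wrapped plain-text paragraph."""
    blocks = []
    for elem in root.iter():
        if elem.tag in ("head", "p"):
            # Flatten inline markup and collapse runs of whitespace.
            words = " ".join("".join(elem.itertext()).split())
            blocks.append(textwrap.fill(words, width))
    return "\n\n".join(blocks) + "\n"


body = ET.fromstring(SAMPLE).find("text/body")
html_out = to_html(body)
text_out = to_text(body)
```

Here <foreign> is not in the mapping, yet "inconnu" still appears in
both outputs. A real converter would need far more than this (notes,
tables, verse, nested structures), which is exactly where the hard part
lies.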

I give considerable credit to you, Marcello, and to Jeroen, as the
only people I know of who have come up with at least partial answers
and approaches to this. Maybe you have refined your processes, but the
last time I tried, I couldn't put Jeroen's files through your process,
and get the expected results. I think you have most of it down,
though. Is it close enough to try again?

I don't want to imply specific means from which this process is to be
constructed. Obviously XSLT is one possible approach, but I certainly
do not want to imply limitations on what that process should use. The
only things we must have -- both for our own internal practical
purposes and for the use of future readers -- are that it should work
reliably on _all_ texts that conform to the XML DTD chosen, be open
source, and be cross-platform. A reader needs to be able to tweak the
transform and re-run on her own desktop. 

And just re-reading that last, when I say "must work reliably on ALL
texts" I do not mean to imply that the same XSLT must be used for all
texts, though obviously that would be of benefit, if we can manage it.

I've held just about every position on XML at one time or another,
and I'm all XMLed out. I no longer believe it is worth spending my
time on, until somebody (else!) solves the issues I've just laid out.

jim

Jim Tinsley