
On Wed, 20 Oct 2004 17:25:29 +0200, Marcello Perathoner <marcello@perathoner.de> wrote:
> I don't think we'll get PG to post texts in non-standard cooked-up formats. They are already making enough fuzz over perfectly valid TEI files.
That last is, if not inaccurate, at least misleading. And I think you mean, by "PG" and "they" above, the WWs. So let's get down to it.

Nobody has an objection to valid TEI texts, but valid TEI texts alone _are not enough_. An XML file that cannot be read (by an actual human) is as useful as a lock with no key. We need the key as well as the lock. I really no longer give any headroom at all to the approach "Post XML Now Because That Is The One True Way And We'll Figure Out How To Read It Later." If for no other reason, then because the most important part of the WW job is to check the texts before posting: if we can't read them, we can't find the errors, and if we can't find the errors, we can't fix 'em.

We WWs would all LOVE to have only one format (XML) uploaded, and generate all posting files from that. It would cut out an amazing amount of work and uncertainty. Further down the line, we can get to looking at posting just the XML, and generating other formats on the fly, but let's take one step at a time. Considering that this step has already taken three years or so, that's not overly cautious!

The first thing we need to do is get substantial agreement on a flavor of XML. That doesn't rule out adding other flavors in future, you understand, but we need to get at least one of them bedded down before we attack others. Teixlite seems to be the majority choice among the relatively few volunteers who are enthusiastic about XML, so let's say, for the purpose of this discussion, that that's the one we're working on.

Next, we need a process for adding the PG header and footer to texts in the selected flavor. That shouldn't be a problem; if we can agree how to tag them, we can automate that. (We don't actually _have_ agreement about tagging them, but I can't believe that could end up being a problem, once we settle on the rest.)
Next, we need a process, using open-source, cross-platform tools (the standarder the better), to convert that XML into, at a minimum, plain text and HTML. Other formats are welcome but optional. That process must work for _all_ teixlite files, not just ones that have been specially cooked using constraints not specified within the chosen DTD.

Here's where we hit the rocks today. I give considerable credit to you, Marcello, and to Jeroen, as the only people I know of who have come up with at least partial answers and approaches to this. Maybe you have refined your processes since, but the last time I tried, I couldn't put Jeroen's files through your process and get the expected results. I think you have most of it down, though. Is it close enough to try again?

I don't want to imply specific means from which this process is to be constructed. Obviously XSLT is one possible approach, but I certainly do not want to limit what that process may use. The only things we must have, both for our own internal practical purposes and for the use of future readers, are that it works reliably on _all_ texts that conform to the chosen XML DTD, that it is open source, and that it is cross-platform. A reader needs to be able to tweak the transform and re-run it on her own desktop.

And, re-reading that last: when I say "must work reliably on ALL texts", I do not mean to imply that the same XSLT must be used for all texts, though obviously that would be a benefit if we can manage it.

I've held just about every position on XML at one time or another, and I'm all XMLed out. I no longer believe it is worth spending my time on until somebody (else!) solves the issues I've just laid out.

jim