XML won't eat your children (was Re: [gutvol-d] jeroen's even-handed analysis)

20 Oct 2004

--- Marcello Perathoner <marcello@perathoner.de> wrote:
...
> A simple XSLT will convert your format into TEI.

I'm not sure any use of XSLT can be called simple :). I've tried reading the
spec, and I'm still recovering from the headaches. Fortunately there are easier
ways to style (rather than transform) XML, using CSS. This is very well
supported in all the Mozilla derivatives. While XSLT is something I'm going to
have to look at eventually, for the moment I'm happy with CSS :).
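
To make the CSS route concrete, here is a minimal sketch in Python, not taken
from the files discussed in this post. It pairs a tiny XML document with a
stylesheet through the standard xml-stylesheet processing instruction, and
checks that every element used has a CSS rule. The element names (text, title,
para, emph), the file names and the helper functions are all assumptions made
up for illustration.

```python
import re
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical element names, chosen only for this illustration.
XML_SOURCE = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="sample.css"?>
<text>
  <title>A Sample</title>
  <para>Some <emph>emphasised</emph> words.</para>
</text>
"""

# Unknown XML elements are inline by default, so block-level elements
# have to be declared as such in the stylesheet.
CSS_SOURCE = """\
text  { display: block; margin: 2em; font-family: serif; }
title { display: block; font-size: 150%; text-align: center; }
para  { display: block; margin-top: 1em; }
emph  { font-style: italic; }
"""


def styled_tags(css_text):
    """Return element names that start a rule in this very simple stylesheet."""
    return set(re.findall(r"^\s*([A-Za-z][\w-]*)\s*\{", css_text, flags=re.MULTILINE))


def unstyled_tags(xml_text, css_text):
    """Return a sorted list of element names used in the XML with no CSS rule."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    used = {element.tag for element in root.iter()}
    return sorted(used - styled_tags(css_text))


def write_sample(directory):
    """Write sample.xml and sample.css side by side so a browser can load both."""
    target = Path(directory)
    (target / "sample.xml").write_text(XML_SOURCE, encoding="utf-8")
    (target / "sample.css").write_text(CSS_SOURCE, encoding="utf-8")
    return target / "sample.xml"


if __name__ == "__main__":
    print("Unstyled elements:", unstyled_tags(XML_SOURCE, CSS_SOURCE))
    print("Open in a browser:", write_sample(tempfile.mkdtemp()))
```

The selector check only understands one plain element name per rule, which is
enough for a stylesheet this small; anything more elaborate would need a real
CSS parser.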

If you want an example of what I'm playing with at the moment: another DP
volunteer and I have recently been kicking around some ideas for semantic
markup of drama. We started out working with straight HTML, but this quickly
got annoying, due to the amount of messing around with divs involved, and the
need to consider how the output will be displayed on browsers with poor support
for CSS. I've found it much easier to investigate options by working with an
'HTML+extra tags' markup.

You can see my current work in the blah.* files here:

  http://www.pgdp.net/phpBB2/viewtopic.php?p=94734

Save each file to the name given in its post subject heading. Any Mozilla
derivative should show the .xml file styled in a way which almost exactly
replicates the .html file. The source for the XML edition is much easier to
read.

Those of you who know TEI can probably tell that 'my' markup is very similar to
TEI markup (although a little more verbose). Much of it was arrived at
independently, which makes me more confident that this styling approach is
relatively sensible. The example demonstrates markup of drama and poetry, with
decent handling of line continuations and line numbers in poetry, and stage
directions in drama. I've used the HTML 'edition' of this poetry markup for
quite a while now in texts I've PPed for PG.
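
As a rough illustration of what semantic drama markup with line numbers can
look like, here is a hedged Python sketch. It is not the markup in the blah.*
files: the element names (scene, stage, speech, speaker, line), the n
attribute, the CSS rules and the number_lines helper are all hypothetical. The
idea is that line numbers live in an attribute, and the stylesheet decides how
to show them.

```python
import xml.etree.ElementTree as ET

# Hypothetical drama markup; real TEI or DP markup will differ.
DRAMA_XML = """\
<scene>
  <stage>Enter two volunteers.</stage>
  <speech>
    <speaker>First</speaker>
    <line>Is every page now proofed?</line>
    <line>Or must we wait a round?</line>
  </speech>
  <speech>
    <speaker>Second</speaker>
    <line>The text is clean.</line>
    <line>The markup comes next.</line>
  </speech>
</scene>
"""

# A matching stylesheet sketch: numbered lines show their number via attr().
DRAMA_CSS = """\
scene, speech, stage, line { display: block; }
speech  { margin-top: 1em; }
speaker { display: block; font-variant: small-caps; }
stage   { font-style: italic; margin-left: 2em; }
line    { margin-left: 1em; }
line[n]:after { content: " " attr(n); font-size: 80%; }
"""


def number_lines(root, every=5):
    """Set an n attribute on every `every`-th <line>, counting in document order.

    Returns the total number of <line> elements seen.
    """
    count = 0
    for line in root.iter("line"):
        count += 1
        if count % every == 0:
            line.set("n", str(count))
    return count


if __name__ == "__main__":
    scene = ET.fromstring(DRAMA_XML)
    number_lines(scene, every=2)
    print(ET.tostring(scene, encoding="unicode"))
```

Keeping the number in an attribute rather than in the text means the same
source can be shown with or without line numbers just by changing the CSS.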

Note that this is still a work in progress, so resist the temptation to
criticise the minutiae of my CSS :).

One of the other reasons I think a simple XML-style markup is useful is that
we're currently planning to separate the proofreading rounds from the markup
rounds at DP. Every page of a DP project currently goes through two 'rounds' of
processing. In each round proofers are expected not only to detect OCR errors,
but also to add inline markup for italic, bold, material in non-Latin
alphabets, etc., and block markup for poetry, tables, and so on. This will be
split into an initial two rounds concerned only with the text, plus an extra
procedure to mark the text up correctly. The markup we use at present is
homegrown and kludgy -- we have a great opportunity right now to move to
something more sensible, and I strongly believe that some simple XML-derivative
is the markup we need.

I'm even more convinced of the utility of XML for DP now that I've seen how
easy it is to style it. One of the problems of relying on something like XSLT
is that it can be hard to go backwards from errors in the output to find the
corresponding error in the original XML input. Being able to get direct
feedback by viewing a styled version of the XML makes life much easier.

-- 
Jon Ingram

Jonathan Ingram