
--- Marcello Perathoner <marcello@perathoner.de> wrote:
> A simple XSLT will convert your format into TEI.
I'm not sure any use of XSLT can be called simple :). I've tried reading the spec, and I'm still recovering from the headaches. Fortunately there is an easier way to style (rather than transform) XML: CSS, which is very well supported in all the Mozilla derivatives. While XSLT is something I'm going to have to look at eventually, for the moment I'm happy with CSS :).

If you want an example of what I'm playing with at the moment: recently another DP volunteer and I have been kicking around some ideas for semantic markup of drama. We initially worked with straight HTML, but this quickly gets annoying because of all the messing around with divs involved, and the need to consider how the output will be displayed on browsers with poor support for CSS. I've found it much easier to investigate options by working with an 'HTML+extra tags' markup.

You can see my current working by looking at the blah.* files here:

http://www.pgdp.net/phpBB2/viewtopic.php?p=94734

Save each file under the name given in its post's subject heading. Any Mozilla derivative should display the .xml file styled in a way that almost exactly replicates the .html file, and the source of the XML edition is much easier to read.

Those of you who know TEI can probably tell that 'my' markup is very similar to TEI markup (although a little more verbose). Much of it was arrived at independently, which makes me more confident that this styling approach is relatively sensible. The example demonstrates markup of drama and poetry, with decent handling of line continuations and line numbers in poetry, and of stage directions in drama. I've used the HTML 'edition' of this poetry markup for quite a while now in texts I've PPed for PG. Note that this is still a work in progress, so resist the temptation to criticise the minutiae of my CSS :).

One of the other reasons I think a simple XML-style markup is useful is that we're currently planning to separate the proofreading rounds from the markup rounds at DP.
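To give a feel for how mechanical the "simple XSLT to TEI" step would be for markup like this, here is a rough sketch, written in Python with the standard library's ElementTree rather than in XSLT. The source element names (`scene`, `direction`, `speech`, `speaker`, `line`) are hypothetical stand-ins, not the tags actually used in the blah.* files, and the TEI targets (`div type="scene"`, `stage`, `sp`, `speaker`, `l`) are only a rough correspondence, not a complete or validated TEI conversion. It also reports the line and column of any well-formedness error, which is the kind of direct feedback on the source that matters when checking proofed pages.

```python
import xml.etree.ElementTree as ET

# Hypothetical 'HTML+extra tags' drama markup; the element names are
# illustrative only and are NOT taken from the blah.* files.
SOURCE = """<scene>
  <direction>Enter HAMLET.</direction>
  <speech>
    <speaker>Ham.</speaker>
    <line>To be, or not to be: that is the question:</line>
  </speech>
</scene>"""

# Assumed mapping from the hypothetical tags to TEI drama elements.
TAG_MAP = {
    "scene": "div",
    "direction": "stage",
    "speech": "sp",
    "speaker": "speaker",
    "line": "l",
}


def to_tei(elem):
    """Return a renamed copy of elem; text, tail and attributes are kept.

    Tags missing from TAG_MAP are copied through unchanged.
    """
    attrib = dict(elem.attrib)
    if elem.tag == "scene":
        # TEI marks a scene as a generic division with a type attribute.
        attrib.setdefault("type", "scene")
    new = ET.Element(TAG_MAP.get(elem.tag, elem.tag), attrib)
    new.text = elem.text
    new.tail = elem.tail
    for child in elem:
        new.append(to_tei(child))
    return new


def convert(source):
    """Parse the source and return TEI-style XML as a string.

    Malformed input raises ValueError naming the line and column
    reported by the parser, so the error can be traced to the source.
    """
    try:
        root = ET.fromstring(source)
    except ET.ParseError as err:
        line, column = err.position
        raise ValueError(
            f"malformed input at line {line}, column {column}"
        ) from err
    return ET.tostring(to_tei(root), encoding="unicode")


if __name__ == "__main__":
    print(convert(SOURCE))
```

The renaming is purely structural, which is why a stylesheet (XSLT or CSS) can handle it; the hard part for DP is getting proofers to produce well-formed, consistently tagged input in the first place.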
Every page of a DP project currently goes through two 'rounds' of processing. In each round, proofers are expected not only to detect OCR errors, but also to add inline markup for italics, bold, material in non-Latin alphabets, etc., and block markup for poetry, tables, and so on. This will be split into two initial rounds concerned only with the text, plus a separate procedure for marking up the text correctly.

At the moment the markup we use is homegrown and kludgy, so we have a great opportunity to move to something more sensible, and I strongly believe that some simple XML derivative is the markup we need. I'm even more convinced of the utility of XML for DP now that I've seen how easy it is to style it. One of the problems with relying on something like XSLT is that it can be hard to work backwards from an error in the output to the corresponding error in the original XML input. Being able to get direct feedback by viewing a styled version of the XML makes life much easier.

-- 
Jon Ingram