
I didn't notice this discussion was heading to my favourite subject... TEI. I guess "enlightened" is on my mental spam filter...

Jon Noring wrote:
For maximum archivability, repurposeability and accessibility, it is important for the XML markup vocabulary used in the master document to be wholly structural and semantic. Except where absolutely necessary (and maybe best solved using SVG and MathML), presentational markup should be avoided.
Since we are reproducing printed works, it is often not possible to reconstruct the intended semantics of the user. This is especially true of books from before the mid-19th century, when typographic conventions were not as well established. For many older books the best we can do is capture the typography in some "reduced" way. The good thing about TEI is that it actually supports that.
TEI is primarily structural/semantic, but there are some presentational components. The base DP-TEI (I envision three levels of DP-TEI), when it comes into being, should not specify any presentational markup components.
I am not familiar with OpenOffice's XML vocabulary, but I would guess that it, too, is a mix of structural/semantic tags with presentation tags (I also guess that it is much more presentationally-oriented than TEI, and doesn't have the structural/semantic richness of TEI.) If OpenOffice's XML vocabulary is to be used, it should be subsetted (at least at the base level) to not allow presentational markup.
OpenOffice XML has a lot of features geared towards an office application and the nasty details of presentation. It is quite presentational, and I wouldn't recommend it as a long-term archive format. However, it is much better structured than Microsoft's .DOC format, and considerably more compact (using zip as it does).
I do not recommend DocBook as the primary markup vocabulary for general books, but certainly it is intriguing to consider it as a second "blessed" vocabulary for particular types of documents it is designed for (primarily technical documents.)
Reminds me of that old saying about standards: good to have so many to choose from... DocBook is fine for technical manuals written from scratch, not for capturing a nineteenth-century novel or a sixteenth-century history.

Jeroen.