
Brad Collins wrote:
jeroen said:
Some have argued (with valid reasons) that the entire idea of TEI markup is broken, and have proposed systems in which the markup is separated from the text (a stream of characters), in such a way that multiple, parallel systems of markup can exist. Think of a separate file (or part of a file) saying that characters 21 to 34 are italics, and so on. This may sound odd, but it is the way the old Macintosh word processor MacWrite worked.
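To make the quoted idea concrete, here is a minimal sketch of markup kept apart from the text, in Python. Everything in it is an assumption made for illustration: the `Span` type, the layer names, and the labels are hypothetical and do not come from MacWrite, TEI, or any existing system. The text is a plain string, and each layer is a list of character ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One annotation: a label applied to a character range (hypothetical type)."""
    start: int  # index of the first character, 0-based, inclusive
    end: int    # index one past the last character, exclusive
    label: str


def extract(text, span):
    """Return the characters of `text` that a span covers."""
    return text[span.start:span.end]


def labels_at(layer, pos):
    """Return the labels of every span in `layer` that covers position `pos`."""
    return [s.label for s in layer if s.start <= pos < s.end]


# The text itself carries no markup at all.
text = "The old Macintosh word processor MacWrite stored styles apart from text."

# Parallel, independent layers that all point into the same character stream.
layers = {
    "presentation": [Span(33, 41, "italic")],
    "semantic": [
        Span(4, 17, "phrase-a"),   # "old Macintosh"
        Span(8, 32, "phrase-b"),   # "Macintosh word processor"
    ],
}

italic_text = extract(text, layers["presentation"][0])
overlap = labels_at(layers["semantic"], 10)  # position 10 is inside "Macintosh"
```

Note that the two spans in the "semantic" layer overlap without nesting, which inline XML tags cannot express directly. The cost of this approach is that every offset breaks as soon as the underlying text is edited.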
About two years ago I was playing around with the same idea. My solution was to take a CSS approach to layering.
CSS places an external layer of formatting instructions on top of a text, so why not extend CSS to also be able to add layers of semantic markup to a text?
This would make it easy to add semantic markup, including glosses, notes, comments (scholia), etc., to a text, even if the text is located on a server somewhere on the Net.
The folks doing the Hyperreal Dictionary of Mathematics are creating a scholia system based on Emacs text properties to add layers of scholia to texts.
Interesting! I won't comment directly on Brad's idea, but will talk about a distantly related one, which sort of intersects with what Brad is talking about when it comes to tweaking CSS.
A couple of years ago I floated the idea to both OeBF (as part of OEBPS work) and to the accessibility folk (such as DAISY) that we explore a better way for a document author to assign structural semantics to the tags in arbitrary XML documents. A problem the accessibility people have when encountering an arbitrary XML document (from an unknown vocabulary) is: what do the tags mean from a document structure viewpoint? A text-to-speech converter needs to know this unambiguously to do an effective job of conveying the content to the listener. An attached visual CSS style sheet (a standards-conforming one, at least) is insufficient to communicate the exact structures in such arbitrary XML documents.
So I proposed something called a "Rosetta Stone", which would be a sort of attached document (probably XML) that describes the semantics of the tags in the content document, so the document structure can be identified by machine processing. The RS may syntactically be based upon XSLT, but it is not intended to be a markup transformation -- it is solely a way to assign semantics to elements so the user agent (such as a text-to-speech engine) can figure out what to do with them.
Key to the Rosetta Stone is setting up a universal "metavocabulary" to describe common document structures. Now, I have no illusion that this will be easy -- it will be damn hard to do right. Then there's the issue of the granularity of the metavocabulary -- how fine-grained does one go with document structure -- and what types of documents will be targeted?
By and large, CSS was not designed for the purpose of assigning structural semantics to tags. CSS does have the 'display' property, which assigns, at a very rudimentary level, some critical structural semantics (block, inline, table, list). But as we know, the allowed 'display' values are quite limited -- they don't, and in a practical sense cannot, assign some critical semantics such as hypertext links and embedded images and objects. (XLink is the vocabulary-agnostic solution for these particular things.) There is no CSS 'display' value for section headers, for example (in CSS, a header has to be treated as simply a kind of "block-level" tag), yet it is clear that for text-to-speech, section headers need to be specifically identified as such and not lumped in with paragraphs.
Then there's the issue that CSS is intended for *styling* during presentation (by and large visual styling). That is its purpose -- it's not designed to be a "Rosetta Stone" for conveying detailed structural information.
I don't know if the "Rosetta Stone" idea is tractable, or whether it will solve any real problems in the long run. In lieu of that, the accessibility community, and I think anyone else using markup to structure texts, would want all XML documents representing publications to conform to particular, well-defined vocabularies which are marked up in an acceptable structural, presentation-agnostic manner. The more I study it, the more I think properly done TEI is one such acceptable vocabulary. The accessibility folk have proposed their own, Digital Talking Book, which is essentially XHTML with some interesting TEI-like extensions.
(Just about any markup vocabulary can be abused or misused in ways that make it more difficult to convey the structural/semantic meaning of the content. Even TEI -- this is why I'm interested in subsetting and constraining the TEI vocabulary, to ensure the marked-up content will be more accessible, which includes presentation agnosticism.)
Jon
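As a rough illustration of the "Rosetta Stone" idea described above, here is a minimal Python sketch. It is not the proposal itself: the tag names (`chap`, `ttl`, `para`, `em`), the metavocabulary terms, and the `describe` helper are all hypothetical, and a plain dictionary stands in for what the post suggests would probably be an attached XML document.

```python
import xml.etree.ElementTree as ET

# Hypothetical "Rosetta Stone": maps tags of an unknown vocabulary to terms
# from an assumed universal metavocabulary of document structures.
ROSETTA = {
    "chap": "section",
    "ttl": "heading",
    "para": "paragraph",
    "em": "emphasis",
}


def describe(xml_text, rosetta):
    """Pair every element, in document order, with its structural role.

    Tags missing from the mapping are reported as "unknown" rather than
    guessed, since a wrong role would mislead a text-to-speech engine.
    """
    root = ET.fromstring(xml_text)
    return [(el.tag, rosetta.get(el.tag, "unknown")) for el in root.iter()]


# A content document in an arbitrary vocabulary; it is not transformed,
# only interpreted through the mapping.
doc = "<chap><ttl>Intro</ttl><para>Some <em>text</em>.</para><x>?</x></chap>"
roles = describe(doc, ROSETTA)
```

A user agent could then, for example, announce elements mapped to "heading" differently from those mapped to "paragraph", which a 'display: block' rule alone cannot tell apart.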