Re: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol 13, Issue 19)

25 Aug 2005

      Brad Collins wrote:
...
jeroen said:
...
...
Some have argued (with valid reasons) that the entire idea of TEI
markup is broken, and have proposed systems in which the mark-up is
separated from the text (stream of characters), in such a way that
multiple, parallel systems of mark-up can exist. Think of a separate
(part of a) file, saying characters 21 to 34 are italics, and so on.
This may sound odd, but it is the way the old Macintosh word processor
MacWrite worked.
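To make the stand-off idea concrete, here is a minimal Python sketch. It assumes a made-up annotation format of (start, end, layer, property) records with 0-based, end-exclusive offsets. Under that convention, the post's "characters 21 to 34", if read as 1-based and inclusive, would be Span(20, 34, ...). The names Span, props_at and render_layer are hypothetical, and this is not MacWrite's actual file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One stand-off annotation: a character range plus what it means."""
    start: int  # 0-based offset of the first character (inclusive)
    end: int    # offset just past the last character (exclusive)
    layer: str  # which parallel mark-up layer this belongs to
    prop: str   # the property asserted for the range, e.g. "i" for italics


# The text itself is a plain, untouched stream of characters.
TEXT = "The quick brown fox jumps."

# Two independent layers; their ranges overlap, which inline XML cannot
# express without splitting elements.
SPANS = [
    Span(4, 9, "typography", "i"),      # "quick" is italic
    Span(4, 19, "semantic", "phrase"),  # "quick brown fox" is a phrase
]


def props_at(spans, offset):
    """Return, sorted, every property covering the character at `offset`."""
    return sorted(s.prop for s in spans if s.start <= offset < s.end)


def render_layer(text, spans, layer):
    """Serialize a single layer as inline tags, ignoring all other layers.

    Assumes the spans within one layer do not overlap each other.
    """
    chosen = sorted((s for s in spans if s.layer == layer), key=lambda s: s.start)
    out, pos = [], 0
    for s in chosen:
        out.append(text[pos:s.start])
        out.append(f"<{s.prop}>{text[s.start:s.end]}</{s.prop}>")
        pos = s.end
    out.append(text[pos:])
    return "".join(out)


if __name__ == "__main__":
    print(render_layer(TEXT, SPANS, "typography"))
    print(render_layer(TEXT, SPANS, "semantic"))
    print(props_at(SPANS, 6))
```

Because each layer is kept apart from the text, adding a third layer (say, scholia) means appending more Span records rather than re-marking the text.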
...
About two years ago I was playing around with the same idea.  My
solution was to take a CSS approach to layering.
CSS places an external layer of formatting instructions on top of a
text, so why not extend CSS to also be able to add layers of semantic
markup to a text?
This would make it easy to add semantic markup including glosses,
notes, comments (scholia) etc. to a text, even if the text is located
on a server somewhere on the Net.
The folks doing the Hyperreal Dictionary of Mathematics are creating a
scholia system based on Emacs text properties to add layers of scholia
to texts.
Interesting! I won't comment directly on Brad's idea, but will talk
about a distantly related idea, which intersects somewhat with what
Brad is talking about when he proposes tweaking CSS.

A couple of years ago I floated the idea to both OeBF (as part of OEBPS
work) and to the accessibility folk (such as DAISY) that we explore a
better way for a document author to assign structural semantics to the
tags in arbitrary XML documents.

A problem the accessibility people have when encountering an arbitrary
XML document (from an unknown vocabulary) is knowing what the tags mean
from a document structure viewpoint. A text-to-speech converter needs
to know this unambiguously to do an effective job of properly
conveying the content to the listener. An attached visual CSS style
sheet (at least a standards-conforming one) is insufficient to
communicate the exact structures in such arbitrary XML documents.

So I proposed something called a "Rosetta Stone", which would be a
sort of attached document (probably XML) which describes the semantics
of the tags in the content document so the document structure can be
identified by machine processing. The RS may syntactically be based
upon XSLT, but it is not intended to be a markup transformation --
it is solely a way to assign semantics to elements so the user agent
(such as a text-to-speech engine) can figure out what to do with them.
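A minimal Python sketch of how a user agent might consult a Rosetta Stone. The post suggests the RS might be XSLT-like; here it is just a dictionary for brevity. The two vocabularies, the role names ("division", "heading", "paragraph", "emphasis"), and the functions roles and headings are all hypothetical stand-ins for the metavocabulary described above. Two different vocabularies map onto the same roles, so the same heading lookup works on both without transforming either document.

```python
import xml.etree.ElementTree as ET

# Hypothetical Rosetta Stones: each maps the tags of one vocabulary onto roles
# in an equally hypothetical metavocabulary. They only assign meaning; the
# content document itself is never transformed.
HOMEGROWN_RS = {"chap": "division", "ttl": "heading", "para": "paragraph", "emph": "emphasis"}
XHTML_RS = {"div": "division", "h1": "heading", "p": "paragraph", "em": "emphasis"}

HOMEGROWN_DOC = "<book><chap><ttl>Chapter One</ttl><para>It was a <emph>dark</emph> night.</para></chap></book>"
XHTML_DOC = "<body><div><h1>Chapter One</h1><p>It was a <em>dark</em> night.</p></div></body>"


def roles(element, rosetta, depth=0):
    """Yield (depth, role, tag) for every element in document order.

    Tags the Rosetta Stone does not describe get the role "unknown".
    """
    yield depth, rosetta.get(element.tag, "unknown"), element.tag
    for child in element:
        yield from roles(child, rosetta, depth + 1)


def headings(root, rosetta):
    """Collect the full text of every element the Rosetta Stone calls a heading,
    e.g. so a text-to-speech engine can announce section titles."""
    return ["".join(el.itertext()) for el in root.iter() if rosetta.get(el.tag) == "heading"]


if __name__ == "__main__":
    for doc, rs in [(HOMEGROWN_DOC, HOMEGROWN_RS), (XHTML_DOC, XHTML_RS)]:
        root = ET.fromstring(doc)
        for depth, role, tag in roles(root, rs):
            print("  " * depth + f"{tag} -> {role}")
        print("headings:", headings(root, rs))
```

Tags absent from a mapping fall back to "unknown", roughly the position a text-to-speech engine is in when it meets an arbitrary vocabulary with no Rosetta Stone at all.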

Key to the Rosetta Stone is setting up a universal "metavocabulary"
to describe common document structures. Now, I have no illusion this
will be easy -- it will not be easy -- it will be damn hard to do
right. Then there's the issue of the granularity of the metavocabulary
-- how fine-grained should its description of document structure be --
and what types of documents will be targeted?

By and large CSS was not designed for the purpose of assigning
structural semantics to tags. CSS does have the 'display' property,
which assigns, at a very rudimentary level, some critical structural
semantics (block, inline, table, list). But as we know, the allowed
'display' values are quite limited -- they don't, and in a practical
sense cannot, assign some critical semantics such as hypertext links,
embedded images and objects (XLink is the vocabulary-agnostic solution
for these particular things). There is no CSS 'display' value for
section headers, for example (in CSS, a header has to be treated as
simply a kind of "block-level" tag), yet text-to-speech clearly needs
section headers to be specifically identified as such, and not lumped
in with paragraphs.

Then there's the issue that CSS is intended for *styling* during
presentation (by and large visual styling). That is its purpose --
it's not designed to be a "Rosetta Stone" for conveying detailed
structural information.

I don't know if the "Rosetta Stone" idea is tractable, or whether it
will in the long run solve any real problems. Short of that, the
accessibility community, and I think anyone else using markup to
structure texts, would want all XML documents representing publications
to conform to particular, well-defined vocabularies which are marked up
in an acceptable structural, presentation-agnostic manner. The more I
study it, the more I see properly done TEI as one such acceptable
vocabulary. The accessibility folk have proposed their own, Digital
Talking Book, which is essentially XHTML with some interesting TEI-like
extensions.

(Just about any markup vocabulary can be abused or misused in ways that
make it more difficult to convey the structural/semantic meaning of the
content. Even TEI can be -- this is why I'm interested in subsetting and
constraining the TEI vocabulary to ensure the marked-up content will
be more accessible, which includes being presentation agnostic.)

Jon

Jon Noring