Re: [gutvol-d] key points studiously ignored

24 Sep 2005

      Bowerbird wrote:
...
do you notice how the key points are studiously ignored?
i've posted a test-suite here before that contains the
structural elements that comprise virtually the entirety
of the library of project gutenberg e-texts, and all can
be represented in plain text...
and -- in spite of my constant challenge -- no one has
been able to come up with any additional elements that
should be included.

First of all, why this obsession with "challenges"? You treat
discussion like a war rather than a cordial and reasoned debate
leading to a common understanding (even if not total agreement).

If people decide not to answer your challenges, it's not necessarily
because they can't meet your challenges, but because they won't be
bothered. You don't "win" anything when people refuse to play *your*
game. It's like a kid who goes to the playground with bat and ball,
but no one else joins his game. Did the kid win the game? Should he
brag about winning the game?

Anyway, your question, though not previously phrased in a way which
would elicit response, is a good one.

Here are *some* items that should be considered for markup in the
master text document (and in some if not all of the derivative formats)
since they are useful for all kinds of purposes, including, among
others, directed styling and accessibility (text-to-speech).

   Verse (poetry) and verse lines.
   Epigraphs (incl. those at the start of a section).
   Likewise, a few other common structures: colophon, index, glossary,
      dedication, foreword, notes, preface, etc. (or some of these).
   Quotes and blockquotes, possibly including citation info.
   Original page breaks with page numbers (if the digital text is
      derived from a paper book). Where the line breaks occur in the
      original source could also be indicated.
   Semantics of emphasized text.
   Foreign phrases, including the language code.
   Identification of the roles of contributors to the work (author,
      illustrator, translator, etc.)
   Some would say identifying front, body and back matter is
      important (I think so).
   It's a good idea to add a unique identifier to every substantive
      block-level element and some inline elements, allowing future
      linking and directed annotations.
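
To make the last item concrete, here is a minimal sketch in Python of
how unique identifiers might be added mechanically to block-level
elements that lack one. The tag set, the "x" prefix and the sample
fragment are my own assumptions for illustration, not anything PG or DP
actually uses.

```python
import xml.etree.ElementTree as ET

# Assumed set of block-level tags; a real project would pick its own.
BLOCK_TAGS = {"div", "p", "blockquote", "h1", "h2", "h3"}

def add_ids(root, prefix="x"):
    """Give every block-level element without an id a new, unused one."""
    used = {el.get("id") for el in root.iter() if el.get("id")}
    counter = 0
    for el in root.iter():
        if el.tag in BLOCK_TAGS and el.get("id") is None:
            counter += 1
            # Skip over ids that already exist in the document.
            while f"{prefix}{counter}" in used:
                counter += 1
            new_id = f"{prefix}{counter}"
            el.set("id", new_id)
            used.add(new_id)
    return root

# Hypothetical fragment: one paragraph already carries an id.
SAMPLE = ('<div><p id="x1">One</p><p>Two</p>'
          '<blockquote><p>Three</p></blockquote></div>')

root = add_ids(ET.fromstring(SAMPLE))
ids = [el.get("id") for el in root.iter()]
print(ids)
```

Existing ids are left alone, so links that already point into the text
keep working after the pass is run.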

Some optional things that could also be very useful (but require
more markup work):

   Mark up all quotations and include the name/gender of the speaker.
   (Why? TTS can allocate different voices to each speaker.)
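
As a rough sketch of the voice allocation idea, assuming each quotation
carries hypothetical "speaker" and "gender" attributes (neither the
attribute names nor the voice names below come from any standard or
real TTS engine), a Python script could do something like this:

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical markup: <q> elements with made-up speaker/gender attributes.
SAMPLE = """
<p><q speaker="Alice" gender="f">Where are we going?</q> she asked.
<q speaker="Bob" gender="m">To the river,</q> said Bob.
<q speaker="Alice" gender="f">Then hurry up.</q></p>
"""

# Made-up voice names, grouped by gender code.
VOICES = {"m": ["male-1", "male-2"], "f": ["female-1", "female-2"]}

def assign_voices(xml_text):
    """Return (voice, text) pairs, keeping one voice per speaker."""
    pools = {g: itertools.cycle(names) for g, names in VOICES.items()}
    chosen = {}
    script = []
    for q in ET.fromstring(xml_text).iter("q"):
        speaker = q.get("speaker")
        if speaker not in chosen:
            # First time we meet this speaker: take the next voice
            # from the pool that matches the marked-up gender.
            chosen[speaker] = next(pools[q.get("gender")])
        script.append((chosen[speaker], q.text))
    return script

script = assign_voices(SAMPLE)
for voice, text in script:
    print(voice, "says:", text)
```

Without the speaker markup, a program would have to guess who is
talking from the surrounding narration, which is much harder to get
right.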

Since bare HTML does not provide sufficient markup constructs to
identify these and other structures, the 'class' attribute can be used
to add the needed structural meaning, along with the <div> tag to wrap
up various structures (e.g., <div class="epigraph">). TEI naturally
carries the markup needed for almost all document structures known
to man.
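
For illustration only, here is a small XHTML fragment using 'class'
and <div> in the way described, with a few lines of Python that pull
the labelled structures back out. The class names and ids are my own
assumptions, not an agreed vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment; class names and ids are illustrative only.
SAMPLE = """
<div class="body" id="b1">
  <div class="epigraph" id="e1">
    <p>A short motto goes here.</p>
  </div>
  <div class="verse" id="v1">
    <p class="line" id="v1l1">First line of verse,</p>
    <p class="line" id="v1l2">second line of verse.</p>
  </div>
  <p id="p1">He spoke of <span class="foreign" xml:lang="fr">joie de vivre</span>.</p>
</div>
"""

def structures(xhtml):
    """Return (tag, class, id) for every element carrying a class attribute."""
    root = ET.fromstring(xhtml)
    return [(el.tag, el.get("class"), el.get("id"))
            for el in root.iter() if el.get("class") is not None]

found = structures(SAMPLE)
for item in found:
    print(item)
```

A stylesheet or a TTS front end can key off the same class values, so
the markup work is done once and reused by every derivative format.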

An important application of digital texts is text-to-speech (TTS).
DAISY-NISO's Digital Talking Book standard is a good reference to
consult when it comes to *minimum* markup with accessibility in mind.
Even though DTB mostly concerns itself with audio (person-spoken)
books, where DTB is intended to "align" marked up text with the
audio stream, DTB is also useful for TTS applications. For the DTB
spec and associated DTD (which gives the supported tags), refer to:

   http://www.niso.org/standards/resources/Z39-86-2005.html
   http://www.daisy.org/z3986/2005/dtbook-2005-1.dtd

The supported tag set does go beyond HTML, and includes some TEI-like
tags defining more detailed structures. These are deemed by the
accessibility folk important enough to use in document markup.

So, can ZML define the structures and text semantics presented in DTB?
If it can't, then ZML should not be used as the master format for
digital texts, for this reason alone. If you believe that meeting even
minimum accessibility needs is not important, then let us know. I
vaguely recall you saying (a long time ago) something to the effect
that accessibility is not important when it "gets in the way."

It is my hope that PG and DP, in all their future work, will always
consider accessibility important, and will consult with accessibility
experts as needed for important decision-making.

Jon

(P.S. I am now thinking about Michael Hart's wish for autotranslation
of PG texts. This is NOT a trivial matter, especially considering how
one translates books using slang and unusual dialects of the primary
language (e.g., "Tom Sawyer"). And many words change meaning over time,
or new meanings are added to them (e.g., "gay", like the "Gay
Caballero" -- was he homosexual?) Certain higher-level markup *might*
assist with autotranslation. It is especially useful to make sure the
language and country codes are machine encoded (e.g., xml:lang), as
well as time/location coding (e.g., using Dublin Core metadata). Can
anyone here think of other markup that may be useful to assist with
improving autotranslation?)
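
A sketch of what machine-encoded language and date information could
look like to a translation tool, in Python. The <book> and <metadata>
element names and all sample values are assumptions of mine; only
xml:lang and the Dublin Core namespace are real.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical document; element names and values are illustrative only.
SAMPLE = """
<book xmlns:dc="http://purl.org/dc/elements/1.1/" xml:lang="en-US">
  <metadata>
    <dc:language>en-US</dc:language>
    <dc:date>1876</dc:date>
  </metadata>
  <p>She said <span xml:lang="fr">au revoir</span> and left.</p>
</book>
"""

def text_runs(el, inherited=None):
    """Return (language, text) runs, inheriting xml:lang from ancestors."""
    lang = el.get(XML_LANG, inherited)
    runs = []
    if el.text and el.text.strip():
        runs.append((lang, el.text.strip()))
    for child in el:
        runs.extend(text_runs(child, lang))
        # Text after a child belongs to the parent's language.
        if child.tail and child.tail.strip():
            runs.append((lang, child.tail.strip()))
    return runs

root = ET.fromstring(SAMPLE)
date = root.findtext(f"metadata/{DC}date")
runs = text_runs(root.find("p"), root.get(XML_LANG))
print(date, runs)
```

With this, a translator could leave the French phrase alone (or handle
it separately) and pick a period-appropriate dictionary from the date.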

Jon Noring