
Bowerbird wrote:
do you notice how the key points are studiously ignored?
i've posted a test-suite here before that contains the structural elements that comprise virtually the entirety of the library of project gutenberg e-texts, and all can be represented in plain text...
and -- in spite of my constant challenge -- no one has been able to come up with any additional elements that should be included.

First of all, why this obsession with "challenges"? You treat discussion like a war rather than a cordial and reasoned debate leading to common understanding (even if not total agreement). If people decide not to answer your challenges, it's not necessarily because they can't meet them, but because they can't be bothered. You don't "win" anything when people refuse to play *your* game. It's like a kid going to the playground with bat and ball when no one else joins his game. Did the kid win the game? Should he brag about winning it?

Anyway, your question, though not previously phrased in a way that would elicit a response, is a good one. Here are *some* items that should be considered for markup in the master text document (and in some if not all of the derivative formats), since they are useful for all kinds of purposes, including, among others, directed styling and accessibility (text-to-speech):

- verse (poetry) and verse lines
- epigraphs (incl. those at the start of a section)
- likewise, a few other common structures: colophon, index, glossary, dedication, foreword, notes, preface, etc. (or some of these)
- quotes and blockquotes, possibly including citation info
- original page breaks with page numbers (if the digital text is derived from a paper book). Where the line breaks occur in the original source could also be indicated.
- semantics of emphasized text
- marking up foreign phrases, including language code
- identification of the roles of contributors to the work (author, illustrator, translator, etc.)

Some would say identifying front, body and back matter is important (I think so.)

It's a good idea to add a unique identifier to every substantive block-level element and to some inline-level ones, allowing future linking and directed annotations.

Some optional things that could also be very useful (but require more markup work):

- Mark up all quotations and include the name/gender of the speaker. (Why? TTS can allocate different voices to each speaker.)

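To make the speaker-attribution idea concrete, here is a minimal sketch in Python of how a TTS front end might allocate voices once quotations carry speaker information. Everything specific in it is an assumption for illustration only: the "who" and "gender" attributes on <q>, the id scheme, the sample dialogue, and the voice names are made up and not taken from any spec.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the 'who' and 'gender' attributes are illustrative only.
SAMPLE = """
<div class="chapter" id="ch01">
  <p id="ch01-p1"><q who="anna" gender="female">Is anyone there?</q></p>
  <p id="ch01-p2"><q who="ben" gender="male">Only me.</q></p>
  <p id="ch01-p3"><q who="clara" gender="female">And me.</q></p>
  <p id="ch01-p4"><q who="anna" gender="female">Good.</q></p>
</div>
"""

# Hypothetical voice pools a TTS engine might offer.
VOICES = {"female": ["female-1", "female-2"], "male": ["male-1", "male-2"]}
DEFAULT_VOICE = "narrator"


def assign_voices(xml_text, voices=VOICES):
    """Return (speaker, voice, text) for each quotation, in document order."""
    root = ET.fromstring(xml_text)
    voice_of = {}    # speaker name -> voice already allocated
    next_index = {}  # gender -> index of the next voice to hand out
    script = []
    for q in root.iter("q"):
        who = q.get("who")
        gender = q.get("gender")
        if who is None or gender not in voices:
            # No usable speaker info: fall back to a single default voice.
            voice = DEFAULT_VOICE
        else:
            if who not in voice_of:
                pool = voices[gender]
                i = next_index.get(gender, 0)
                # Cycle through the pool if there are more speakers than voices.
                voice_of[who] = pool[i % len(pool)]
                next_index[gender] = i + 1
            voice = voice_of[who]
        text = "".join(q.itertext()).strip()
        script.append((who, voice, text))
    return script


if __name__ == "__main__":
    for who, voice, text in assign_voices(SAMPLE):
        print(f"{voice:10s} {who}: {text}")
```

With this sample, "anna" keeps the same voice for both of her lines while "clara" gets a second female voice. Without the speaker markup there would be nothing to key the allocation on.
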
Since bare HTML does not provide sufficient markup constructs to identify these and other structures, the 'class' attribute can supply the missing structural meaning, along with the <div> tag to wrap the various structures (e.g., <div class="epigraph">). TEI naturally carries the markup needed for almost all document structures known to man.

An important application of digital texts is text-to-speech (TTS). DAISY-NISO's Digital Talking Book standard is a good reference to consult when it comes to *minimum* markup with accessibility in mind. Even though DTB mostly concerns itself with audio (person-spoken) books, where it is intended to "align" marked-up text with the audio stream, DTB is also useful for TTS applications. For the DTB spec and associated DTD (which gives the supported tags), refer to:

http://www.niso.org/standards/resources/Z39-86-2005.html
http://www.daisy.org/z3986/2005/dtbook-2005-1.dtd

The supported tag set does go beyond HTML, and includes some TEI-like tags defining more detailed structures. The accessibility folk deem these important enough to use in document markup.

So, can ZML define the structures and text semantics presented in DTB? If it can't, then for this reason alone ZML should not be used as the master format for digital texts. If you believe that meeting even minimum accessibility needs is not important, then let us know. I vaguely recall you saying (a long time ago) something to the effect that accessibility is not important when it "gets in the way."

It is my hope that PG and DP, in all their future work, will always consider accessibility important, and will consult with accessibility experts as needed for important decision-making.

Jon

(p.s., now thinking about Michael Hart's wish for autotranslation of PG texts. This is NOT a trivial matter, especially considering how one translates books using slang and unusual dialects of the primary language (e.g., "Tom Sawyer"). And many words change meaning over time, or gain new meanings (e.g., "gay", like the "Gay Caballero" -- was he homosexual?) Certain higher-level markup *might* assist with autotranslation. It is especially useful to make sure the language and country codes are machine-encoded (e.g., xml:lang), as well as time/location coding (e.g., using Dublin Core metadata). Can anyone here think of other markup that may be useful for improving autotranslation?)
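
On the xml:lang point, here is a rough sketch in Python of how a translation tool could work out which language each run of text is in, using the rule that xml:lang is inherited from the enclosing element unless overridden. The sample paragraph, the class name and the function name are made up for illustration; only the xml:lang attribute itself is standard XML.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: an English paragraph containing a marked-up French phrase.
SAMPLE = (
    '<p xml:lang="en-US" id="p1">He muttered '
    '<span class="foreign" xml:lang="fr">c\'est la vie</span>'
    " and left.</p>"
)


def text_runs_with_lang(elem, inherited=None):
    """Return (language, text) pairs in document order.

    An element without its own xml:lang inherits the language of its parent.
    """
    lang = elem.get(XML_LANG, inherited)
    runs = []
    if elem.text and elem.text.strip():
        runs.append((lang, elem.text.strip()))
    for child in elem:
        runs.extend(text_runs_with_lang(child, lang))
        # Text following a child belongs to the current element, not the child.
        if child.tail and child.tail.strip():
            runs.append((lang, child.tail.strip()))
    return runs


if __name__ == "__main__":
    for lang, text in text_runs_with_lang(ET.fromstring(SAMPLE)):
        print(lang, "|", text)
```

An autotranslator fed these runs would at least know to leave the French phrase alone (or handle it separately) instead of treating it as garbled English.
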