Re: [gutvol-d] PGTEI and more

30 Oct 2004

Carlo Traverso wrote:
...
> Marcello> Steve Thomas wrote:
> >> The common advice seems to be to use <q> to enclose quoted
> >> speech *inline*, and use <quote> for quoting larger blocks of
> >> text. The P4 TEI manual was a bit vague on this, but that seems
> >> to be a sensible convention worth using.
> Marcello> That would be presentational markup and very much against the
> Marcello> TEI specs. The specs are very detailed on this:
> If TEI has to be used only semantically, then it is inadequate for PG
> needs. PG markup has to contain presentational elements, in such a way
> that one can obtain presentations "faithful to the original".
I didn't say that. I said that using <q> and <quote> to mark up inline
and block quotes respectively was wrong.

In TEI, all presentational information should be expressed with the
rend attribute, e.g.:

   <q rend="display">
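A minimal Python sketch of that idea, using only the standard library's xml.etree.ElementTree: both quotations use the same semantic element <q>, and a renderer decides block versus inline layout from rend alone. The fragment, the function names and the rule that rend="display" means block layout are assumptions for illustration (the TEI namespace is omitted), not part of any PG or TEI tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like fragment (TEI namespace omitted for brevity).
# Both quotations are the same semantic element <q>; only the
# presentational hint in @rend differs.
FRAGMENT = """<p>He said <q>come here</q> and then recited:
<q rend="display">To be, or not to be, that is the question.</q></p>"""


def layout_of(q: ET.Element) -> str:
    """Return 'block' or 'inline' for a <q> element.

    Assumption: a rend value containing the token "display" means the
    quotation is set off as a block; anything else (including no rend
    attribute at all) defaults to inline.
    """
    rend = q.get("rend", "")
    return "block" if "display" in rend.split() else "inline"


def classify_quotes(xml_text: str) -> list[tuple[str, str]]:
    """List (text, layout) pairs for every <q> element in document order."""
    root = ET.fromstring(xml_text)
    return [((q.text or "").strip(), layout_of(q)) for q in root.iter("q")]


for text, layout in classify_quotes(FRAGMENT):
    print(f"{layout:6} {text}")
```

Switching a quotation between inline and block layout then means changing only its rend value; the element, and with it the semantics, stays the same.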

As to the "faithful to the original" debate:

Most people are far too enamoured of exactly replicating the one 
edition of the text they happen to work on. (I can understand people 
wanting to faithfully replicate a Shakespeare First Folio, but not the 
books PG usually produces.)

Most of the presentational attributes of any edition of a text are just 
whims of the publisher. Who cares if the author's name was printed in 
Zapf Chancery Slanted 17.4 pt gold embossed with 0.1em of extra 
inter-character spacing added? If you get a different edition of the 
same work, the author's name will probably be printed in a very
different font.

The best bet is simply to encode that this is the author's name.
...
> One should never forget that presentation IS semantic: this is evident
> with heavily formatted poetry (Mallarmé's "Un coup de dés jamais
> n'abolira le hasard" is an extreme case), but in some form or
> another it is always true.
That is a half-truth at best.

Presentation encodes semantics, but it is a lossy encoding.

The same presentational attribute "italics" can encode a wide range of 
semantic features like "emphasis", "foreign word", "name", etc.

If presentation could losslessly encode semantics, and an accepted 
standard existed for doing so, a program could recover the semantics 
from the presentation and mark up a text all by itself. But then, if a 
program can guess, why mark up at all?
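To make the italics example concrete, here is a minimal Python sketch built on a toy mapping (hypothetical, not taken from any real edition or standard) in which the three semantic features named above are all printed in italics. Inverting the mapping yields a set of candidates rather than a single answer, and that ambiguity is exactly the information a program reading only the typography cannot recover.

```python
# Toy mapping (hypothetical): semantic feature -> how it is printed.
# All three features from the italics example come out the same way.
ENCODE = {
    "emphasis": "italics",
    "foreign word": "italics",
    "name": "italics",
}


def decode(presentation: str) -> set[str]:
    """Return every semantic feature that could have produced `presentation`."""
    return {sem for sem, pre in ENCODE.items() if pre == presentation}


def is_lossless(encoding: dict[str, str]) -> bool:
    """True only if no two semantic features share a presentation,
    i.e. only if the mapping could be inverted."""
    return len(set(encoding.values())) == len(encoding)


print(sorted(decode("italics")))  # more than one candidate
print(is_lossless(ENCODE))
```

Giving each feature its own distinct presentation would make `is_lossless` true, but only for as long as there are enough distinguishable presentations to go around.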

This is Bowerbird's ZML approach. What Bowerbird does not understand is 
that there are far too many semantic features to make a presentational 
encoding reversible. (Strictly speaking, Bowerbird is further off his
rocker still: he says that ASCII text can encode all the semantics in
the world, which is even sillier than saying that typography can.)

Mathematically speaking:

   Let PRE be the set of all presentational attributes
   that can reasonably be distinguished by the human eye,
   and SEM be the set of all semantics.

   Then there is no bijective function f : SEM -> PRE.
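The step from "far too many semantic features" to "no reversible encoding" is the pigeonhole principle. A short derivation, assuming (as argued above) that PRE is finite and strictly smaller than SEM:

```latex
\text{Assume } |\mathrm{PRE}| < |\mathrm{SEM}| \text{ with } \mathrm{PRE} \text{ finite.} \\
\Rightarrow \forall f : \mathrm{SEM} \to \mathrm{PRE} \;\; \exists\, s_1 \neq s_2 \in \mathrm{SEM} : f(s_1) = f(s_2) \quad \text{(pigeonhole)} \\
\Rightarrow \nexists\, g : \mathrm{PRE} \to \mathrm{SEM} \text{ with } g \circ f = \mathrm{id}_{\mathrm{SEM}}, \text{ since } g(f(s_1)) = g(f(s_2)) \\
\Rightarrow f \text{ is not injective, hence not bijective.}
```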

Thus we can say "presentation hints at semantics" but not "presentation 
IS semantic".

-- 
Marcello Perathoner
webmaster@gutenberg.org