Re: [gutvol-d] the end of the line

24 Jun 2006

      Jon Noring grudgingly admits:
...
> <or:page/>                   (page break in a paper source)
> <or:lb/>                     (line break in a paper source)
> <or:marker/>                 (a generic marker)

Why not use <tei:pb>, <tei:lb> and <tei:milestone>? Insisting on
making your own when there are perfectly good elements in TEI is just
plain ... sub-optimal.

...
> he began to crow delight&shy;<or:lb/>edly,

Sorry to rain on your parade, but your (at best) half-baked proposal has
the following shortcomings:

1. Non-standard use of &shy;

The soft-hyphen is a "non-printable" character that may be replaced with
a "printable" hyphen by processors before output.

Your use is to record the place where an existing hyphen has been stripped.

You got it backwards. You confuse the very different stages of text
feature recording and text output.

2. Throws off grep

An xml-grep could find "delight<tei:lb/>edly" if searching for
"delightedly", but it surely won't find "delight&shy;<tei:lb/>edly".
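
A quick way to see the effect, sketched in Python. The regex-based
strip_tags() below is only a crude stand-in for a real xml-grep (it is
not an existing tool), and the sketch assumes the &shy; entity has
already been resolved to the character U+00AD:

```python
import re

SOFT_HYPHEN = "\u00ad"  # the character that &shy; resolves to

def strip_tags(markup: str) -> str:
    """Drop every tag and keep only the character data (crude stand-in for an xml-grep)."""
    return re.sub(r"<[^>]+>", "", markup)

# The same phrase, once with a bare line-break tag and once with a soft hyphen before it.
plain = strip_tags("he began to crow delight<tei:lb/>edly,")
with_shy = strip_tags("he began to crow delight" + SOFT_HYPHEN + "<tei:lb/>edly,")

print("delightedly" in plain)     # True: the tag is gone, the word is whole
print("delightedly" in with_shy)  # False: U+00AD still sits inside the word
```

Stripping the tags is enough to make the word searchable; the soft hyphen
is character data, so it survives and splits the word in two.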

3. Redundant text feature documentation

All you are doing here is repeatedly "documenting" that the character
used to hyphenate words in this text is the hyphen. You don't have to
repeat that statement through all of your text. A single statement to
that effect in the TEI header will suffice.

4. Incompatibility with LOTE

Remember that in LOTE you have to deal with cases like the German "ck"
and "fff" which got hyphenated this way:

  dachdecker
  dachdek-ker

  Schiffahrt
  Schiff-fahrt

Also remember French and Italian elisions that don't happen at line breaks.
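
To make the German cases concrete, here is a rough Python sketch.
Both function names are made up for illustration, and the two rules in
german_join() are heuristics fitted to the two examples above, not a
general solution: a real tool would need a dictionary, since compounds
like "Rück-kehr" break the "k-k" rule.

```python
def naive_join(first: str, second: str) -> str:
    """Rejoin a line-broken word by simply dropping the trailing hyphen."""
    return first.rstrip("-") + second

def german_join(first: str, second: str) -> str:
    """Hypothetical helper: undo the two old-orthography rules shown above (heuristic only)."""
    stem = first.rstrip("-")
    # "ck" was split as "k-k": dachdek-ker -> dachdecker
    if stem.endswith("k") and second.startswith("k"):
        return stem[:-1] + "c" + second
    # A consonant dropped in the unbroken word comes back at the break:
    # Schiff-fahrt -> Schiffahrt
    if len(stem) >= 2 and stem[-1] == stem[-2] and second[:1] == stem[-1]:
        return stem + second[1:]
    return stem + second

print(naive_join("dachdek-", "ker"), german_join("dachdek-", "ker"))
print(naive_join("Schiff-", "fahrt"), german_join("Schiff-", "fahrt"))
```

The naive join yields "dachdekker" and "Schifffahrt", neither of which
is the word as it stands in the unbroken text. A scheme that assumes
"strip the hyphen and concatenate" does not carry over.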

5. Dependence on one edition

All those hard-coded line-break markers will marry your electronic text
to one edition. You have no provision to encode different editions of the
very same text, like hardcover and paperback (which may very well have
different line endings).

Conclusion

My advice is: forget entirely about line breaks. They are random
artefacts introduced by the person operating the typesetting machine and
indirectly by the person who chose paper size and font. They have no
raison d'être once you separate the ebook from the scans, i.e. after it
has left DP. (That this suggestion was by "You Know Who" should have
tipped you off immediately.)

But if you belong to that fastidious class of people who can't throw
away even the most useless random artefact, I suggest doing it this
standard way:

  <html:p>
  ...
  he began to crow de<tei:lb ed="paperback" />light<tei:lb ed="hardcover" />edly,
  ...
  </html:p>

A standard XHTML browser (OpenReader ?) will simply throw away the
unknown tags and render the normalized text. A special processor may be
used to reconstruct the paper layout of the text.
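
Both kinds of processing can be sketched in a few lines of Python with
the standard xml.etree.ElementTree module. The namespace URIs bound to
the html: and tei: prefixes and the two function names are my
assumptions for the sake of a runnable example, not part of any
proposal:

```python
import xml.etree.ElementTree as ET

# The paragraph from above, with assumed namespace declarations added.
SAMPLE = (
    '<html:p xmlns:html="http://www.w3.org/1999/xhtml" '
    'xmlns:tei="http://www.tei-c.org/ns/1.0">'
    'he began to crow de<tei:lb ed="paperback" />light'
    '<tei:lb ed="hardcover" />edly,</html:p>'
)
TEI_LB = "{http://www.tei-c.org/ns/1.0}lb"

def normalized_text(para):
    """What a reader that ignores unknown tags would show: character data only."""
    return "".join(para.itertext())

def lines_for_edition(para, edition):
    """Break the paragraph at every tei:lb whose ed attribute names the edition."""
    lines = [para.text or ""]
    for child in para:
        if child.tag == TEI_LB and child.get("ed") == edition:
            lines.append("")  # this edition starts a new line here
        else:
            lines[-1] += "".join(child.itertext())  # keep text of other inline elements
        lines[-1] += child.tail or ""
    return lines

para = ET.fromstring(SAMPLE)
print(normalized_text(para))
print(lines_for_edition(para, "paperback"))
print(lines_for_edition(para, "hardcover"))
```

The same markup gives the unbroken text for ordinary reading and a
different set of line ends per edition. Putting the printed hyphen back
at each break is left to the layout processor.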

-- 
Marcello Perathoner
webmaster@gutenberg.org