Re: [gutvol-d] PGTEI and more

langUsage: I suggest the standard should be to omit the content of the tag (e.g. "British", which is probably more useful as "British English" or "English (British)"). This information should be generated to ensure consistency. (They appear in the generated PGTEI and in alice.tei, but not in lmiss.tei.)
You have to include only the languages you actually use in the text.
What about the content of the tag? i.e. which is correct?
    <language id="en-gb"></language>         # lmiss.tei
    <language id="en-gb">British</language>  # alice.tei
I think the first is much better. Given the second, it will be extra work to enforce a consistent word or phrase.
The converter includes some more because it is easier to delete than to add and if you declare too many it doesn't hurt.
I agree that it's easier to delete; hence my suggestion to include a note. Actually, all languages except the main one should be able to be determined programmatically, right? Just extract and dedup lang= attributes. We certainly don't want to include languages that aren't used; no point in bothering with all this XML if we're just going to populate it with wrong data.
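A minimal sketch of the "extract and dedup" idea, assuming the file is well-formed XML and that languages are marked with a plain lang= attribute as in the examples in this thread. The function name used_languages and the sample markup are made up for illustration; this is not the PGTEI converter's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real PGTEI files will be larger and carry a header.
SAMPLE = """<TEI.2>
  <text lang="en-gb">
    <body>
      <p>She said <foreign lang="fr">bonjour</foreign> and
         <foreign lang="de">guten Tag</foreign>, then
         <foreign lang="fr">au revoir</foreign>.</p>
    </body>
  </text>
</TEI.2>"""


def used_languages(tei_source):
    """Return the sorted, de-duplicated lang= values found in a TEI string."""
    root = ET.fromstring(tei_source)
    # A set comprehension removes duplicates; elements without lang= are skipped.
    return sorted({el.get("lang") for el in root.iter() if el.get("lang")})


print(used_languages(SAMPLE))  # ['de', 'en-gb', 'fr']
```

The result could then be compared with, or used to generate, the <language> entries in <langUsage>.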
Having separate index tags for TOC, PDF and PDB strikes me as unnecessary and prone to error. Shouldn't the TOC one suffice for all?
Some formats have limitations, e.g. PalmDoc bookmarks have a maximum of 16 characters. PDF bookmarks have to use iso-8859-1 chars. Moreover you don't always want the full <head> to appear in the contents.
So, the PalmDoc and PDF headers can be generated to conform to those limitations. I don't see the benefit of including these extra tags for every chapter of every document in the PG collection!

--
Cheers,
Scott S. Lawton
http://Classicosm.com/ - classic books
http://ProductArchitect.com/ - consulting

Scott Lawton wrote:
What about the content of the tag? i.e. which is correct?
    <language id="en-gb"></language>         # lmiss.tei
    <language id="en-gb">British</language>  # alice.tei
Both work. The content of the tag does not matter. The lang attribute is an IDREF. If you say <foreign lang="fr"> then you must have an element somewhere in your TEI with an id of "fr", otherwise it will not validate. The <langUsage> section is just a bin to hold those elements.
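To illustrate the IDREF rule, here is a rough sketch that lists lang= values with no matching id= anywhere in the document. It is only an approximation of what a validating parser does against the DTD; the function name undeclared_languages and the sample markup are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: "fr" is used in the text but never declared.
SAMPLE = """<TEI.2>
  <teiHeader>
    <profileDesc>
      <langUsage>
        <language id="en-gb"></language>
      </langUsage>
    </profileDesc>
  </teiHeader>
  <text lang="en-gb">
    <body><p>A <foreign lang="fr">mot juste</foreign>.</p></body>
  </text>
</TEI.2>"""


def undeclared_languages(tei_source):
    """Return lang= values that do not match any id= in the document."""
    root = ET.fromstring(tei_source)
    # An IDREF may point at any element carrying that id, not only <language>.
    declared = {el.get("id") for el in root.iter() if el.get("id")}
    used = {el.get("lang") for el in root.iter() if el.get("lang")}
    return sorted(used - declared)


print(undeclared_languages(SAMPLE))  # ['fr']
```

An empty list means every lang= reference has a target, whatever text the <language> elements contain.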
Some formats have limitations, e.g. PalmDoc bookmarks have a maximum of 16 characters. PDF bookmarks have to use iso-8859-1 chars. Moreover you don't always want the full <head> to appear in the contents.
So, the PalmDoc and PDF headers can be generated to conform to those limitations. I don't see the benefit of including these extra tags for every chapter of every document in the PG collection!
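For reference, a naive sketch of what purely generated bookmarks could look like, assuming the two limits quoted above (16 characters for PalmDoc, iso-8859-1 for PDF). The function names are hypothetical and this is not how any existing converter works.

```python
PALMDOC_MAX = 16  # bookmark length limit quoted in this thread


def palmdoc_bookmark(head, limit=PALMDOC_MAX):
    """Collapse whitespace, then cut the heading to at most `limit` characters."""
    return " ".join(head.split())[:limit].rstrip()


def pdf_bookmark(head):
    """Replace every character that iso-8859-1 cannot encode with '?'."""
    return head.encode("iso-8859-1", errors="replace").decode("iso-8859-1")


title = "CONSULTATION OF DEVILS, AND BIRTH OF MERLIN."
print(palmdoc_bookmark(title))  # CONSULTATION OF
```

A plain cut like this keeps the start of the heading, which is not necessarily its most informative part.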
How do you go about condensing a longer title into 16 characters? There is no algorithm that can do that nearly as well as a human. A human will always choose to include the most important part.

    CONSULTATION OF DEVILS, AND BIRTH OF MERLIN. => Birth of Merlin

--
Marcello Perathoner
webmaster@gutenberg.org
participants (2)
- Marcello Perathoner
- Scott Lawton