
All of this is very good stuff. But I hope you don't mind if most of it is pushed back to the second iteration of PGTEI. My personal thought is to get a "standard" in place that handles what DP would normally label Easy through Normal difficulty, in Latin-1 compatible texts. Then, once we have that in place, move on to the stuff that gives DP fits on a regular basis, like fraktur, long-s, and non-Latin-1 texts (granted, DP-Europe handles most of those now).
I definitely want to see the issues you bring up addressed. I'm just trying to set some realistic boundaries on what we can address on an incremental basis.
Josh

----- Original Message -----
From: "D. Starner" <shalesller@writeme.com>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Subject: Re: [gutvol-d] draft TEI conventions and larger example file
Date: Thu, 28 Oct 2004 15:06:39 -0800

I have a few comments on the draft guidelines. It'd be nice to have page numbers printed on the pages. A letter size PDF would be useful, but the margins on this one seem generous enough to print on letter-sized paper.
DP probably will not be preserving the long-s, and I think it a little unrealistic to expect most of PG's XML documents to preserve it. Also, the description is incorrect; in English, it's used everywhere except at the end of the word, and it was used until about 1800, which means it was in use throughout the 18th century.
It's always used in Fraktur; are we going to preserve that? Counting that, it was used until the middle of the 20th century. It's probably too minor for this document, but several German documents I've seen use a non-ligatured long-s/s combination for the eszett, while not using the long-s elsewhere. Even at the most pedantic, it's arguable whether this should be encoded with the long-s.
There should be an option to preserve running headers where they encode information not found elsewhere.
I think we should go with standards on the languages section; that is, RFC 3066 or its successor in draft. That is, #1, #2, #3, #8 with #5 found in the draft. #4 and #7 can be encoded as en-x-1800 and en-x-Scottish (how does this differ from sco?) in the draft, and I doubt anything would choke on it today. #6 is a bad idea, especially as 3 letter 639 codes sometimes overlap with SIL codes; if you need to encode Gaddang, phi-x-SIL-gad or x-gaddang is a better idea.
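As a rough illustration of the "I doubt anything would choke on it" point, here is a minimal Python sketch that checks only the syntax RFC 3066 defines for a language tag (a primary subtag of 1 to 8 letters, followed by hyphen-separated subtags of 1 to 8 letters or digits). The function name is made up for this example, and the check says nothing about whether a tag is registered or meaningful; it only shows that the tags suggested above are well-formed.

```python
import re

# RFC 3066 syntax only: a primary subtag of 1-8 ASCII letters, then zero or
# more subtags of 1-8 ASCII letters or digits, each preceded by a hyphen.
# This does not consult any registry; it is a shape check, nothing more.
_RFC3066_SHAPE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_wellformed_rfc3066(tag: str) -> bool:
    """Return True if `tag` matches the RFC 3066 Language-Tag grammar.

    Hypothetical helper for illustration; not part of any PG or TEI tool.
    """
    return _RFC3066_SHAPE.fullmatch(tag) is not None


if __name__ == "__main__":
    for tag in ["en", "sco", "en-x-1800", "en-x-Scottish",
                "phi-x-SIL-gad", "x-gaddang", "en_US"]:
        print(tag, is_wellformed_rfc3066(tag))
```

The private-use forms (en-x-1800, phi-x-SIL-gad, x-gaddang) pass this check, while something like en_US with an underscore does not.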
What happened to emph? All I see is rend. Likewise, I'd rather see foreign do italics and let you mark it with rend="none" if needed, as that would match how most books do it, and give a guideline to when to use foreign.
I partially marked up Japanese Literature, and eventually decided not to mark up all the non-italic Japanese words used in running English text, like names of plants and such. I think it would help to add a comment saying to mark up running foreign text and italicized foreign words, but to avoid marking single words, like the names of plants and foods, in running text if they are not italicized.