
Attached at the bottom is a rough draft of a teiHeader spec. I basically wrote it as an example teiHeader with comments scattered throughout to explain things.

Since I'm not a cataloging expert (heck, I don't even qualify as a catalog neophyte), I'm relying heavily on Marcello's documentation and the original TEI documentation. I picked out what looked important and relevant based on work I've done before, and I'm sure I've missed some things.

Please take a look through this and point out items that aren't covered and that you think should be. Also, if anything is unclear, let me know. I'll try to explain it better (and update the explanation in the spec).

Remember, the goal is to be able to grab information from the teiHeader in each etext to generate cataloging information (this ties in nicely with the ongoing MARC discussions around here).

Josh

PS The next document will be the actual markup spec rough draft. This one will probably be delayed until after the US Thanksgiving holidays (I don't think well with 5 pounds of turkey digesting in my tummy!)

Joshua Hutchinson wrote:
<title>The Title of the EText</title> <-- MANDATORY SECTION -->
We should provide a non-standard attribute of "nonfiling". This is the number of chars to remove from the start of the title before sorting it.
<title nonfiling="4">The Tempest</title>
<title nonfiling="2">A Midsummer Nights Dream</title>
This is an extension to TEI but very useful for the catalog software. It avoids unsightly titles like "Tempest, The" and still sorts right.
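A minimal sketch of how catalog software might consume the proposed attribute, assuming Python and its standard `xml.etree.ElementTree` module. The function name `sort_key` and the third title ("Hamlet", with no attribute) are hypothetical and only there for illustration; the actual catalog code is not shown in this thread.

```python
import xml.etree.ElementTree as ET


def sort_key(title_xml: str) -> str:
    """Return the filing form of a <title> that may carry a nonfiling count."""
    elem = ET.fromstring(title_xml)
    text = elem.text or ""
    # Assumption: a missing attribute means no characters are skipped.
    skip = int(elem.get("nonfiling", "0"))
    # casefold() so that sorting ignores letter case.
    return text[skip:].casefold()


titles = [
    '<title nonfiling="4">The Tempest</title>',
    '<title nonfiling="2">A Midsummer Nights Dream</title>',
    '<title>Hamlet</title>',  # hypothetical extra entry without the attribute
]

# Sort by the filing form, but keep the full title for display.
ordered = sorted(titles, key=sort_key)
display = [ET.fromstring(t).text for t in ordered]
```

The displayed titles keep their leading articles, while the ordering is driven only by the text after the skipped characters.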
<editionStmt> <-- OPTIONAL --> <edition n="1">First edition</edition> <-- OPTIONAL --> </editionStmt>
I think the edition number is not maintained any more. I don't see any of them in the new file system.
<publicationStmt> <-- MANDATORY SECTION -->
  <publisher>Project Gutenberg</publisher> <-- MANDATORY SECTION -->
  <date value="2004-11">November, 2004</date> <-- MANDATORY SECTION -->
  <idno type="etext-number">12345</idno> <-- MANDATORY SECTION -->
</publicationStmt>
The date should also mention the day. We are not using the date for filing any more. Is this the date of first publication, or is it updated with each new edition?
<textClass> <-- RECOMMENDED SECTION -->
  <keywords>
    <list>
      <item>KEYWORD</item>
    </list>
  </keywords>
</textClass>
This needs some more thought as the keywords should come out of some authority list. In that case the authority must be specified.
<change>
  <date value="2004-11">November 2004</date>
  <respStmt>
    <name>Scans provided by Cornell University</name>
    <name>Joshua Hutchinson</name>
    <name>Juliet Sutherland</name>
    <name>Distributed Proofreaders</name>
  </respStmt>
  <item>Etext created</item>
</change>
Better separate scanning and proofing:
<change>
  <date value="2003">2003</date>
  <respStmt>
    <name type="Organisation">Cornell University</name>
  </respStmt>
  <item>Scanned the source</item>
</change>
<change>
  <date value="2004-11">November 2004</date>
  <respStmt>
    <name>Joshua Hutchinson</name>
    <name>Juliet Sutherland</name>
    <name>Distributed Proofreaders</name>
  </respStmt>
  <item>Etext created</item>
</change>
--
Marcello Perathoner
webmaster@gutenberg.org

<title>The Title of the EText</title> <-- MANDATORY SECTION -->
We should provide a non-standard attribute of "nonfiling". This is the number of chars to remove from the start of the title before sorting it.
<title nonfiling="4">The Tempest</title>
<title nonfiling="2">A Midsummer Nights Dream</title>
This is an extension to TEI but very useful for the catalog software. It avoids unsightly titles like: "Tempest, The" and still sorts right.
I think that should be done (or not) by the cataloging software rather than hardcoded into each file. It's an easy thing to miss, i.e. to be done inconsistently. And, since it's not part of non-PG TEI, there's no other software in the outside world that looks for it. (I may have made this point before, but if so I can't find it in my archives.) -- Cheers, Scott S. Lawton http://Classicosm.com/ - classic books http://ProductArchitect.com/ - consulting

Another option for the title is to use a file-as attribute.
<title file-as="Tempest, The">The Tempest</title>
<title file-as="Midsummer Nights Dream, A">A Midsummer Nights Dream</title>
While this may not be included in the TEI standard, it is part of the OEB standard, http://www.openebook.org/oebps/oebps1.0.1/download/oeb101-xhtml.htm
-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Scott Lawton
Sent: 19 November, 2004 13:31
To: Project Gutenberg Volunteer Discussion
Subject: Re: [gutvol-d] TEI Header Spec (rough draft)
<title>The Title of the EText</title> <-- MANDATORY SECTION -->
We should provide a non-standard attribute of "nonfiling". This is the
number of chars to remove from the start of the title before sorting it.
<title nonfiling="4">The Tempest</title>
<title nonfiling="2">A Midsummer Nights Dream</title>
This is an extension to TEI but very useful for the catalog software. It avoids unsightly titles like: "Tempest, The" and still sorts right.
I think that should be done (or not) by the cataloging software rather than hardcoded into each file. It's an easy thing to miss, i.e. to be done inconsistently. And, since it's not part of non-PG TEI, there's no other software in the outside world that looks for it. (I may have made this point before, but if so I can't find it in my archives.) -- Cheers, Scott S. Lawton http://Classicosm.com/ - classic books http://ProductArchitect.com/ - consulting

"Scott" == Scott Lawton <scott_bulkmail@productarchitect.com> writes:
>>> <title>The Title of the EText</title> <-- MANDATORY SECTION -->
>> We should provide a non-standard attribute of "nonfiling". This is the number of chars to remove from the start of the title before sorting it.
>> <title nonfiling="4">The Tempest</title>
>> <title nonfiling="2">A Midsummer Nights Dream</title>
>> This is an extension to TEI but very useful for the catalog software. It avoids unsightly titles like: "Tempest, The" and still sorts right.
Scott> I think that should be done (or not) by the cataloging software rather than hardcoded into each file.

How can the software guess what is filing and what is not? In "As Farpas" and "As you like it", is "As" filing or not? Here the language might decide, but I think it is possible for the same word in the same language to be filing in one title and non-filing in another (surely it is in Italian if you disregard accents).

However, relying on a character count is very fragile, especially in a context in which whitespace is considered irrelevant. I have often seen braces used in sorting software:
<title>{The} Tempest</title>
<title>{A} Midsummer Nights Dream</title>
Characters in braces and whitespace are discarded for the purpose of sorting; braces are discarded for the purpose of printing. Of course it is possible to achieve the same result, much more verbosely, with angled brackets:
<title><nonfiling>The</nonfiling> Tempest</title>
(A side remark: a non-filing part is not always separated by a space: <title>{L'}Inferno</title>)

Carlo
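The brace convention described above can be sketched as follows, assuming Python. The function names `display_form` and `filing_form` are hypothetical, and the whitespace handling is one reading of the description: leading whitespace is dropped and internal runs are collapsed to a single space, so that reflowing the XML does not change the sort key.

```python
import re

# Matches one braced, non-nested group such as {The} or {L'}.
_BRACED = re.compile(r"\{([^{}]*)\}")


def display_form(marked: str) -> str:
    """For printing: drop the braces but keep what is inside them."""
    return _BRACED.sub(r"\1", marked)


def filing_form(marked: str) -> str:
    """For sorting: drop braced text, then normalise whitespace and case."""
    without_braced = _BRACED.sub("", marked)
    # split()/join() strips leading and trailing whitespace and
    # collapses any internal run (spaces, newlines) to one space.
    return " ".join(without_braced.split()).casefold()


examples = ["{The} Tempest", "{A} Midsummer Nights Dream", "{L'}Inferno"]
printed = [display_form(t) for t in examples]
sorted_for_catalog = sorted(examples, key=filing_form)
```

Unlike a character count, the result does not depend on how many spaces or line breaks follow the article, and a non-filing part that is not followed by a space (as in the `{L'}Inferno` case) needs no special treatment.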

Marcello Perathoner wrote:
Joshua Hutchinson wrote:
<title>The Title of the EText</title> <-- MANDATORY SECTION -->
We should provide a non-standard attribute of "nonfiling". This is the number of chars to remove from the start of the title before sorting it.
<title nonfiling="4">The Tempest</title>
<title nonfiling="2">A Midsummer Nights Dream</title>
This is an extension to TEI but very useful for the catalog software. It avoids unsightly titles like: "Tempest, The" and still sorts right.
I see the need for this... But I think I like Jeffrey's method a little better. (From another post)
<title file-as="Tempest, The">The Tempest</title>
<title file-as="Midsummer Nights Dream, A">A Midsummer Nights Dream</title>
While this may not be included in the TEI standard, it is part of the OEB standard, http://www.openebook.org/oebps/oebps1.0.1/download/oeb101-xhtml.htm
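For comparison, a minimal sketch of how the file-as variant quoted here could be read, assuming Python's standard `xml.etree.ElementTree`. The function name `filing_title` is hypothetical, and falling back to the element text when the attribute is absent is an assumption, not something the thread specifies.

```python
import xml.etree.ElementTree as ET


def filing_title(title_xml: str) -> str:
    """Return the file-as value if present, else the title text itself."""
    elem = ET.fromstring(title_xml)
    # Assumption: titles without file-as are filed under their own text.
    return elem.get("file-as", elem.text or "")


titles = [
    '<title file-as="Tempest, The">The Tempest</title>',
    '<title file-as="Midsummer Nights Dream, A">A Midsummer Nights Dream</title>',
]

# Order by the filing form, display the title as written.
ordered = sorted(titles, key=lambda t: filing_title(t).casefold())
display = [ET.fromstring(t).text for t in ordered]
```

The trade-off against a character count is that the filing form is spelled out in full, so it cannot drift out of step with whitespace changes, but it has to be typed twice and kept consistent with the title by hand.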
<publicationStmt> <-- MANDATORY SECTION -->
  <publisher>Project Gutenberg</publisher> <-- MANDATORY SECTION -->
  <date value="2004-11">November, 2004</date> <-- MANDATORY SECTION -->
  <idno type="etext-number">12345</idno> <-- MANDATORY SECTION -->
</publicationStmt>
The date should also mention the day. We are not using the date for filing any more.
Fair enough. Will change it.
Is this the date of first publication, or is it updated with each new edition?
First publication. Subsequent updates will be documented at the end.
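As a rough illustration of pulling catalog data out of the publication statement, here is a sketch assuming Python's standard `xml.etree.ElementTree`. The function name `publication_record`, the dictionary keys, and the day in the sample date are hypothetical. The `<-- ... -->` annotations from the draft are left out of the sample because they are not well-formed XML comments and would stop a parser.

```python
import xml.etree.ElementTree as ET

# Sample fragment modelled on the draft; the day (19) is a made-up value
# added only to show a full date in the value attribute.
HEADER = """
<publicationStmt>
  <publisher>Project Gutenberg</publisher>
  <date value="2004-11-19">November 19, 2004</date>
  <idno type="etext-number">12345</idno>
</publicationStmt>
"""


def publication_record(xml_text: str) -> dict:
    """Collect the fields a catalog entry would need from publicationStmt."""
    stmt = ET.fromstring(xml_text)
    date = stmt.find("date")
    idno = stmt.find("idno[@type='etext-number']")
    return {
        "publisher": stmt.findtext("publisher"),
        # The machine-readable form lives in the value attribute.
        "first_published": date.get("value") if date is not None else None,
        "etext_number": int(idno.text) if idno is not None else None,
    }


record = publication_record(HEADER)
```

Reading the date from the `value` attribute rather than the element text keeps the catalog independent of how the date is worded for human readers.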
<textClass> <-- RECOMMENDED SECTION -->
  <keywords>
    <list>
      <item>KEYWORD</item>
    </list>
  </keywords>
</textClass>
This needs some more thought as the keywords should come out of some authority list. In that case the authority must be specified.
This is where the catalog folks need to step in. :)
<change>
  <date value="2004-11">November 2004</date>
  <respStmt>
    <name>Scans provided by Cornell University</name>
    <name>Joshua Hutchinson</name>
    <name>Juliet Sutherland</name>
    <name>Distributed Proofreaders</name>
  </respStmt>
  <item>Etext created</item>
</change>
Better separate scanning and proofing:
<change>
  <date value="2003">2003</date>
  <respStmt>
    <name type="Organisation">Cornell University</name>
  </respStmt>
  <item>Scanned the source</item>
</change>
<change>
  <date value="2004-11">November 2004</date>
  <respStmt>
    <name>Joshua Hutchinson</name>
    <name>Juliet Sutherland</name>
    <name>Distributed Proofreaders</name>
  </respStmt>
  <item>Etext created</item>
</change>
Ok, we can separate that information out... Will update. Josh

Joshua Hutchinson wrote:
I see the need for this... But I think I like Jeffrey's method a little better. (From another post)
<title file-as="Tempest, The">The Tempest</title> <title file-as="Midsummer Nights Dream, A">A Midsummer Nights Dream</title>
While this may not be included in the TEI standard, it is part of the OEB standard, http://www.openebook.org/oebps/oebps1.0.1/download/oeb101-xhtml.htm
My method is part of the MARC standard and is already implemented in the catalog database. -- Marcello Perathoner webmaster@gutenberg.org
participants (5)
- Carlo Traverso
- Jeffrey Kraus-yao
- Joshua Hutchinson
- Marcello Perathoner
- Scott Lawton