Re: [gutvol-d] Indexing Editors, etc.

Greg Newby <gbnewby@pglaf.org> writes:
Sorry about that, David. The reason is that the automatic cataloging program only picks up the metadata in the book header like Author:, Title:, etc.
So is this something that I should take up within DP? What needs to be done that isn't?

On Sat, Sep 18, 2004 at 09:03:21PM -0800, D. Starner wrote:
Greg Newby <gbnewby@pglaf.org> writes:
Sorry about that, David. The reason is that the automatic cataloging program only picks up the metadata in the book header like Author:, Title:, etc.
So is this something that I should take up within DP? What needs to be done that isn't?
Catalog/index entries are created automatically (GUTINDEX.ALL is created by hand, from the Posted messages). So, it's probably best to do what you're doing: check the index the day after posting, and email changes/fixes to catalog@pglaf.org.
We *can* add fields like "Translator: " and "Illustrator: " to the eBook metadata, which are picked up by the automatic catalog creator. But for complex author lists we need to tweak the catalog entry by hand.
I am not sure we can automate things much more than they are already, but if you see areas for improvement either with the DP process or the WW process, speak up and we'll see whether the ideas are viable.
I expect that these types of problems will go away when we eventually start having richer (or at least better formatted) metadata in eBook files "born as XML," but with our current procedure there's a little too much variety in the eBook layout, author names/roles, and so forth to be able to create completely automatic catalog entries from the eBooks themselves.
Thanks!
-- Greg
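As an illustration of the header scan Greg describes, here is a minimal Python sketch. It is not the actual PG cataloging program: the function name `parse_header`, the 50-line limit, and the sample text are assumptions made for this example. Only the field names Author:, Title:, Translator: and Illustrator: come from the thread.

```python
import re

# Field names mentioned in the thread; the real cataloger may accept others.
HEADER_FIELDS = ("Title", "Author", "Translator", "Illustrator")
FIELD_RE = re.compile(r"^(%s):\s*(.+?)\s*$" % "|".join(HEADER_FIELDS))


def parse_header(text, max_lines=50):
    """Collect 'Field: value' lines from the first max_lines lines.

    max_lines is an assumed cut-off, not a documented PG rule.
    """
    found = {}
    for line in text.splitlines()[:max_lines]:
        match = FIELD_RE.match(line)
        if match:
            # A field may repeat (e.g. two illustrators), so keep a list.
            found.setdefault(match.group(1), []).append(match.group(2))
    return found


# Made-up sample header, loosely modelled on a PG eBook.
sample = """\
The Project Gutenberg EBook of 1601, by Mark Twain

Title: 1601
Author: Mark Twain
"""
header = parse_header(sample)
```

A flat scan like this cannot tell an editor from an author in a complex byline, which is the case Greg says still needs a hand-tweaked catalog entry.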

"D." == D Starner <shalesller@writeme.com> writes:
D.> Greg Newby <gbnewby@pglaf.org> writes:
>> Sorry about that, David. The reason is that the automatic cataloging program only picks up the metadata in the book header like Author:, Title:, etc.
D.> So is this something that I should take up within DP? What needs to be done that isn't?
I think that something rather has to be done on the PG end: include better metadata in the book header. Posting collects complete metadata, but only a tiny part finds its way to the top of the PG ebook. I don't see the point of collecting a lot of info, then discarding most of it and including only a part.
If it is felt to be intrusive at the top of the book, why do we not include complete metadata at the bottom, after the licence? The cataloguing program can search the bottom in addition to, or instead of, the top. I would recommend searching the bottom, and if there is no metadata there, going to the top. This would not require changing anything in existing postings.
Carlo
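Carlo's "bottom first, then top" lookup could be sketched in Python as below. This is only an illustration under stated assumptions: `find_metadata`, `scan`, and the `window` size are hypothetical names and values, and the field list is the one quoted earlier in the thread.

```python
import re

FIELD_RE = re.compile(r"^(Title|Author|Translator|Illustrator):\s*(.+?)\s*$")


def scan(lines):
    """Return {field: [values]} for every 'Field: value' line in lines."""
    found = {}
    for line in lines:
        match = FIELD_RE.match(line)
        if match:
            found.setdefault(match.group(1), []).append(match.group(2))
    return found


def find_metadata(text, window=40):
    """Look in the last `window` lines first; fall back to the first ones.

    `window` is an assumed size. In a file shorter than the window the
    two regions overlap, so the header would be reported as "bottom".
    """
    lines = text.splitlines()
    footer = scan(lines[-window:])
    if footer:
        return "bottom", footer
    return "top", scan(lines[:window])


# An existing posting (header only) and a hypothetical new-style posting
# that repeats fuller metadata at the very end of the file.
old_style = "Title: 1601\nAuthor: Mark Twain\n" + "body line\n" * 10
new_style = old_style + "Title: 1601\nAuthor: Twain, Mark\n"

old_result = find_metadata(old_style, window=3)
new_result = find_metadata(new_style, window=3)
```

Because the header is still consulted whenever the footer is empty, existing postings keep working unchanged, which is the property Carlo points out.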

D. Starner wrote:
Greg Newby <gbnewby@pglaf.org> writes:
Sorry about that, David. The reason is that the automatic cataloging program only picks up the metadata in the book header like Author:, Title:, etc.
So is this something that I should take up within DP? What needs to be done that isn't?
We (WWs and me) have been discussing ways to fix the meta-data transfer between DP and the PG catalog. What we came up with is:
- Put a unique identifier in the last line(s) of the text. This would allow the catalog database to query the database at DP for all missing info.
or
- Put a DC or XML/RDF metadata block at the end of the file.
Example of DC metadata block:
    END OF THE PROJECT ...
    dc.author: Twain, Mark
    dc.title: 1601
    dc.language: en
    dc.encoding: us-ascii
    dc.publisher: Project Gutenberg
    dc.rights: http://www.gutenberg.org/license
    pg.etext: 12345
    pg.id: af04.bd32.1234.5678
    EOF
Example of RDF/XML metadata block:
    END OF THE PROJECT ...
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:dcterms="http://purl.org/dc/terms/"
             xmlns:pg="http://www.gutenberg.org/pgrdf"
             xml:base="http://www.gutenberg.org/rdf/catalog.rdf">
      <rdf:Description rdf:ID="etext13485">
        <dc:publisher>Project Gutenberg</dc:publisher>
        <dc:title rdf:parseType="Literal">An Enquiry Concerning the Principles of Taste, and of the Origin of our Ideas of Beauty, etc.</dc:title>
        <dc:creator>Reynolds, Frances</dc:creator>
        <dc:contributor>Clifford, James L. [Contributor]</dc:contributor>
        <dc:language>en</dc:language>
        <dc:created>2004-09-17</dc:created>
        <dc:rights rdf:resource="http://www.gutenberg.org/license" />
        <pg:identifier>0123.4567.89ab.cdef</pg:identifier>
      </rdf:Description>
    </rdf:RDF>
    EOF
-- Marcello Perathoner webmaster@gutenberg.net
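To show how a cataloging program might read the proposed DC block, here is a small Python sketch. It assumes one `key: value` pair per line between an end marker and a closing `EOF` line, as in Marcello's example. The function name `parse_dc_block` is hypothetical, and the end marker is matched only by the prefix "END OF THE PROJECT" because the thread elides the rest of that line.

```python
def parse_dc_block(text):
    """Return {key: [values]} for the 'dc.*' / 'pg.*' lines that follow
    the last line starting with 'END OF THE PROJECT' and precede 'EOF'."""
    lines = text.splitlines()
    start = None
    for index, line in enumerate(lines):
        if line.startswith("END OF THE PROJECT"):
            start = index  # remember the last marker seen
    if start is None:
        return {}

    record = {}
    for line in lines[start + 1:]:
        if line.strip() == "EOF":
            break
        # Split on the first colon only, so URLs in the value survive.
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key.startswith(("dc.", "pg.")):
            record.setdefault(key, []).append(value.strip())
    return record


# The block below copies the field values from the example in the thread.
sample = """\
...text of the book...
END OF THE PROJECT ...
dc.author: Twain, Mark
dc.title: 1601
dc.language: en
dc.encoding: us-ascii
dc.publisher: Project Gutenberg
dc.rights: http://www.gutenberg.org/license
pg.etext: 12345
pg.id: af04.bd32.1234.5678
EOF
"""
record = parse_dc_block(sample)
```

Values are kept as lists so that a repeated key, for example several contributors, would not overwrite an earlier one.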

And it would be great to have the complete bibliographical record of the book (or books) used as the source for the digital edition in every new text.
Regards, Miguel A. Arévalo.
On Sun, 19-09-2004 at 18:10 +0200, Marcello Perathoner wrote:
What we came up with is:
- Put a unique identifier in the last line(s) of the text. This would allow the catalog database to query the database at DP for all missing info.
or
- Put a DC or XML/RDF metadata block at the end of the file.

On Sunday, September 19, 2004 12:10 PM, "Marcello Perathoner" <marcello@perathoner.de> wrote:
What we came up with is:
- Put a unique identifier in the last line(s) of the text. This would allow the catalog database to query the database at DP for all missing info.
or
- Put a DC or XML/RDF metadata block at the end of the file.
DP does generate a DC file for each project. I'm not entirely sure what's in it, although I presume that it captures the information that we collect, which is:
- Title
- Author
- Language
- Genre (by our definition, not an official cataloging one)
We do not collect information about publication dates, multiple authors or creative content roles, publisher, etc. In these regards the new PG clearance system collects much more information, and is probably much more accurate as well.
Many project managers (including myself) tend to shorten or adjust the titles so that they fit better on the project listing page. The same goes for author/illustrator/editor/etc. information. This is appropriate and useful for our internal purposes, but doesn't work well when mapped to anything external.
All in all, I'd recommend using the information collected as part of the copyright clearance as a basis for cataloging.
JulietS
participants (6)
- Carlo Traverso
- D. Starner
- Greg Newby
- Juliet Sutherland
- Marcello Perathoner
- Miguel A. Arévalo