
== Resent message; it was bounced the first time ==

On Tue, 9 Nov 2004, Steve Thomas wrote:
This was all well and good, and eventually we ended up with around 3,800 records for PG titles in our catalogue.
However, the advent of DP put paid to all that. The volume of works appearing each month very quickly overwhelmed me, and I was forced to abandon the effort, so that an unfortunate side effect of DP was that I could no longer add MARC records to our catalogue.
I believe something like this is also faced by John Mark Ockerbloom, who maintains the Online Books page. He has cataloged a large portion of PG, as well as thousands of online books from other sources. However, as you say, one person cannot keep up with the increasing number of old books being digitized.
I believe that recent changes and enhancements to the PG archive may make a similar effort possible once more. First, I am told that there is now an XML file of the PG database, and that this contains much more and better detail than the old GUTINDEX list.
I would qualify this with a "yes, but..." Yes, this does exist (see the link Greg gave, or here's a link directly to the compressed RDF file: http://www.gutenberg.org/feeds/catalog.rdf.bz2). But, as is PG custom, it has its own inconsistencies. All new records are generated automatically from information in the headers of newly posted files (and this is not always accurate). Many older records were copied from the old catalog from promo.net, which sometimes had "interesting" variations. Many records have additional information such as subject headings, LOC classifications, and sometimes other material of bibliographical interest in a "notes" field. But many records have only very basic information. Additional information is generally added when one of the volunteers who has write access to the catalog takes an interest in looking it up, so this happens somewhat irregularly. Taken all together, the PG online catalog does present plenty of information that can help people interact with the collection in meaningful ways, but it may make professional librarians roll their eyes.
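For anyone who wants to see how unevenly the records are filled in, a rough first step is to count how often each element occurs in the compressed file. The following is only a sketch: it assumes nothing about the element names used in catalog.rdf, the function name tally_tags is made up for illustration, and it expects that you have already downloaded the .bz2 file yourself.

```python
import bz2
import io
from collections import Counter
import xml.etree.ElementTree as ET


def tally_tags(compressed_source):
    """Count element names in a bz2-compressed XML file.

    compressed_source may be a filename or a binary file object.
    """
    counts = Counter()
    with bz2.open(compressed_source, "rb") as xml_stream:
        # iterparse reads the document incrementally, so the whole
        # catalog never has to be held in memory as one string.
        for _event, elem in ET.iterparse(xml_stream, events=("end",)):
            # ElementTree writes qualified names as "{namespace}name";
            # keep only the local name after the closing brace.
            counts[elem.tag.rsplit("}", 1)[-1]] += 1
            elem.clear()  # drop text and children we no longer need
    return counts


# Tiny made-up document, only to show the call; the real file's
# element names will differ.
sample = bz2.compress(
    b'<root xmlns:a="urn:example"><a:rec><a:title>t</a:title></a:rec><a:rec/></root>'
)
sample_counts = tally_tags(io.BytesIO(sample))
```

Comparing the count of whatever element marks one record against the counts of the optional fields gives a quick picture of how many records carry only basic information.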
Second, PG now has a neater way of accessing texts, using a simple URL like http://www.gutenberg.org/etext/1234. Previously, one could only link directly to the individual files in the archive, and this complicated matters, since every title has at least two files (.txt and .zip) and often there are multiple versions and formats.
Yes. In my own opinion, the ability to do this is perhaps the best thing to have happened for PG in the last year. This provides a much better way to link to a PG title from any place such as newsgroups, websites, catalogs, whatever. (Thanks Marcello!) This also makes it easier to present selections from PG, organized by whatever criteria you choose (e.g., Marcello's list of "Top 100" downloads, my list of Canadiana). All of this only encourages more exposure for PG, and a greater chance that some computer user will come across (perhaps by accident) a PG text that interests him.
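To illustrate how little is needed to build such a selection, here is a small sketch that turns etext numbers into links of the form quoted above. The function names and the placeholder titles are made up for the example; only the URL pattern comes from this thread.

```python
def etext_url(number):
    """Return the canonical link for a PG etext number."""
    if not isinstance(number, int) or number < 1:
        raise ValueError("etext numbers are positive integers")
    return f"http://www.gutenberg.org/etext/{number}"


def selection_lines(entries):
    """Format (etext number, title) pairs as plain-text link lines."""
    return [f"{title}: {etext_url(number)}" for number, title in entries]


# Placeholder entries; substitute real etext numbers and titles.
for line in selection_lines([(1234, "Example title A"), (5678, "Example title B")]):
    print(line)
```

Because the link depends only on the etext number, a list like this does not need to know which file formats or versions exist for a title.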
Of course, one has to ask whether the effort of creating and *maintaining* catalogue records for PG is worth while. We live in the age of Google, and it is a lament frequently heard from librarians that the user is more often likely to search the 'net with Google than to use the Library catalogue.
I believe the effort is worth while. Good cataloging can lead to a user finding an item of interest that may have been missed otherwise. And yes, Google does index the PG "bibrec" pages, so any additional work done in cataloging could possibly lead to a text being found by someone searching with Google.
However, redundancy is no bad thing with information, and the more ways of getting at it the better -- so long as those ways remain accurate. So I believe many libraries would welcome the chance to load MARC records pointing at PG texts -- provided that they can be sure the record contents are accurate and the links remain so.
At this point, I would say a good deal of manual tweaking would be needed to get a result that would be somewhat satisfactory for librarians. Links should not be a problem, as the canonical URLs discussed above show every sign of being much more permanent than most.

Andrew