Re:[gutvol-d] [BP] The Future of eBooks

== Resent message; it was bounced the first time == On Tue, 9 Nov 2004, Steve Thomas wrote:
This was all well and good, and eventually we ended up with around 3,800 records for PG titles in our catalogue.
However, the advent of DP put paid to all that. The volume of works appearing each month very quickly overwhelmed me, and I was forced to abandon the effort, so that an unfortunate side effect of DP was that I could no longer add MARC records to our catalogue.
I believe something like this is also faced by John Mark Ockerbloom, who maintains the Online Books page. He has cataloged a large portion of PG, as well as thousands of online books from other sources. However, as you say, one person cannot keep up with the increasing number of old books being digitized.
I believe that recent changes and enhancements to the PG archive may make a similar effort possible once more. First, I am told that there is now an XML file of the PG database, and that this contains much more and better detail than the old GUTINDEX list.
I would qualify this with a "yes, but..." Yes, this does exist (see the link Greg gave, or here's a link directly to the compressed RDF file: http://www.gutenberg.org/feeds/catalog.rdf.bz2). But, as is PG custom, it has its own inconsistencies. All new records are generated automatically from information in the headers of newly posted files (and this is not always accurate). Many older records were copied from the old catalog from promo.net, which sometimes had "interesting" variations. Many records have additional information such as subject headings, LoC classifications, and sometimes other material of bibliographical interest in a "notes" field. But many records have only very basic information. Additional information is generally added when one of the volunteers who has write access to the catalog takes an interest in looking it up, so this happens somewhat irregularly. Taken all together, the PG online catalog does present plenty of information that can help people interact with the collection in meaningful ways; but it may make professional librarians roll their eyes.
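As a rough illustration of what working with that RDF feed looks like, here is a sketch that pulls titles and authors out of a catalog.rdf-style file. The element and namespace names used here (pgterms:etext, dc:title, dc:creator) are assumptions based on the Dublin Core vocabulary the feed draws on, not a verified schema of the actual file, and the sample document is a tiny stand-in for the real (much larger) feed.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the structure of catalog.rdf; element names are
# assumptions, not a verified dump of the real feed.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                  xmlns:dc="http://purl.org/dc/elements/1.1/"
                  xmlns:pgterms="http://www.gutenberg.org/rdfterms/">
  <pgterms:etext rdf:ID="etext1234">
    <dc:title>An Example Title</dc:title>
    <dc:creator>Example, Author</dc:creator>
  </pgterms:etext>
</rdf:RDF>"""

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pgterms": "http://www.gutenberg.org/rdfterms/",
}

def list_etexts(xml_text):
    """Return (id, title, creator) tuples for each etext record."""
    root = ET.fromstring(xml_text)
    records = []
    for etext in root.findall("pgterms:etext", NS):
        eid = etext.get("{http://www.w3.org/1999/02/22-rdf-syntax-ns#}ID")
        title = etext.findtext("dc:title", default="", namespaces=NS)
        creator = etext.findtext("dc:creator", default="", namespaces=NS)
        records.append((eid, title, creator))
    return records

print(list_etexts(SAMPLE))
```

A consumer of the real feed would decompress catalog.rdf.bz2 first and iterate over many thousands of such records; the parsing logic stays the same.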
Second, PG now has a neater way of accessing texts, using a simple URL like http://www.gutenberg.org/etext/1234. Previously, one could only link directly to the individual files in the archive, and this complicated matters, since every title has at least two files (.txt and .zip) and often there are multiple versions and formats.
Yes. In my own opinion, the ability to do this is perhaps the best thing to have happened for PG in the last year. This provides a much better way to link to a PG title from any place: newsgroups, websites, catalogs, whatever. (Thanks Marcello!) This also makes it easier to present selections from PG, organized by whatever criteria you choose (e.g., Marcello's list of "Top 100" downloads, my list of Canadiana). All of this only encourages more exposure for PG, and a greater chance that some computer user will come across (perhaps by accident) a PG text that interests him.
Of course, one has to ask whether the effort of creating and *maintaining* catalogue records for PG is worthwhile. We live in the age of Google, and it is a lament frequently heard from librarians that users are more likely to search the 'net with Google than to use the library catalogue.
I believe the effort is worthwhile. Good cataloging can lead to a user finding an item of interest that might otherwise have been missed. And yes, Google does index the PG "bibrec" pages, so any additional work done in cataloging could lead to a text being found by someone searching with Google.
However, redundancy is no bad thing with information, and the more ways of getting at it the better -- so long as those ways remain accurate. So I believe many libraries would welcome the chance to load marc records pointing at PG texts -- provided that they can be sure the record contents are accurate and the links remain so.
At this point in time, I would say a good deal of manual tweaking would be needed to get a result that would be somewhat satisfactory for librarians. Links should not be a problem, as the canonical URLs discussed above show every sign of being much more permanent than most. Andrew

Andrew Sly wrote:
Taken all together, the PG online catalog does present plenty of information that can help people interact with the collection in meaningful ways; but it may make professional librarians roll their eyes.
The design philosophy of the catalog database is: to help people find a book they may want to read. That includes both people who already know which book they want and people who want a suggestion. The catalog database was not designed to be a tool for professionals. But this doesn't mean that I'm not willing to add some functions to help them out, so long as those functions don't get in the way of the primary functionality. Producing MARC records out of existing catalog entries seems to be a pretty straightforward thing. Importing other people's MARC into our database will be much hairier. -- Marcello Perathoner webmaster@gutenberg.org

At 11:06 AM 11/9/2004, you wrote:
Andrew Sly wrote:
Taken all together, the PG online catalog does present plenty of information that can help people interact with the collection in meaningful ways; but it may make professional librarians roll their eyes.
The design philosophy of the catalog database is:
To help people find a book they may want to read.
That includes both people who already know which book they want and people who want a suggestion.
The catalog database was not designed to be a tool for professionals. But this doesn't mean that I'm not willing to add some functions to help them out, so long as those functions don't get in the way of the primary functionality.
Producing MARC records out of existing catalog entries seems to be a pretty straightforward thing.
Obviously it is not an _easy_ straightforward thing! Otherwise, the whole thing would be in place by now.

On the other hand, the PG database may not be capable of Z39.50 imports, but there are many, MANY (if not all!) library cataloging software packages that will do it in a short time. The advantage of importing from the existing catalog entries is that we get our pick of what fits our needs, especially for the subject fields. Of course there is always work to edit and customize them for the PG user database.

I don't see why we can't have commercial software do most of the work and keep the existing catalog as a backup.

And for the record, I have been involved in the PG cataloging effort for more than six years, and anyone who says I am not interested in it any more is clearly not aware of the full facts. It can be quite disappointing when one's years of volunteer effort have been deleted with the "new improvements"!

Alev. an "official" librarian
Importing other people's MARC into our database will be much hairier.
-- Marcello Perathoner webmaster@gutenberg.org
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d
--- Incoming mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004
--- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004

I'm new around here, so please forgive me if I go over old ground. I have subscribed to the RSS Recently Posted or Updated feeds, and it is truly amazing to see the way the entries roll in every night. However, it is frustrating to find, when one of the entries is opened up, that there is little information apart from the author (with or without dates) and a title. In most cases I have no idea what the book is about and whether I am interested in it. I, and I am sure many others, would love to see a bit more detail, such as the original date of publication and a brief synopsis of the work. Obviously, entering such information day after day with such a rush of material is far beyond the resources of a small group of volunteers, however dedicated.

Would it not be possible to devise a distributed cataloguing system following the model of DP? For each book "in the frame" a form would be provided with spaces for the required items. When these were completed (and checked) the data would then be transferred, in an agreed format -- MARC or otherwise -- to a file held within the book's directory tree. In many cases this information is provided at the time of proofreading and then it seems to be lost. Obviously some of the information might be easy to complete, such as book or serial. However, other fields might need research, such as key dates, author bio, etc. Also, a meaningful synopsis would most likely mean reading the text or abstracting a portion from another work. I could also see that multilingual versions might be needed. I would think there are many who would rise to the challenge of helping in such an endeavour.

Lynne

"Lynne" == Lynne Anne Rhodes <lynne@rhodesresearch.biz> writes:
Lynne> I, and I am sure many others would love to see a bit more
Lynne> detail such as the original date of publication and a brief
Lynne> synopsis of the work. Obviously to enter such information
Lynne> day after day with such a rush of material is far beyond
Lynne> the resources of a small group of volunteers, however,
Lynne> dedicated.
DP would be delighted to preserve these data. Most books that pass through DP are accompanied by a small HTML page that describes the author, the book, etc.; and the data on the original book are preserved in proofreading, and often deleted in post-processing. We have also discussed keeping a catalogue of our books, with this kind of additional information. One of the problems is copyright: most of the info on the author is taken from sources that would not survive a clearance procedure (i.e. is raided from other sites). So this cannot be integrated with the PG catalogue; but it might build the core of an added-value site that maintains a PG catalogue, adding information and classification data. The PG catalogue remains authoritative and terse, but you can get additional features. Exactly as with many etexts, for which sites exist that add formats for PG ebooks. The first step, however, is to have better PG records, and a method to avoid losing information from DP to the PG catalogue. Carlo

Carlo Traverso wrote:
The first step however is to have better PG records, and a method to avoid losing information from DP to the PG catalogue.
If you put a complete <teiHeader> ... </teiHeader> somewhere in the files, maybe at the back where it won't hurt much, I can easily pick it out and parse it into the database. Of course it has to stay in the file after being posted. What is happening now is that I parse the tiny header at the top of the file and I get just what's there. -- Marcello Perathoner webmaster@gutenberg.org
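Marcello's teiHeader idea can be sketched in a few lines: pick the <teiHeader> block out of a posted plain-text file and parse it into catalog fields. The sample file below and its fileDesc/titleStmt layout follow general TEI convention but are assumptions for illustration, not an actual PG posting format.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical posted file: plain text with a teiHeader appended at the
# back, as suggested. The header layout is illustrative TEI, not a PG rule.
POSTED_FILE = """The actual book text goes here...

<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>The Complete Dinosaur</title>
      <author>Farlow, James Orville</author>
    </titleStmt>
  </fileDesc>
</teiHeader>
"""

def extract_tei_header(text):
    """Pick the <teiHeader> block out of a posted file and parse it."""
    match = re.search(r"<teiHeader>.*?</teiHeader>", text, re.DOTALL)
    if match is None:
        return None  # no header embedded in this file
    header = ET.fromstring(match.group(0))
    return {
        "title": header.findtext(".//title", default="").strip(),
        "author": header.findtext(".//author", default="").strip(),
    }

print(extract_tei_header(POSTED_FILE))
```

The point of the design is that the header travels inside the posted file itself, so a nightly scan can harvest it without any side channel between DP and the catalog.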

The central problem, if I've understood all the posts, is that the catalog entry is generated from the final header, which as we all know omits lots of detail which the volunteers have.

Would it be possible to add manual cataloguing to the posting workflow? By which I mean, when a person (whitewasher?) posts a new text, they also edit the catalog to add whatever level of detail for the work is to hand.

I understand that we don't want to add to the whitewasher's workload, but -- thanks to Marcello's web interface -- it is really quite easy to add to a catalog entry, so it's probably not a great deal of work in comparison to the work they already do.

Of course, once all the TEI stuff is in place, this won't be necessary, but in the meantime ...

Steve

Marcello Perathoner wrote:
Carlo Traverso wrote:
The first step however is to have better PG records, and a method to avoid losing information from DP to the PG catalogue.
If you put a complete <teiHeader> ... </teiHeader> somewhere in the files, maybe at the back where it won't hurt much, I can easily pick it out and parse it into the database. Of course it has to stay in the file after being posted.
What is happening now is that I parse the tiny header at the top of the file and I get just what's there.
-- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/

Hi Steve. The problem with this proposition is that at the time a whitewasher is working on the final posting of a text, there is no catalog record to edit yet. New records are only generated once a day, when the directories are automatically scanned to find any new files.

Also, I have the impression that the whitewashers would rather not deal with cataloging issues (where a small change can suddenly require further follow-up in order to keep the catalog somewhat consistent, deal with further issues, etc.).

As the closest thing we have to a "Catalog content supervisor", I will volunteer to work with additional information if we can find some way to get it to me -- preferably via catalog[at]pglaf.org -- from the people producing the texts.

And I must add here that simply having a TEI template in place will not remove the advisability of still manually looking through every record. Given the number of less-than-ideal modifications that can creep in when dealing with just a Title and Author, I can only think I would see more if more fields were included.

Andrew

On Thu, 11 Nov 2004, Steve Thomas wrote:
The central problem, if I've understood all the posts, is that the catalog entry is generated from the final header, which as we all know omits lots of detail which the volunteers have.
Would it be possible to add manual cataloguing to the posting workflow? By which I mean, when a person (whitewasher?) posts a new text, they also edit the catalog to add whatever level of detail for the work is to hand.
I understand that we don't want to add to the whitewasher's workload, but -- thanks to Marcello's web interface -- it is really quite easy to add to a catalog entry, so probably not a great deal of work in comparison to the work they already do.
Of course, once all the TEI stuff is in place, this won't be necessary, but in the meantime ...
Steve

Alev Akman wrote:
Obviously it is not an _easy_ straightforward thing! Otherwise, the whole thing would be in place by now.
Nobody requested that feature before. And, to be exact, nobody is requesting that feature now. It's just that some of us *think* that libraries could use it. As a rule, I don't put work into features that maybe nobody will use.
On the other hand, PG database may not be capable of the Z39.50 imports but there are many MANY (if not all!) library cataloging software packages that will do it in a short time. The advantage of importing from the existing catalog entries is that we have our pick of what fits our needs for especially the subject fields. Of course there is always work to edit and customize them for the PG user database.
I don't see why we can't have a commercial software to do most of the work and keep the existing catalog as a backup.
- Does it provide web access for users?
- For catalogers?
- How much will an unlimited worldwide public access license cost?
- Will it run on Linux/Apache?
- Will it manage our files?
- Will it provide download links for the files?
- Do we get the source code to adapt it to our particular needs?

I think any commercial library-use-oriented catalog software will fall far short of what we have now. We don't need so much a catalog system. What we need is a web shop system à la Amazon. But I have my doubts they will give us theirs.

The problems with MARC are:

- the standard is not free.
- the records are not free.
- the technology is obsolete.

I don't know what the copyright status of the LoC MARC records is. They are a US government agency, so they should be free. But do we know?

To request a MARC record I have to implement an obscure Z39.50 protocol. And I get back a record full of numeric codes that I have to look up before knowing what they are. Why can't I simply post an HTTP request and get an XML/RDF answer?

Which MARC record should we import for a book? If you search through the LoC catalog you'll find many examples of works that have got different MARC subject classifications for the different copies held by the LoC.

LoC class codes have shifted semantically over the years. What was XY in 1970 will not necessarily be XY in 2000. So you'll have to keep the LoC class code, the year the classification was made, and the list of class codes that was authoritative in that year. Of course the same goes for Dewey etc.
And for the record, I have been involved in the PG cataloging effort for more than six years and anyone who says I am not interested in it any more is clearly not aware of the full facts.
I didn't say that. I said Greg and I wanted to get you as manager of the catalog team, but last time I mailed Greg about it he said he had got no answer from you. Your last post on this list was on 3/18.
It may be quite disappointing when one's years of volunteer efforts have been deleted with the "new improvements"!
I don't know of any data that has been willfully deleted. Please give an example. -- Marcello Perathoner webmaster@gutenberg.org

Marcello Perathoner <marcello@perathoner.de> writes:
To request a MARC record I have to implement an obscure Z39.50 protocol.
You can use yaz-client as it comes with the YAZ toolkit (http://www.indexdata.dk/yaz/). Index Data also offers a database system: http://www.indexdata.dk/zebra/ (GPL). -- Key fingerprint = B2A3 AF2F CFC8 40B1 67EA 475A 5903 A21B 06EB 882E

Karl Eichwalder wrote:
You can use yaz-client as it comes with the YAZ toolkit (http://www.indexdata.dk/yaz/).
OK, I got that far. For all of you who wondered what a MARC record looks like, here is an example:

000 01109cam 2200277 a 4500
001 708964
005 19980710092633.8
008 970604s1997 inuab b 001 0 eng
035 $9(DLC) 97023698
906 $a7$bcbc$corignew$d1$eocip$f19$gy-gencatlg
955 $apc16 to ja00 06-04-97; jd25 06-05-97; jd99 06-05-97; jd11 06-06-97; aa05 06-10-97; CIP ver. pv08 11-05-97
010 $a 97023698
020 $a0253333490 (alk. paper)
040 $aDLC$cDLC$dDLC
050 00 $aQE862.D5$bC697 1997
082 00 $a567.9$221
245 04 $aThe complete dinosaur /$cedited by James O. Farlow and M.K. Brett-Surman ; art editor, Robert F. Walters.
260 $aBloomington :$bIndiana University Press,$cc1997.
300 $axi, 752 p. :$bill. (some col.), maps ;$c26 cm.
504 $aIncludes bibliographical references and index.
650 0 $aDinosaurs.
700 1 $aFarlow, James Orville.
700 2 $aBrett-Surman, M. K.,$d1950-
920 $a**LC HAS REQ'D # OF SHELF COPIES**
991 $bc-GenColl$hQE862.D5$iC697 1997$tCopy 1$wBOOKS
991 $br-SciRR$hQE862.D5$iC697 1997$tCopy 1$wGenBib bi 98-003434

The first problem is: how do we relate existing and new books to LoC MARC records? Meaning: we have to find the Control Number (001) or the LoC Control Number (010) of every book we have. We need a few volunteers to build a list: etext number => Control Number. Then we can import that list into the database. -- Marcello Perathoner webmaster@gutenberg.org
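A sketch of what turning such a dump into the proposed "etext number => Control Number" list might involve: parse the human-readable, line-per-field MARC text (not binary ISO 2709 MARC) into a simple mapping and pull out fields 001 and 010. The sample record below is an abbreviated, hypothetical stand-in.

```python
# Abbreviated, hypothetical MARC text dump (line-per-field, human-readable).
SAMPLE_RECORD = """\
001 708964
010    $a   97023698
245 04 $aThe complete dinosaur /$cedited by James O. Farlow...
650  0 $aDinosaurs.
"""

def parse_marc_dump(text):
    """Return {tag: [field, ...]} from a line-per-field MARC text dump."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # The first whitespace-separated token is the field tag.
        tag, _, rest = line.partition(" ")
        fields.setdefault(tag, []).append(rest.strip())
    return fields

fields = parse_marc_dump(SAMPLE_RECORD)
control_number = fields["001"][0]                   # LoC Control Number field 001
lccn = fields["010"][0].split("$a", 1)[1].strip()   # $a subfield of field 010
print(control_number, lccn)
```

With something like this, volunteers could record one (etext number, control number) pair per title and the pairs could be bulk-imported into the catalog database.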

At 08:06 PM 11/9/2004 +0100, you wrote:
The design philosophy of the catalog database is:
To help people find a book they may want to read.
That includes both people who already know which book they want and people who want a suggestion.
Hello list. Sorry if I seem to be complaining, but I must say that I find the current PG catalog to be mostly useless. I should qualify that. I can easily search through GUTINDEX.ALL to find a certain title or author; I've found that grep works great for that. However, there are no clues anywhere that tell me what a book is about, whether it's mystery, drama, nonfiction or something else, or even a basic subject classification. I admit that some of this might be found by using the search form or the gutenberg.org/etext/1234 URL, but from the standpoint of a user who is in a hurry and just wants something to read, it's still inconvenient.

Let's pick a random example of something which has been recently discussed: http://gutenberg.org/etext/1473 First, the link for in-depth information takes you to the volunteer pages. This is misleading, since it looks like I would be able to find more information on the book. More than once I have followed that link only to find myself in the wrong place and had to go back in my browser. Second, let's look at the subject. All it says is "fiction." OK, but about what? What category of fiction? While bookshare.org has a catalog not designed for professionals either, most books have a synopsis and are sorted by category.

I have a possible suggestion for solving part of this. Put something in the newsletter asking people who read PG etexts to write summaries of them and categorize them. Somehow create a form which only allows books to be reviewed or summarized, maybe like a wiki but more confined. Someone would still manually approve the summary ("good" isn't helpful) and add it to the catalog. That would at least give the end user some idea of what a book is about first. Just for clarity, I would suggest that this summary, synopsis, categorization etc. would show up on the etext/1234 page and be added to the RDF feed but not appear in GUTINDEX.ALL.
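The grep-through-GUTINDEX.ALL approach mentioned above amounts to a case-insensitive substring match, line by line. Here is a minimal sketch; the index layout shown is a simplified stand-in, since the real GUTINDEX.ALL format varies from entry to entry.

```python
# Simplified stand-in for GUTINDEX.ALL-style lines: "Title, by Author  nnnn".
# The real file's layout varies; this only illustrates the grep-style search.
SAMPLE_INDEX = """\
Alice's Adventures in Wonderland, by Lewis Carroll    11
The Time Machine, by H. G. Wells                      35
The War of the Worlds, by H. G. Wells                 36
"""

def search_index(text, term):
    """Case-insensitive substring match, line by line -- what grep -i does."""
    term = term.lower()
    return [line for line in text.splitlines() if term in line.lower()]

for hit in search_index(SAMPLE_INDEX, "wells"):
    print(hit)
```

This works fine for known titles and authors, which is exactly Tony's point: what it cannot answer is "what is this book about?", because the index carries no subject data to match against.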

Tony Baechler wrote:
Second, let's look at the subject. All it says is "fiction." OK, but about what? What category of fiction? While bookshare.org has a catalog not designed for professionals either, most books have a synopsis and are sorted by category.
Everybody is complaining about the missing subject information. Complaining won't help. Stepping up and volunteering to enter the data would help. -- Marcello Perathoner webmaster@gutenberg.org

At 08:44 AM 11/10/2004, you wrote:
Tony Baechler wrote:
Second, let's look at the subject. All it says is "fiction." OK, but about what? What category of fiction? While bookshare.org has a catalog not designed for professionals either, most books have a synopsis and are sorted by category.
Everybody is complaining about the missing subject information.
Complaining won't help. Stepping up and volunteering to enter the data would help.
Maybe if the computer people stuck to "computering" and listened to how the library world does it? After all, the library systems and conventions have been in place for a while. And, Marcello, my dear, don't give me that line about not having been on the list since 3/18. Just because I don't believe in diarrhea of the mouth like some people we know : ) does not mean I do not care! It would be good if the people who know the technical side would listen to library requirements (whether _they_ think MARC records are needed or not!) once in a while. Otherwise, PG will be sentenced to being a whoever-whatever kind of project. Alev.
-- Marcello Perathoner webmaster@gutenberg.org

On Wed, 10 Nov 2004, Marcello Perathoner wrote:
Everybody is complaining about the missing subject information.
Complaining won't help. Stepping up and volunteering to enter the data would help.
I don't believe we are ready. There is right now no agreement about what form this data would take, or what standard to try to comply with. If various volunteers all get to enter their own idea of what categories and subject headings appeal to them, we will end up with a mish-mash of conflicting and overlapping data. I am no expert here, but I have read enough to know that doing subject cataloging _well_ is more involved than most people realise. Andrew

Andrew Sly wrote:
I don't believe we are ready. There is right now no agreement about what form this data would take, or what standard to try to comply with.
If various volunteers all get to enter their own idea of what catagories and subject headings appeal to them, we will end up with a mish-mash of conflicting and overlapping data.
I am no expert here, but I have read enough to know that doing subject cataloging _well_ is more involved most people realise.
Yes indeed. Library systems use what's known as an authority file for subject headings (and also for authors). This lists only headings that are "authorised" -- e.g. for LCSH, conform to the LCSH standards.

Now, PG is *never* going to have such a file (it would be huge) and I don't think it should -- LCSH is famously arcane and often seems rather arbitrary. (Although there are teams of librarians working day and night in a dark tower somewhere making sure that only the "correct" terms are used. ;-)

Ideally though, there should be some guidelines about what terms should be used in the subject field, otherwise it will be less than useful. For example, if we are going to apply the term "Fiction" to some works of fiction, then it should be applied to all. Otherwise, its usefulness as a search term is diminished.

The key problem is one of scale. Do you limit the field to a short list of valid terms ("fiction", "history", ...) and risk them being too broad to be useful, or do you allow a longer list with greater precision, and risk the list being too long to be manageable?

Sorry, I don't have an answer to that. Needs debate.

Steve

-- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/

(Yes, there is a mailing list for discussing cataloging issues, but it seems to have very little traffic, and I feel I may have a better chance of sharing my ideas with people here.) On Thu, 11 Nov 2004, Steve Thomas wrote:
The key problem is one of scale. Do you limit the field to a short list of valid terms ("fiction", "history", ...) and risk them being too broad to be useful, or do you allow a longer list with greater precision, and risk the list being too long to be manageable?
Sorry, I don't have an answer to that. Needs debate.
I don't have an answer either. So I'll ask a question: is it possible to have both large and small scale? Here is one possible way that could be approached: in the recent discussion on this list regarding cataloging, I've seen mention of different things that I might label genre, form and subject.

Genre would be examples such as Science Fiction, Mystery, Historical Fiction, etc. Form would be examples such as novel, essays, drama, poetry, short stories, etc., as Steve mentioned is coded in the MARC 008 field. Subject would be the subject headings one could find in a traditional library's catalog, for example: Legends--British Columbia--Vancouver

We already have some examples creeping into the PG catalog of trying to cover all of these in the Subject field (e.g. a collection of poems with "Subject: Poetry" -- this should be used for a book which is _about_ poetry, not one which merely contains poetry). All three of these divisions could really be of great use to people using the catalog; however, having enough volunteer effort to have them entered consistently is of course a sticking point.

Andrew
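Andrew's three-way division could be modelled as three separate repeatable fields on a catalog entry, rather than one overloaded Subject field. This is only a sketch of that data model; the field names and example values are illustrative, not an agreed PG standard.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CatalogEntry:
    """Sketch of a record with genre, form and subject kept separate."""
    etext_no: int
    title: str
    genres: List[str] = field(default_factory=list)    # e.g. "Historical Fiction"
    forms: List[str] = field(default_factory=list)     # e.g. "Short stories"
    subjects: List[str] = field(default_factory=list)  # e.g. traditional headings

entry = CatalogEntry(
    etext_no=1234,
    title="An Example Collection",
    genres=["Historical Fiction"],
    forms=["Short stories"],
    subjects=["Legends--British Columbia--Vancouver"],
)
print(entry.forms)
```

Keeping the three lists distinct avoids exactly the confusion Andrew describes: a collection of poems gets "poetry" as a form, while "Subject: Poetry" stays reserved for books _about_ poetry.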

On Wed, Nov 10, 2004 at 05:44:37PM +0100, Marcello Perathoner wrote:
Tony Baechler wrote:
Second, let's look at the subject. All it says is "fiction." OK, but about what? What category of fiction? While bookshare.org has a catalog not designed for professionals either, most books have a synopsis and are sorted by category.
Everybody is complaining about the missing subject information.
Complaining won't help. Stepping up and volunteering to enter the data would help.
It's a little more complicated than that. I'll send a few more messages about this in a few minutes.

The basic story is that the FIRST approach to cataloging our stuff will be "copy" cataloging. This includes adding subject terms, as well as regularizing the titles, authors and other data. This involves finding an existing catalog record in MARC format via OCLC or similar resources. Alev thinks this is possible for the majority of our works, even the very obscure ones and non-US items.

The SECOND approach will be original cataloging, to create a record from scratch (or based on existing info like author records). This is something we'd like to do only when necessary.

In either case, adding a new record requires looking at consistency with other records and other uses of the subject information, because these things tend to change over time. My view is that we will be able to get a corps of "distributed catalogers" to work on the first approach, though just as with distributed proofreaders, there will probably be different levels at which people feel comfortable/confident/competent in creating or changing records.

I'll send some further info about how this could get underway. At some point soon, though, let's move this to the "gutcat" list. http://lists.pglaf.org to join -- Greg

Something very similar to this has been attempted before, with rather dismal results. Hardly anyone seemed interested in writing a little synopsis (or "blurb"). On a few records in the online catalog, you will see a link labeled "Reviews" which contains these. Many of them are actually only brief excerpts from the text in question. Andrew On Wed, 10 Nov 2004, Tony Baechler wrote:
I have a possible suggestion for solving part of this. Put something in the newsletter asking people who read PG etexts to write summaries of them and categorize them. Somehow create a form which only allows books to be reviewed or summarized, maybe like a wiki but more confined. Someone would still manually approve the summary ("good" isn't helpful) and add it to the catalog. That would at least give the end user some idea of what a book is about first. Just for clarity, I would suggest that this summary, synopsis, categorization etc. would show up on the etext/1234 page and be added to the rdf feed but not appear in GUTINDEX.ALL.
participants (9)
- Alev Akman
- Andrew Sly
- Carlo Traverso
- Greg Newby
- Karl Eichwalder
- Lynne Anne Rhodes
- Marcello Perathoner
- Steve Thomas
- Tony Baechler