Re: Search by subject for ebooks?

Would it be possible for some clever script-writer out there to be able to set up something that would harvest LOC/AMICUS subject lines, based on inputting the Dewey or LOC call number or cat number?

Wallace J.McLean wrote:
Would it be possible for some clever script-writer out there to be able to set up something that would harvest LOC/AMICUS subject lines, based on inputting the Dewey or LOC call number or cat number?
To get subject tags from the LoC is relatively easy. But, first of all, you have to enter the LoC call number of the book.
There are only 19,000 of them. Volunteers?
-- 
Marcello Perathoner
webmaster@gutenberg.org
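Here is a minimal Python sketch of the "relatively easy" part: getting subject headings once a record number is known. It rests on two assumptions that come from outside this thread. The first is that the present-day permalink service returns a MARCXML record at `https://lccn.loc.gov/<LCCN>/marcxml`. The second is that the subject headings sit in the MARC 21 600/610/650/651 fields. The names `fetch_marcxml` and `subject_headings` are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
# MARC 21 subject added entries: personal name, corporate name, topical term, geographic name.
SUBJECT_TAGS = {"600", "610", "650", "651"}
# Subdivision subfields: form ($v), general ($x), chronological ($y), geographic ($z).
SUBDIVISIONS = {"v", "x", "y", "z"}


def fetch_marcxml(lccn: str) -> str:
    """Download one record. ASSUMPTION: lccn.loc.gov serves MARCXML at this URL."""
    url = f"https://lccn.loc.gov/{lccn}/marcxml"
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def subject_headings(marcxml: str) -> list[str]:
    """Turn the 6XX fields of a MARCXML record into 'Heading -- Subdivision' strings."""
    root = ET.fromstring(marcxml.strip())
    headings = []
    for field in root.iter(f"{{{MARC_NS}}}datafield"):
        if field.get("tag") not in SUBJECT_TAGS:
            continue
        main, subs = [], []
        for subfield in field.iter(f"{{{MARC_NS}}}subfield"):
            code = subfield.get("code", "")
            text = (subfield.text or "").strip()
            if not text or not code.isalpha():
                continue  # skip empty and numeric control subfields such as $0, $2, $6
            (subs if code in SUBDIVISIONS else main).append(text)
        if main:
            # Crude cleanup: drop the trailing period that MARC headings usually carry.
            headings.append(" -- ".join([" ".join(main)] + subs).rstrip("."))
    return headings
```

The download and the parsing are kept separate, so the parser can be tried offline on a saved record. Usage would look like `subject_headings(fetch_marcxml(lccn))`. The sketch does not filter on the second indicator, so it would also return headings from non-LC thesauri if a record carries them.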

On 8/13/06, Marcello Perathoner <marcello@perathoner.de> wrote:
To get subject tags from the LoC is relatively easy. But, first of all, you have to enter the LoC call number of the book.
There are only 19,000 of them. Volunteers?
First, Distributed Proofreaders usually records the LCCN upon creation of the project. If we import from them, that's probably a good 5,000 there.

Secondly, whenever I hand import a book, I copy the category and stuff from the LOC by hand; it would be much easier if I could do it just by entering the LCCN. Every little bit helps, and I might be motivated to do a lot more if I could load it just by entering the LCCN.

Thirdly, if we offered the ability to load information from the LoC at copy.pglaf.org like is done at DP, most new clearances would have the LCCN because it would be the most convenient way of clearing the book.
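Loading records "just by entering the LCCN," or importing LCCNs from DP, first requires a way to compare numbers that may be written as `n78-890351`, `n 78890351`, or `75-425165//r75`. Here is a minimal sketch that follows the normalization rules the Library of Congress documents for LCCNs, as I understand them. It removes blanks, drops a forward slash and everything after it, and removes a hyphen while left-padding the serial after it to six digits. The function name is hypothetical, and the function does not check that its result is otherwise well formed.

```python
def normalize_lccn(raw: str) -> str:
    """Normalize an LCCN so that differently written forms compare equal."""
    s = "".join(raw.split())      # 1. remove all blanks
    s = s.split("/", 1)[0]        # 2. drop a forward slash and everything to its right
    if "-" in s:                  # 3. remove the hyphen and zero-fill the serial to 6 digits
        prefix, serial = s.split("-", 1)
        s = prefix + serial.zfill(6)
    return s
```

With this, `normalize_lccn("n78-89035")` and `normalize_lccn("n 78089035")` both give `n78089035`, so a hand-entered number can be matched against one imported from DP.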

Does anyone have the link handy for looking up such things at the LOC?

Sincerely
Aaron Cannon
-- 
Skype: cannona
MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail address.)

----- Original Message -----
From: "David Starner" <prosfilaes@gmail.com>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Sent: Sunday, August 13, 2006 11:50 AM
Subject: Re: [gutvol-d] Re: Search by subject for ebooks?
First, Distributed Proofreaders usually records the LCCN upon creation of the project. If we import from them, that's probably a good 5,000 there. Secondly, whenever I hand import a book, I copy the category and stuff from the LOC by hand; it would be much easier if I could do it just by entering the LCCN. Every little bit helps, and I might be motivated to do a lot more if I could load it just by entering the LCCN.
Thirdly, if we offered the ability to load information from the LoC at copy.pglaf.org like is done at DP, most new clearances would have the LCCN because it would be the most convenient way of clearing the book.
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d

David Starner wrote:
First, Distributed Proofreaders usually records the LCCN upon creation of the project. If we import from them, that's probably a good 5,000 there.
It looks like about 1/3 of DP projects have a valid LCCN (that I could find, anyway). For the 8890 titles that PG has posted from DP, the fraction is slightly less. Here is the PG# -> LCCN mapping for 2238 of them: http://www.pgdp.net/noncvs/LCCN/map.txt
-Michael
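Importing the mapping could start from a small loader along the lines of the following sketch. The thread does not describe the layout of `map.txt`. This sketch assumes one entry per line, with a PG number followed by whitespace and an LCCN, and it skips any line that does not fit that shape. `load_pg_lccn_map` is a hypothetical name.

```python
def load_pg_lccn_map(lines):
    """Build {PG number: raw LCCN} from lines shaped like '<PG#> <LCCN>' (assumed format)."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        fields = line.split(None, 1)
        if len(fields) != 2 or not fields[0].isdigit():
            continue  # anything that does not look like '<PG#> <LCCN>'
        # Keep the rest of the line, since raw LCCNs may contain spaces;
        # a later line for the same PG number overwrites an earlier one.
        mapping[int(fields[0])] = fields[1].strip()
    return mapping
```

Typical use would be `with open("map.txt", encoding="utf-8") as f: mapping = load_pg_lccn_map(f)`, followed by normalizing each LCCN before it is stored.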

Michael Dyck wrote:
It looks like about 1/3 of DP projects have a valid LCCN (that I could find, anyway). For the 8890 titles that PG has posted from DP, the fraction is slightly less. Here is the PG# -> LCCN mapping for 2238 of them:
Not bad. I'll crash now but I'll import them tomorrow.
-- 
Marcello Perathoner
webmaster@gutenberg.org

[off-list] Marcello Perathoner wrote:
Not bad. I'll crash now but I'll import them tomorrow.
Actually, it turns out there are further archives that I didn't find, so I should have an improved version soonish.
-Michael

I didn't expect to see this much conversation, but I'm glad to get such a variety of responses. Of course I would help with cataloging the books by subject. :)
Jared

Michael Dyck wrote on 13/08/2006, 8:00 PM:
[off-list]
Actually, it turns out there are further archives that I didn't find, so I should have an improved version soonish.
-Michael

Not bad. I'll crash now but I'll import them tomorrow.
Marcello, did you get to import these? I checked just a couple, and didn't see LCCNs indicated on catalog pages.

Michael: who makes these? We could collect them at upload time to percolate through the posting process quite easily. I would not mind adding them to the standard eBook header, if we have them in time and are reasonably confident of their accuracy.
-- 
Greg

On Tue, Aug 15, 2006 at 07:13:35PM -0700, Jared Buck wrote:
I didn't expect to see this much conversation, but I'm glad to get such a variety of responses. Of course I would help with cataloging the books by subject. :)
Jared
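Collecting LCCNs at upload time suggests at least a plausibility check before a number goes into the eBook header. This minimal sketch assumes the input has already been normalized. It also assumes the usual shapes of a normalized LCCN. Older numbers have an alphabetic prefix of up to three letters, a two-digit year, and a six-digit serial. Numbers from 2001 on have up to two letters, a four-digit year, and a six-digit serial. The `LCCN:` header label and the function name are hypothetical. A number that passes this check is only well formed. It says nothing about whether it belongs to the edition that was transcribed.

```python
import re
from typing import Optional

# ASSUMED shapes of a normalized LCCN (see lead-in): pre-2001 and 2001-onward forms.
_LCCN_RE = re.compile(r"[a-z]{0,3}\d{8}|[a-z]{0,2}\d{10}")


def lccn_header_line(normalized_lccn: str) -> Optional[str]:
    """Return a header line for a well-formed LCCN, or None so a human can look at it."""
    if not _LCCN_RE.fullmatch(normalized_lccn):
        return None
    return f"LCCN: {normalized_lccn}"
```

Rejecting the number, rather than guessing, fits the "reasonably confident of their accuracy" condition. A rejected upload can still be posted without the header line.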

Greg Newby wrote:
Michael: who makes these?
As part of creating a project at DP, you can do an external catalog search, and select one of the results if it appears to match the book in question. We extract some info from the catalog record of the selected result, including the LCCN.
-Michael

It should be noted that the LCCN derived from the external catalog search may or may not be an exact match for the actual edition that is being used. The author and title will be the same, or very close, but the publisher, year, etc., may be different. I've also been faced with two seemingly identical editions that match the one in my hand but have different LCCNs. Then it's a coin toss as to which to choose. Also, most of the non-English books, and even many of those in English, don't turn up any kind of match from LoC (the only catalog we currently search).
JulietS

Michael Dyck wrote:
As part of creating a project at DP, you can do an external catalog search, and select one of the results if it appears to match the book in question. We extract some info from the catalog record of the selected result, including the LCCN.
-Michael
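Juliet's caveat has two parts: a catalog record may describe a different edition, and two candidates can look equally right. One way to handle both is to refuse to guess. The following sketch scores candidate records against the book in hand and returns an LCCN only when one candidate wins clearly. Ties and weak matches go back to a human. The `Record` fields, the weights, and the threshold are all illustrative assumptions. They are not how DP's catalog search actually works.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    title: str
    author: str
    publisher: str = ""
    year: str = ""
    lccn: str = ""


def _norm(text: str) -> str:
    return " ".join(text.lower().split())


def match_score(book: Record, candidate: Record) -> int:
    # Illustrative weights: title and author matter most; publisher and year pin the edition.
    score = 0
    if _norm(book.title) == _norm(candidate.title):
        score += 4
    if _norm(book.author) == _norm(candidate.author):
        score += 3
    if book.publisher and _norm(book.publisher) == _norm(candidate.publisher):
        score += 2
    if book.year and book.year == candidate.year:
        score += 2
    return score


def pick_lccn(book: Record, candidates: list[Record], min_score: int = 7) -> Optional[str]:
    """Return the LCCN of the single best candidate, or None if a human should decide."""
    ranked = sorted(candidates, key=lambda c: match_score(book, c), reverse=True)
    if not ranked:
        return None
    best = match_score(book, ranked[0])
    if best < min_score:
        return None  # weak match, e.g. different title or author
    if len(ranked) > 1 and match_score(book, ranked[1]) == best:
        return None  # the "coin toss" case: two equally good records
    return ranked[0].lccn
```

The design choice here is to leave the field empty instead of recording a possibly wrong LCCN. This matters if the numbers are later shown in the eBook header.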

Ok, is one way to do it to make the whole thing a DP project, then? List all books, have all the volunteers fill in subjects for the ones they know into some standard file that becomes machine readable, then have a checking process like other stuff, and remove books from the list once done?

Quoting David Starner <prosfilaes@gmail.com>:
First, Distributed Proofreaders usually records the LCCN upon creation of the project. If we import from them, that's probably a good 5,000 there. Secondly, whenever I hand import a book, I copy the category and stuff from the LOC by hand; it would be much easier if I could do it just by entering the LCCN. Every little bit helps, and I might be motivated to do a lot more if I could load it just by entering the LCCN.
Thirdly, if we offered the ability to load information from the LoC at copy.pglaf.org like is done at DP, most new clearances would have the LCCN because it would be the most convenient way of clearing the book.
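The "make it a DP project" idea can be sketched as a queue: a book leaves the list once enough volunteers independently agree on its subjects. This is a hypothetical design. It assumes that "checking" means two different volunteers submitting exactly the same set of headings. `CatalogQueue` and its methods are invented for illustration.

```python
from collections import defaultdict


class CatalogQueue:
    """Hypothetical 'Distributed Catalogers' queue: a book is done once `needed`
    different volunteers have submitted the same set of subject headings."""

    def __init__(self, book_ids, needed=2):
        self.pending = set(book_ids)
        self.needed = needed
        self.done = {}                     # book id -> agreed subjects (sorted list)
        self._votes = defaultdict(dict)    # book id -> {volunteer: frozenset of subjects}

    def submit(self, book_id, volunteer, subjects):
        """Record one volunteer's subjects; return False if the book is not on the list."""
        if book_id not in self.pending:
            return False
        votes = self._votes[book_id]
        proposal = frozenset(subjects)
        votes[volunteer] = proposal        # a volunteer's latest submission replaces earlier ones
        agreeing = sum(1 for s in votes.values() if s == proposal)
        if agreeing >= self.needed:        # independent agreement counts as "checked"
            self.pending.discard(book_id)
            self.done[book_id] = sorted(proposal)
            del self._votes[book_id]
        return True
```

Exact agreement is a strict test. It works best when the subjects come from a standardized list, so that the same heading is not spelled three different ways.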

rnmscott@netspace.net.au wrote:
Ok, is one way to do it to make the whole thing a DP project, then?
List all books, have all the volunteers fill in subjects for the ones they know into some standard file that becomes machine readable, then have a checking process like other stuff, and remove books from the list once done?
Don't forget that some books will fit under multiple categories. Think along the lines of "The Klingon Singing Cook Book." I don't know if such a thing exists, but you get the idea. And you will want to standardize the categories. Otherwise you will have "cook book", "cook books", "cooking books", "cajun cooking books", "coook book", etc.
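A small lookup can address both points above, multiple categories per book and standardized categories. Free-text tags are normalized for case and spacing and then mapped to a canonical heading. A book may collect several headings, and any tag that is not recognized is set aside for a human. The vocabulary here is a made-up example. In practice the canonical list might be LCSH itself.

```python
import re

# Hypothetical controlled vocabulary: canonical heading -> known variant spellings.
VOCABULARY = {
    "Cooking": {"cook book", "cook books", "cooking books", "coook book", "cookbooks"},
    "Cooking, Cajun": {"cajun cooking books", "cajun cooking"},
}


def _key(tag: str) -> str:
    """Normalize case and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", tag.strip().lower())


# Invert the vocabulary once; canonical headings map to themselves.
_LOOKUP = {_key(v): canon for canon, variants in VOCABULARY.items() for v in variants | {canon}}


def canonical_subjects(tags):
    """Map free-text tags to canonical headings; return (headings, unrecognized tags)."""
    found, unknown = [], []
    for tag in tags:
        canon = _LOOKUP.get(_key(tag))
        if canon is None:
            unknown.append(tag)   # needs a human to add it to the vocabulary or map it
        elif canon not in found:
            found.append(canon)   # one book can carry several headings
    return found, unknown
```

For example, `canonical_subjects(["Cook Books", "cajun  cooking books", "Klingon"])` yields the headings `Cooking` and `Cooking, Cajun` and leaves `Klingon` for review. Misspellings such as "coook book" are caught only if someone has already added them to the variant list.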

On Sun, 13 Aug 2006 15:17:44 +0200, Marcello Perathoner <marcello@perathoner.de> wrote:
|Wallace J.McLean wrote:
|
|> Would it be possible for some clever script-writer out there to be able
|> to set up something that would harvest LOC/AMICUS subject lines, based
|> on inputting the Dewey or LOC call number or cat number?
|
|To get subject tags from the LoC is relatively easy. But, first of all,
|you have to enter the LoC call number of the book.
|
|There are only 19,000 of them. Volunteers?

Lots of books, especially the ones I do, are not in the LoC or even the British Library.
-- 
Dave Fawthrop <dave hyphenologist co uk>
"Intelligent Design?" my knees say *not*.
"Intelligent Design?" my back says *not*.
More like "Incompetent design".
Sig (C) Copyright Public Domain

We have the classic PG conundrum: In response to a suggestion to make a change, someone helpfully (sarcasm, people!) indicates the enormity of the task and asks if the person making the suggestion is willing to single-handedly implement it or to raise tens of thousands of dollars to hire one or more professionals to "do it right." Another responds with a willingness to participate in an aspect of the solution, but only after someone else gets the ball rolling. Another person reminds us that Distributed Proofreaders has already collected the data to provide a partial solution; we only need to create a mechanism to bring their data over to the "parent" site. Finally, someone pipes up that some material provided to DP is so rare that the only records of the material even being created are buried with some defrocked monk who drowned off the coast of Antigua under mysterious circumstances. And someone else will contribute yet another dead horse for us to beat.

Come on, people. There is no magic wand to provide a complete instant solution to this issue. There is also nothing wrong with multiple partial solutions. If someone is really excited about petunias, let that person create a petunia page. If the LOC has an official subject category for petunias (I don't know, Science - Botany - Perennial Plants - North America - Petunias), then let's link things that way too.

We have Distributed Proofreaders. Do we need Distributed Catalogers? I would be willing to read a book and tell you what categories would be significant to me. I am not a professional cataloger, but I have used a library before and I have some concept of a subject index.

The original post was along the lines of "it would be nice if we could do this." Yes, lots of things would be nice and not every nice thing deserves to be done. However, if we fancy ourselves as a library, is not a subject index part of the catalog?

Let the flames begin.
participants (12)
- Aaron Cannon
- Dave Fawthrop
- David Starner
- Greg Newby
- Jared Buck
- John Hagerson
- Juliet Sutherland
- Kevin Handy
- Marcello Perathoner
- Michael Dyck
- rnmscott@netspace.net.au
- Wallace J.McLean