MARC to the catalog

Here's some information to try to get subject cataloging moving forward. As you've seen, Alev (who cataloged our first 3500 or so books) has stepped up to try to help shape this project. Andrew Sly has also stepped up, and has already been doing a lot of editing of existing catalog data.
(I'm sending this to gutvol-d, but hope we can soon take this conversation to gutcat@lists.pglaf.org. Visit http://lists.pglaf.org to subscribe)
One of our goals is to get proper subject headings into the Project Gutenberg catalog. ("Proper" means that they come from the Library of Congress Subject Headings corpus or similar authoritative source, and were generated by librarians or similarly clued-in people.) Currently, less than 1/4 of the Project Gutenberg collection has subject headings. Furthermore, the names we use for authors and titles are not always consistent. There are other limitations with the current catalog data, too.
This message is partially to let people know how I think we'd like to start, and partially to ask Marcello and others (like Steve Thomas) to look at what it will take to move things forward.
The basic scenario is that the easiest way to get authoritative catalog data (including subject headings) for our holdings is to find existing library catalog entries. There are some great resources and software for doing this, and a data interchange format called MARC. MARC stands for Machine Readable Cataloging, and it actually has a few variations. Essentially, it's delimited fields with data about an item: author, title, etc. (Many, many fields and subfields - most of which are not needed for a particular item.)
** What I'd like to enable is import of MARC records to the catalog, to update, augment or replace existing catalog entry for a particular item. This is harder than it might sound, for a variety of reasons.
I'm appending a few MARC records from some recently released PG titles that Alev was able to find (yes, she found existing catalog records for these items, even though some are obscure and non-English).
I don't want to over-specify how I think the workflow should happen. I think that's still to be determined. But the overall flow needs to be somewhat circular: librarians need to import existing PG catalog records, preferably in MARC format, to existing software. (Alev has a couple of programs for this; PGLAF can probably acquire software for other folks who'd like to work a lot on this activity.) Then, updated records would need to be shipped back into the catalog.
Below are some MARC records, also the listing of info in the PG catalog and our clearance records (which are incomplete - though we usually do have the page scans sent for clearance)
-- Greg

One format:
Title | Author | Date | Publisher | Note
Punch, or, The London Charivari. | Punch (London, England) | 1841 | Published for the proprietors by R. Bryant | "No. 1. For the week ending July 17, 1941. Price Threepence."--At head of cover.
Lippincott's magazine of popular literature and science | (none) | 1871; 1871-1880 | J.B. Lippincott and Co. | Edited in Philadelphia for the first seventeen years by John Foster Kirk, Lippincott's Magazine published many notable English and American writers including Henry James, Oscar Wilde, Amelie Rive, Conan Doyle, and Rudyard Kipling. In addition to long and short fiction, there was much literary criticism and many book reviews and illustrated travel articles. Although the contents were of high quality, competition with popular New York magazines eventually caused Lippincott's to be sold in 1914 to McBride, Nast and Company who moved it to New York and changed the name to McBride's Magazine. After a short time, however, it was merged with Scribner's; Title from caption.; Microfilm.
The Lady of the Lake | Scott, Walter; Rolfe, W. J. | 1922 | Houghton Mifflin company | (none)
The authoritative life of General William Booth | Railton, George Scott | 1912; c1912 | Hodder & Stoughton, George H. Doran company | (none)
Camp and trail | Hornibrook, Isabel Katherine | 1897 | Lothrop publishing company | (none)
The Outdoor Girls in army service, or, Doing their bit for the soldier boys | Hope, Laura Lee. | 1918; c1918 | Grosset & Dunlap | (none)
Grace Harlowe's second year at Overton College | Flower, Jessie Graham. | 1914; c1914 | Henry Altemus | (none)
Les trois mousquetaires | Dumas, Alexandre; Le Courrier des États-Unis | 1846 | P. Gaillardet | At head of title: Semaine littéraire du Courrier des États-Unis.
George Borrow | Thomas, Edward. | 1912 | Chapman & Hall, ltd. | "Bibliography of George Borrow": p.[323]-333.

Another:
00796nam 2200217 a 45M0001001300000003000400013005001700017008004100034040001300075043001200088050002000100130002800120245003700148246002100185260006500206300003300271500008500304650004000389650003900429852011000468NYUb11968217NYU19990310183125.0990310s1841 enka 000 0 eng d aNNUcNNU ae-uk-en 4aAP101b.P8 18410 aPunch (London, England)10aPunch, or, The London Charivari.30aLondon Charivari aLondon :bPublished for the proprietors by R. Bryant,c1841. a14, [2] p. :bill. ;c30 cm. a"No. 1. For the week ending July 17, 1941. Price Threepence."--At head of cover. 0aEnglish wit and humorvPeriodicals. 0aPopular literaturezGreat Britain. 
aNNUbNYUbBobstbSpecColhAP101i.P8 1841712081221mNon-circulatingpN10964924t1yAvailable3no.15no.102256cas 2200373 a 45M0001001300000003000400013005001700017007001400034008004100048010001600089035002600105035001800131040002700149042000800176090005100184245007400235246002600309260005700335300002800392310001200420362004900432500069900481500002401180533015201204760004201356776006001398780006301458785002601521830005401547866007901601950001001680998010101690852009101791NYUb10726168NYU19940713181853.0hduafu---buca890810d18711880miuuu p a 0uuua0eng d asn 85060910 a(CStRLIN)NYUG89-S4496 aGLIS007261686 aOAkUcOAkUdNdMHdNNU alcd i06/03/93 Th10/26/92 Th09/19/89 Th08/10/89 T00aLippincott's magazine of popular literature and scienceh[microform].14aLippincott's magazine aPhiladelphia :bJ.B. Lippincott and Co.,c1871-1880. a20 v. :bill. ;c28 cm. aMonthly0 aVol. 7, no. 1 (Jan. 1871)-v. 26 (Dec. 1880). aEdited in Philadelphia for the first seventeen years by John Foster Kirk, Lippincott's Magazine published many notable English and American writers including Henry James, Oscar Wilde, Amelie Rive, Conan Doyle, and Rudyard Kipling. In addition to long and short fiction, there was much literary criticism and many book reviews and illustrated travel articles. Although the contents were of high quality, competition with popular New York magazines eventually caused Lippincott's to be sold in 1914 to McBride, Nast and Company who moved it to New York and changed the name to McBride's Magazine. After a short time, however, it was merged with Scribner'szCf. American periodicals, 1741-1900. aTitle from caption. aMicrofilm.bAnn Arbor, Mich. :cXerox University Microfilms,d1972.e8 microfilm reels ; 4 in., 35 mm.f(American periodicals, 1850-1900 ; 317-324)0 tAmerican periodical series, 1850-19001 tLippincott's magazine of popular literature and science00tLippincott's magazine of literature, science and education00tLippincott's magazine 0aAmerican periodical series, 1850-1900 ;v317-324. 
lBobst Microform dFilm 277 APS III R317-322e8908f0g5hj7-26k1871-1880 lBMICR a06/03/93tcs9110nNNUwDCLCSF8999097Sd08/10/89cMJDbSKHi930603h921026h890919h890810lNYUG aNNUbNYUbBobstbMicroform711635364mNon-circulatingpN10396809yAvailable5N1039680900953cam 22002531 4500001000800000005001700008008004100025035002100066906004500087010001700132035001900149040001800168050002100186100003700207245009700244250002200341260006300363300005600426490004300482650005200525700005200577985002100629991004900650966895320031210181225.0830715s1922 msuab 000 0 eng 9(DLC) 25005333 a7bcbccoclcrplduencipf19gy-gencatlg a 25005333 a(OCoLC)9706316 aDLCcMsJdDLC00aPR5308b.A1 19221 aScott, Walter,cSir,d1771-1832.14aThe Lady of the Lake,cby Sir Walter Scott, Bart.; edited with notes by William J. Rolfe ... aRev. and enl. ed. aBoston,aNew York [etc.]bHoughton Mifflin companyc[1922] axvi, 272, [2] p. incl. front., illus., map.c17 cm.0 a[Riverside literature series,vno. 53] 0aLady of the Lake (Legendary character)vPoetry.1 aRolfe, W. J.q(William James),d1827-1910,eed. eOCLC REPLACEMENT bc-GenCollhPR5308i.A1 1922tCopy 1wOCLCREP00614nam 2200157I 45000010008000000080041000080100013000490350015000620500018000771000039000952450149001342600068002833000053003516000032004046100020004361828178830316s1912 nyucf 00010beng a13000924 a0313-237600 aBX9743.B7bR31 aRailton, George Scott,d1849-1913.04aThe authoritative life of General William Booth,bfounder of the Salvation army,cby G. S. Railton ... with a preface by General Bramwell Booth. aNew York,bHodder & Stoughton, George H. Doran companyc[c1912] a7 p. 
l., 331 p.bfront., ports., facsim.c20 cm.10aBooth, William,d1829-1912.20aSalvation Army.00595nam 2200181u 4500001000800000005001700008008004100025035002100066906004500087010001700132040001900149050001600168100006000184245002100244260004800265300004600313991005400359585949400000000000000.0810904s1897 mauf j 000 0 eng 9(DLC) 04016828 a0bcbccpremunvduencipf19gy-gencatlg a 04016828 aDLCcCarPdDLC00aPZ7.H784bC1 aHornibrook, Isabel Katherine,d1859- [from old catalog]10aCamp and trail; aBoston,bLothrop publishing companyc[1897] a2 p.bl., 5-305 p. front. plates.c20 cm. bc-GenCollhPZ7.H784iCp00024749368tCopy 1wPREM00478nam 2200145Ia 45000010009000000050017000090080041000260400023000670900022000901000021001122450102001332600042002353000029002774900026003061000467619880111095007.0831012s1918 nyua j 00011 eng d aNGUcNGUdm/cdBGU aPS3515.O585bO84610aHope, Laura Lee.14aThe Outdoor Girls in army service, or, Doing their bit for the soldier boys /cby Laura Lee Hope.0 aNew York :bGrosset & Dunlap,cc1918. a212 p. :bill. ;c20 cm.0 aOutdoor girls series.00497nam 2200157Ii 4500001000800000005001700008008004100025040001800066090002100084100002700105245007900132260004300211300002900254490003000283830002600313281042319880329145958.0770317s1914 xx j 00011 eng d aMNLcMNLdBGU aPS3511.L78bG75810aFlower, Jessie Graham.10aGrace Harlowe's second year at Overton College /cby Jessie Graham Flower.0 aPhiladelphia :bHenry Altemus,cc1914. a248 p. :bill. ;c19 cm.1 aThe college girls series. 0aCollege girls series.00765nam 22002291 4500001000800000005001700008008004100025035002100066906004500087010001700132035002000149040001900169050002100188100003400209245005100243260003700294300001900331500007100350700004400421985002100465991004900486960383219980421190136.0850703s1846 nyu 000 1 fre 9(DLC) 03029683 a7bcbccoclcrplduencipf19gy-gencatlg a 03029683 a(OCoLC)12231807 aDLCcNBuUdDLC00aPQ2228b.A1 18461 aDumas, Alexandre,d1802-1870.14aLes trois mousquetaires,cpar Alexandre Dumas. aNew York,bP. Gaillardet,c1846. 
a268 p.c26 cm. aAt head of title: Semaine littéraire du Courrier des États-Unis.2 aLe Courrier des États-Unis,cNew York. eOCLC REPLACEMENT bc-GenCollhPQ2228i.A1 1846tCopy 1wOCLCREP00568nam 2200169I 4500001000800000005001700008008004100025010001300066040002400079050001400103100002000117245006200137260004200199300006900241500005000310600003800360492931019880421065446.0790504s1912 enkcfh b 00110 eng a13012350 aDLCcAMHdm.c.dm/c0 aPR4156.T510aThomas, Edward.10aGeorge Borrow,bthe man and his books,cby Edward Thomas.0 aLondon,bChapman & Hall, ltd.,c1912. axi, 333, viii p., 1 ¾.bfront., plates, ports., facsims.c23 cm. a"Bibliography of George Borrow": p.[323]-333.10aBorrow, George Henry,d1803-1881.

The above are for these entries:
1. Celsissimus (German)
   http://www.gutenberg.org/etext/13953
   gbn0403071608: Arthur Achleitner, Celsissimus (german). user@host. 1902p. 3/21/2004. ok.
   (that's a cleared2.gbn clearance line)
2. The Pocket George Borrow
   http://www.gutenberg.org/etext/13957
   OK 20041030020123thomas The Pocket George Borrow Edward Thomas 1912:c
3. Les trois mousquetaires (French)
   http://www.gutenberg.org/etext/13951
   OK 20041019125907dumas Les trois mousquetaires Alexandre Dumas 1844:p
4. Grace Harlowe's Second Year at Overton College
   http://www.gutenberg.org/etext/6858
   gbn520: Grace Harlowe's Second Year at Overton College, Jessie Graham Flower user@host. 1914c. 9/13/2002. ok.
   (that's a cleared.gbn , really old clearance line)
5. The Outdoor Girls in Army Service, Or, doing their bit for the soldier boys
   http://www.gutenberg.org/etext/7494
   gbn560g: The Outdoor Girls in Army Service, Laura Lee Hope. user@host. 1918c. 9/10/2002. ok.
   gbn568: Laura Lee Hope, The Outdoor Girls in Army Service. user@host. 1918c. 9/13/2002. ok.
   (cleared twice, but looks like the same edition)
6. Camp and Trail, A Story of the Maine Woods
   http://www.gutenberg.org/etext/13946
   OK 20040825223614hornibrook Camp and Trail Isabel Hornibrook 1897:c
7. The Authoritative Life of General William Booth
   http://www.gutenberg.org/etext/13958
   gbn0403190519: G[eorge] S[cott] Railton, The Authoritative Life of General William Booth. user@host. 1912c. 3/23/2004. ok.
8. The Lady of the Lake
   http://www.gutenberg.org/etext/3011
   The Lady of the Lake Walter Scott J. C. Byers 11/23/99 ok 82-83c
   (this cleared line is from Michael's Xeroxes)
9. Lippincott's Magazine of Popular Literature and Science, Vol. XVII. No. 101. May, 1876.
   http://www.gutenberg.org/etext/13956
   gbn0405261819: various, Lippincott's Magazine v. 17 Jan-June 1875. user@host. 1876c. 5/26/2004. ok.
   OK 20040808141522various Lippincott's Magazine v. 17 Jan-Jun 1876 various 1876:c
   (two clearances for this, too. We often clear entire year-long or multi-year volumes for periodicals based on a single TP&V scan)
10. Punch, or the London Charivari, Vol. 152, June 27, 1917 1917 Almanack
   http://www.gutenberg.org/etext/13954
   gbn0402060846: Various, Punch - Vol. 152.. user@host. 1917p. 2/6/2004. ok.

And, just for fun:
Title: Gone with the wind
Author: Mitchell, Margaret; Herman Finkelstein Collection (Library of Congress); Alfred Whital Stern Collection of Lincolniana (Library of Congress)
Date: 1936
Publisher: Macmillan
Note: "Published May 1936"--Verso of t.p. Actual publication of the 1st ed. was delayed to June 30, 1936. Cf. Gone with the wind as book and film / Richard Harwell. c1983. P. [xv].; LC copy has dust jacket. Newspaper clipping from the Parade section, Oct. 31, 1976 and magazine clipping from Publishers weekly, Sept. 6, 1976 on author laid in.; Source: Gift of Herman Finkelstein, Dec. 30, 1980.
Editor: (none)
Call Number: PS3525.I972
Corporate Author: (none)
Description: 1037 p. 22 cm.
Edition: (none)
Illustrator: (none)
ISBN: (none)
ISSN: (none)
Language: eng
LC Call Number: (none)
Main Series: (none)
Subject Heading: Women
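For readers unfamiliar with the raw format: the long run of digits near the start of each record above is the MARC directory. The following is a minimal Python sketch, assuming the standard MARC 21 (ISO 2709) layout of a 24-character leader followed by 12-character directory entries (3-digit tag, 4-digit field length, 5-digit start offset). The directory string is copied from the Punch record; the function and constant names are illustrative only and are not part of any PG tool.

```python
# Directory of the Punch record above: the digits that follow its leader.
PUNCH_DIRECTORY = (
    "001001300000" "003000400013" "005001700017" "008004100034"
    "040001300075" "043001200088" "050002000100" "130002800120"
    "245003700148" "246002100185" "260006500206" "300003300271"
    "500008500304" "650004000389" "650003900429" "852011000468"
)

LEADER_LENGTH = 24  # fixed leader size in the MARC 21 record structure
ENTRY_LENGTH = 12   # tag (3) + field length (4) + start offset (5)


def parse_directory(directory):
    """Split a MARC directory string into (tag, length, start) tuples."""
    if len(directory) % ENTRY_LENGTH != 0:
        raise ValueError("directory length is not a multiple of 12")
    entries = []
    for i in range(0, len(directory), ENTRY_LENGTH):
        chunk = directory[i:i + ENTRY_LENGTH]
        entries.append((chunk[0:3], int(chunk[3:7]), int(chunk[7:12])))
    return entries


def is_contiguous(entries):
    """Check that every field starts exactly where the previous one ended."""
    expected = 0
    for _tag, length, start in entries:
        if start != expected:
            return False
        expected = start + length
    return True


def record_length(entries):
    """Leader + directory + directory terminator + field data + record terminator."""
    base_address = LEADER_LENGTH + ENTRY_LENGTH * len(entries) + 1
    data_length = sum(length for _tag, length, _start in entries)
    return base_address + data_length + 1


entries = parse_directory(PUNCH_DIRECTORY)
for tag, length, start in entries:
    print(tag, length, start)
print("contiguous:", is_contiguous(entries))
print("computed record length:", record_length(entries))
```

The computed total of 796 agrees with the 00796 at the start of that record's leader, which suggests the directory survived the paste intact even though the field and subfield delimiters did not.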

Greg Newby wrote:
> I don't want to over-specify how I think the workflow should happen. I think that's still to be determined. But the overall flow needs to be somewhat circular: librarians need to import existing PG catalog records, preferably in MARC format, to existing software. (Alev has a couple of programs for this; PGLAF can probably acquire software for other folks who'd like to work a lot on this activity.) Then, updated records would need to be shipped back into the catalog.
I think an easier solution would be to build an ASCII list containing the etext-number and the LoC Call Number for all etexts we have.
We would then import the LoC Call Number into a field in the database.
The catalog software could then update a number of fields (Subject, LoC Class, Unified Title) automatically from the LoC database (TODO: check copyright status of LoC database!!!)
Then we could do a manual pass over the database with the MARC record at hand and fix the author / coauthor attributions, link into Wikipedia if an article exists, add summaries etc.
--
Marcello Perathoner
webmaster@gutenberg.org
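As a rough illustration of the list proposed here, a minimal Python sketch follows. The tab-separated layout, the function name and the choice of sample rows are assumptions made for illustration (the call numbers are read off the 050 fields of the records Greg posted); nothing here reflects the actual PG database schema.

```python
# Hypothetical list format: one etext per line, "etext-number<TAB>LoC call number".
SAMPLE_LIST = (
    "# etext\tcall number\n"
    "3011\tPR5308 .A1 1922\n"
    "13951\tPQ2228 .A1 1846\n"
    "13958\tBX9743.B7 R3\n"
)


def read_call_numbers(lines):
    """Return ({etext_number: call_number}, [(line_no, bad_line), ...])."""
    call_numbers = {}
    rejected = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        etext, sep, call_number = line.partition("\t")
        if not sep or not etext.isdigit() or not call_number.strip():
            # Keep malformed lines for a human to look at instead of guessing.
            rejected.append((line_no, raw.rstrip("\n")))
            continue
        call_numbers[int(etext)] = call_number.strip()
    return call_numbers, rejected


call_numbers, rejected = read_call_numbers(SAMPLE_LIST.splitlines())
for etext, call_number in sorted(call_numbers.items()):
    print(etext, call_number)
print("rejected lines:", rejected)
```

Collecting malformed lines separately rather than dropping them silently keeps the import step reviewable, which fits the manual pass described above.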

On Wed, Nov 10, 2004 at 09:01:07PM +0100, Marcello Perathoner wrote:
> Greg Newby wrote:
>> I don't want to over-specify how I think the workflow should happen. I think that's still to be determined. But the overall flow needs to be somewhat circular: librarians need to import existing PG catalog records, preferably in MARC format, to existing software. (Alev has a couple of programs for this; PGLAF can probably acquire software for other folks who'd like to work a lot on this activity.) Then, updated records would need to be shipped back into the catalog.
> I think an easier solution would be to build an ASCII list containing the etext-number and the LoC Call Number for all etexts we have.
> We would then import the LoC Call Number into a field in the database.
> The catalog software could then update a number of fields (Subject, LoC Class, Unified Title) automatically from the LoC database (TODO: check copyright status of LoC database!!!)
> Then we could do a manual pass over the database with the MARC record at hand and fix the author / coauthor attributions, link into Wikipedia if an article exists, add summaries etc.
I like this idea, but am concerned that there will still need to be human oversight. Just importing records will only work if there are unambiguous matches, and it seems that matching is often ambiguous.
From doing lots of copyright clearances, I know that many items are not in the LoC database (most of our non-English material is not in there). But this would be a good start, and there are other national library catalogs that offer Z39.50 access to their records.
-- Greg

Greg Newby wrote:
>> The catalog software could then update a number of fields (Subject, LoC Class, Unified Title) automatically from the LoC database (TODO Check copyright status of LoC database !!!)
> I like this idea, but am concerned that there will still need to be human oversight. Just importing records will only work if there are unambiguous matches, and it seems that matching is often ambiguous.
We can start to match the easy ones and leave the hard ones to our librarians. We can periodically output a list of still unmatched books.
The fields I propose to import (Subject, LoC, Unified Title) should not be ambiguous. It doesn't matter which edition of a work we match.
OTOH for new books still in the DP queue it might be wiser to match the exact edition down to the format and number of pages and the coffee stain on page 42.
--
Marcello Perathoner
webmaster@gutenberg.org
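A "match the easy ones, list the rest" pass could look roughly like the following Python sketch. The normalisation rule, the record shapes and all names are assumptions made for illustration; a real matcher would need tuning against actual catalog data. The sample titles and call numbers are taken from the records posted earlier in the thread.

```python
import re
from collections import defaultdict


def normalize(title):
    """Lowercase and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[\W_]+", " ", title.lower()).strip()


def match_easy_ones(pg_books, library_records):
    """Accept a match only when exactly one library record shares the title.

    Returns ({etext: call_number}, [etexts left for a librarian]).
    """
    index = defaultdict(list)
    for record in library_records:
        index[normalize(record["title"])].append(record)

    matched = {}
    unmatched = []
    for book in pg_books:
        candidates = index.get(normalize(book["title"]), [])
        if len(candidates) == 1:
            matched[book["etext"]] = candidates[0]["call_number"]
        else:
            # No candidate, or several: leave the decision to a human.
            unmatched.append(book["etext"])
    return matched, unmatched


pg_books = [
    {"etext": 3011, "title": "The Lady of the Lake"},
    {"etext": 13946, "title": "Camp and Trail, A Story of the Maine Woods"},
    {"etext": 13957, "title": "The Pocket George Borrow"},
]
library_records = [
    {"title": "The Lady of the Lake", "call_number": "PR5308 .A1 1922"},
    {"title": "Camp and trail", "call_number": "PZ7.H784 C"},
    {"title": "George Borrow", "call_number": "PR4156.T5"},
]

matched, unmatched = match_easy_ones(pg_books, library_records)
print("matched:", matched)
print("still unmatched:", unmatched)
```

With this deliberately strict rule only The Lady of the Lake matches; the other two titles differ from the library records and land on the unmatched list for manual review.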