Re: PG catalog - MARC -- problem with encoding for Audio Books

Brad Collins wrote:
Steve/
I tried to send this to the list but it bounced but you will likely be about the only person interested so here it is...
Surely that's not true! I'm sure many on this list are just thrilled by discussions about MARC. ;-)
Steve Thomas <stephen.thomas@adelaide.edu.au> writes:
Feedback welcomed.
First of all, your script taught me a lot about both MARC and Perl. VERY GOOD WORK!
Thanks.
I have been working the last few days on porting your script to elisp and I noticed the following problem with Audio Book records:
RDF
<pgterms:etext rdf:ID="etext9737">
  <dc:publisher>&pg;</dc:publisher>
  <dc:title rdf:parseType="Literal">The Seven Poor Travellers</dc:title>
  <dc:creator>Dickens, Charles (1812-1870)</dc:creator>
  <dc:language>en</dc:language>
  <dc:type>Audio Book, computer-generated</dc:type>
  <dc:created>2006-01-01</dc:created>
  <dc:rights>Copyrighted work. See license inside work.</dc:rights>
</pgterms:etext>
MARC
LDR 00560cam 22001573a 4500
005 20041111153800.0
008 060101s2006||||xxu|||||s|||||000 f eng d
100 1 |aDickens, Charles,|d1812-1870
245 14|aThe Seven Poor Travellers |h[electronic resource] /|cby Charles Dickens
260 |bProject Gutenberg Literary Archive Foundation,|c2006
500 |aProject Gutenberg
506 |aFreely available.
516 |aElectronic text
830 0|aProject Gutenberg|v9737
856 40|uhttp://www.gutenberg.org/etext/9737
856 42|uhttp://www.gutenberg.org/license|zLicense
First, I see you are using a prior version of the script/output. The latest version now produces this:
LDR 00626cam a22002053a 4500
000 9737
003 PGUSA
005 20041115162032.0
008 060101s2006||||xxu|||||s|||||000 | eng d
040 |aPGUSA|beng
042 |adc
100 1 |aDickens, Charles,|d1812-1870
245 14|aThe Seven Poor Travellers |h[electronic resource] /|cby Charles Dickens
260 |bProject Gutenberg,|c2006
500 |aProject Gutenberg
506 |aFreely available.
516 |acomputer-generated Audio Book
830 0|aProject Gutenberg|v9737
856 40|uhttp://www.gutenberg.org/etext/9737
856 42|uhttp://www.gutenberg.org/license|3Rights
I can see immediately that I need to add something for the copyrighted works. Probably an addition to the 506 note.
I am far from being fluent in MARC, but from what I've seen I would tend to say that the value for the 245 h subfield should be `sound recording' and I am still not sure about the 516 field for electronic file types.
You'll see that the 516 now reflects what's in the PG catalog. The 245 h subfield value used is a generic term, for the medium of the item, and this is commonly used for any kind of electronic resource. The term 'sound recording' is used for things like LP (and I guess CD) records. The major intent is to distinguish this item from other media, e.g. paper.
Does MARC have a list of defined enumerated values for these subfields?
I have a few other questions:
I'm also still not clear on why a 500 field is needed. The 500 field is a general note, so why would a note with a value of `Project Gutenberg' be helpful?
Not sure about this one. But using the 830 field (Series statement) requires either a 490 or 500 (general note). So I'm just following the MARC spec. here. Some things I just don't ask about. ;-)
Second, I would suggest making the 260 field a bit more ISBD-ish.
260 |a-Urbana: |bProject Gutenberg, |c2006.
or at least:
260 |aUrbana, |bProject Gutenberg, |c2006.
Yes. You'll see I'm now using just 'Project Gutenberg' for the publisher name -- after comment from Greg. The a subfield can be used for place of publication, but ... I'm not sure what that is. Is it still Urbana (I thought PG had long since moved from there)? Is it the business address of PGLAF? Is it the home town of ibiblio? In the end, it seemed easiest to omit that.
Which leads us to the question of how the publisher name should be formatted.
In a sense, each PG (Aussie, Germany, EU, Canada, etc.) would have a different city of publication, located in its own country. Even if they are (as in this case) separate legal entities, the city should be enough to identify PG USA. There should be a publisher authority record this points to. How should this be handled?
The catalog only includes items from "PGUSA". If other countries wanted to use the script to build MARC for their collections, then we can easily modify the script to change the publisher name.
Thanks for the feedback!
Steve
-- Stephen Thomas, Senior Systems Analyst, University of Adelaide Library UNIVERSITY OF ADELAIDE SA 5005 AUSTRALIA Phone: +61 8 830 35190 Fax: +61 8 830 34369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/
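Editorial sketch: the 516 note discussed above could be derived from dc:type roughly as follows. This is Python rather than the Perl of the actual script, which is not shown in this thread. The function names and the rule of moving the qualifier after the comma to the front are assumptions inferred from the single example record; only the two resulting note strings come from the records quoted above.

```python
def note_516(dc_type=None):
    """Return text for a MARC 516 $a note from a dc:type value.

    Assumption (not confirmed by the thread): records without dc:type are
    plain etexts, and a value such as "Audio Book, computer-generated" is
    reordered so the qualifier comes first.
    """
    if not dc_type:
        return "Electronic text"
    head, _, qualifier = dc_type.partition(",")
    head, qualifier = head.strip(), qualifier.strip()
    return f"{qualifier} {head}" if qualifier else head


def field_516(dc_type=None):
    # Pipe-delimited display in the same spirit as the records quoted above.
    return f"516   |a{note_516(dc_type)}"


print(field_516("Audio Book, computer-generated"))
print(field_516())
```

The 245 $h value is deliberately left out of the sketch, since the thread has not settled whether it should stay "[electronic resource]" for audio books.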

On Thu, Nov 18, 2004 at 01:45:19PM +1030, Steve Thomas wrote:
...
Second, I would suggest making the 260 field a bit more ISBD-ish.
260 |a-Urbana: |bProject Gutenberg, |c2006.
or at least:
260 |aUrbana, |bProject Gutenberg, |c2006.
Yes. You'll see I'm now using just 'Project Gutenberg' for the publisher name -- after comment from Greg. The a subfield can be used for place of publication, but ... I'm not sure what that is. Is it still Urbana (I thought PG had long since moved from there)? Is it the business address of PGLAF? Is it the home town of ibiblio? In the end, it seemed easiest to omit that.
I always used Urbana because it's the historical home, and of course PG still has a presence there (i.e., Michael). Legally speaking, the PGLAF organizational home is wherever I live (funny, I know), unless the PGLAF board decides otherwise. But I don't like using this as a publication location, since I might move. Chapel Hill would be reasonable, since that's where iBiblio is, but PG has no "real" organization there. Salt Lake City is where the business office is, but overall I still prefer Urbana as the "publication location" for PG. There is no 100% accurate place to list. -- Greg

Greg Newby <gbnewby@pglaf.org> writes:
Yes. You'll see I'm now using just 'Project Gutenberg' for the publisher name -- after comment from Greg. The a subfield can be used for place of publication, but ... I'm not sure what that is. Is it still Urbana (I thought PG had long since moved from there)? Is it the business address of PGLAF? Is it the home town of ibiblio? In the end, it seemed easiest to omit that.
I always used Urbana because it's the historical home, and of course PG still has a presence there (i.e., Michael).
[snip]
There is no 100% accurate place to list.
Since the place of publication is important for determining copyright restrictions in some cases, I think it would be better to include a place of publication.
This has bothered me for some time. I've always wondered how to handle virtual organizations which don't really have a place of publication in the conventional sense like PG or the Apache Group.
So I did a little digging in the ISBD specs and found the following:
,----[ ISBD(ER) 4.1.13 ]
| 4.1.13 When a place of publication, production or distribution does
| not appear anywhere in the item, the name of the known city or town
| is supplied in square brackets. If the city or town is uncertain, or
| unknown, the name of the probable city or town followed by a
| question mark is supplied in square brackets. e.g.
|
|   - [Paris]
|   - [Prague?]
`----
,----[ ISBD(ER) 4.1.14 ]
| 4.1.14 When the name of a city or town cannot be given, the name of
| the state, province or country is given, according to the same
| stipulations as are applicable to the names of cities or towns.
| e.g.
|
|   - Canada
|     Editorial comment: Known as place of publication;
|     appears in prescribed source.
`----
Since PG doesn't explicitly state that the place of publication is in the States in etexts, (is that right?) this would suggest something like:
- [USA]: Project Gutenberg, 2004.
or (I prefer)
- [Urbana]: Project Gutenberg, 2004.
in BMF this might look like:
published : ‐ $pl[[USA]]: $pb[Project Gutenberg], $dt[2004]
or more verbose BMF (bxids only for example):
published : ‐ $pl[$d:bxid://geo:IKE8-5510 $l:[USA]]: $pb[$d:bxid://aut:JIQ6-7286 $l:Project Gutenberg], $dt[$v:2004-10-12 $l:2004]
BMF subfields used: (For complete list of subfields see: http://192.168.0.103/cgi-bin/bmf.cgi/Reference/SubfieldQuickRef.html)
pl  place name
d   defined-by
l   label
pb  publisher name
dt  inclusive dates
v   value -- in dt it should be an ISO 8601 formatted date
b/
-- Brad Collins <brad@chenla.org>, Bangkok, Thailand
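Editorial sketch: the bracketing rules quoted from ISBD(ER) 4.1.13 and 4.1.14 can be expressed as a small Python helper. The function and parameter names are hypothetical, and the "Place: Publisher, Year." punctuation simply copies the examples in the message above rather than any particular cataloguing profile.

```python
def isbd_place(place, on_item=False, uncertain=False):
    """Format a place of publication following the quoted ISBD(ER) rules.

    place     -- city or town, or a state/province/country per 4.1.14
    on_item   -- True if the place is stated in the item itself
    uncertain -- True if a supplied place is only probable
    """
    if on_item:
        # Taken from the item, so no brackets are needed.
        return place
    # Supplied by the cataloguer: square brackets, "?" when uncertain.
    return f"[{place}?]" if uncertain else f"[{place}]"


def publication_area(publisher, year, place, **place_options):
    # Hypothetical helper mirroring "[Urbana]: Project Gutenberg, 2004."
    return f"{isbd_place(place, **place_options)}: {publisher}, {year}."


print(publication_area("Project Gutenberg", 2004, "Urbana"))
print(publication_area("Project Gutenberg", 2004, "USA"))
```

Keeping the bracket decision in one function means a later choice between a city and a country only changes the argument, not the formatting code.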

Brad Collins wrote:
Since the place of publication is important for determining copyright restrictions in some cases, I think it would be better to include a place of publication.
This has bothered me for some time. I've always wondered how to handle virtual organizations which don't really have a place of publication in the conventional sense like PG or the Apache Group.
I think the ISBD recommends using "s.l." where the place is unknown or indeterminate. (Initials for "sine loco". See http://www.ifla.org/VII/s13/pubs/isbd3.htm#18 section 4.1.15)
However, this does not help the copyright question. MARC does provide the 506 field ("Restrictions on Access note") for copyright notices etc. Right now, I'm just putting "Freely available" in here (or the copyright statement for copyrighted works). But we could use a more detailed statement here. E.g. we should as a minimum say "Freely available in the USA. May be subject to copyright in other locations." We could also place the license url in this field (subfield u) rather than the 856.
Regarding the series statement -- I'm not wedded to the use of 830 for "Project Gutenberg". It just seemed an appropriate way to include the PG number. One typical use of 830 in library catalogues is to be able to index works by series name. So this would allow (in this case) for a search on series name "Project Gutenberg" to list all the works in the collection. However, with currently almost 14,000 titles, maybe this isn't a worthwhile goal. "Project Gutenberg" should also be available as keywords in any library catalog search, if one needed to limit a search to just PG works.
We could always expand the 500 General Note to include more detail about PG, including the item number. (500 can be whatever we want, and you can have as many 500 notes as you need.) Also, the item number is present in field 001 -- although that probably won't be visible to the general user of a library catalog, so including it in the 500 note is useful (and again makes the number usable in a keyword search).
So if you want to reserve the 830 for particular series within PG (e.g. EET) then that's fine with me.
Steve -- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/
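Editorial sketch: the more detailed 506 note suggested above, with the license URL carried in subfield u, might be assembled like this in Python. The function name, the substring test used to spot copyrighted works, and the spacing of the output line are all assumptions; the note wording and the URL are taken from the thread.

```python
DEFAULT_NOTE = ("Freely available in the USA. "
                "May be subject to copyright in other locations.")


def field_506(rights=None, license_url=None):
    """Build a 506 (Restrictions on Access) note as a display string.

    rights      -- dc:rights text from the catalog, or None if absent
    license_url -- optional URL to place in subfield u
    """
    # Assumption: a rights statement mentioning copyright is used verbatim,
    # as the message above describes for copyrighted works.
    if rights and "copyright" in rights.lower():
        note = rights
    else:
        note = DEFAULT_NOTE
    field = f"506   |a{note}"
    if license_url:
        field += f"|u{license_url}"
    return field


print(field_506())
print(field_506("Copyrighted work. See license inside work.",
                "http://www.gutenberg.org/license"))
```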

On Fri, Nov 19, 2004 at 08:58:55AM +0700, Brad Collins wrote:
Greg Newby <gbnewby@pglaf.org> writes:
Yes. You'll see I'm now using just 'Project Gutenberg' for the publisher name -- after comment from Greg. The a subfield can be used for place of publication, but ... I'm not sure what that is. Is it still Urbana (I thought PG had long since moved from there)? Is it the business address of PGLAF? Is it the home town of ibiblio? In the end, it seemed easiest to omit that.
I always used Urbana because it's the historical home, and of course PG still has a presence there (i.e., Michael).
[snip]
There is no 100% accurate place to list.
Since the place of publication is important for determining copyright restrictions in some cases, I think it would be better to include a place of publication.
I definitely agree. I left the below for context, but wanted to mention my favorite is:
[Urbana, Illinois]: Project Gutenberg, 2004.
Note, I added the state, since there are many Urbanas. Urbana is as accurate as we are likely to get.
-- Greg
This has bothered me for some time. I've always wondered how to handle virtual organizations which don't really have a place of publication in the conventional sense like PG or the Apache Group.
So I did a little digging in the ISBD specs and found the following:
,----[ ISBD(ER) 4.1.13 ]
| 4.1.13 When a place of publication, production or distribution does
| not appear anywhere in the item, the name of the known city or town
| is supplied in square brackets. If the city or town is uncertain, or
| unknown, the name of the probable city or town followed by a
| question mark is supplied in square brackets. e.g.
|
|   - [Paris]
|   - [Prague?]
`----
,----[ ISBD(ER) 4.1.14 ]
| 4.1.14 When the name of a city or town cannot be given, the name of
| the state, province or country is given, according to the same
| stipulations as are applicable to the names of cities or towns.
| e.g.
|
|   - Canada
|     Editorial comment: Known as place of publication;
|     appears in prescribed source.
`----
Since PG doesn't explicitly state that the place of publication is in the States in etexts, (is that right?) this would suggest something like:
- [USA]: Project Gutenberg, 2004.
or (I prefer)
- [Urbana]: Project Gutenberg, 2004.
in BMF this might look like:
published : ‐ $pl[[USA]]: $pb[Project Gutenberg], $dt[2004]
or more verbose BMF (bxids only for example):
published : ‐ $pl[$d:bxid://geo:IKE8-5510 $l:[USA]]: $pb[$d:bxid://aut:JIQ6-7286 $l:Project Gutenberg], $dt[$v:2004-10-12 $l:2004]
BMF subfields used: (For complete list of subfields see: http://192.168.0.103/cgi-bin/bmf.cgi/Reference/SubfieldQuickRef.html)
pl  place name
d   defined-by
l   label
pb  publisher name
dt  inclusive dates
v   value -- in dt it should be an ISO 8601 formatted date
b/
-- Brad Collins <brad@chenla.org>, Bangkok, Thailand

My last post in reply to Greg included a chunk of rather raw notes I took on the subject yesterday. I might as well send along the rest of the notes which are all exploring issues with the nitty gritty details of manifestation entity records for PG texts.
You can ignore the BMF stuff. Take this all as food for thought rather than specific suggestions for PG.
** Series
,----[ ISBD(ER) 6.6.1 ]
| 6.6.1 The numbering of the item within a series or sub-series is
| given in the terms in which it appears in the item. Standard
| abbreviations may be used. Arabic numerals are substituted for other
| numerals or spelled-out numbers. e.g.
|
|   - (Multimedia learning series ; vol. 2)
|   - (Visit Canada series ; vol. C)
|   - (Computer simulation games ; module 5)
|   - (BTS research report ; 2)
`----
Steve's script gives us:
830 0|aProject Gutenberg|v9737
But the ISBD suggests something like this:
(Project Gutenberg etext ; no. 8654)
830 0|a(Project Gutenberg etext ; |vno. 9737
or BMF:
series : ($a[Project Gutenberg etext] ; no. $vol[9737])
** Material Designation
,----[ ISBD(ER) Appendix C ]
| **General material designation:**
|   Electronic resource
|
| **Resource designations with "electronic" in the designations:**
|   Electronic data
|   Electronic font data
|   Electronic image data
|   Electronic numeric data
|   Electronic census data
|   Electronic survey data
|   Electronic representational data
|   Electronic map data
|   Electronic sound data
|   Electronic text data
|   Electronic bibliographic database(s)
|   Electronic document(s) (e.g. letters, articles)
|   Electronic journal(s)
|   Electronic newsletter(s)
`----
For PG, this would then suggest changing the more general Electronic Resource to the more specific:
Electronic document
Electronic sound data
The reason I am suggesting this is that all of the examples I have seen using `Electronic resource' are for things like interactive CDROMs and dynamic Web sites. These are not specifically electronic texts, documents or sound recordings.
The distinction is small and certainly the general `Electronic Resource' works, but I wanted to find out if there were more specific enumerated values for material designation....
** Mode of Access
Since we haven't gotten around to working on Instance/Item entities yet, this is a bit premature. Access fields are not used in Manifestation entities. But reading through the ISBD and MARC specs got me thinking about the issue.
I must say that I don't like the ISBD(ER) mode of access field.
Mode of access: Internet via World Wide Web. URL: http://muse.jhu.edu/journals/callaloo/.
This is needlessly verbose and redundant. Another example in the spec is a bit better.
Mode of access: Internet. URL: http://mitpress.mit.edu/CityofBits/.
But it's not much better.
,----[ ISBD(ER) 7.5.2 Notes relating to mode of access]
|
| Mode of access shall be recorded in a note for all remote access
| electronic resources.
|
| Mode of access is given as the second note following the System
| requirements note (see 7.5.1), if given, and is preceded by "Mode of
| access" (or its equivalent in another language and/or script). In
| the absence of a system requirements note, mode of access is given
| as the first note. e.g.
|
|   - Mode of access: Lexis system. Requires subscription to
|     Mead Data Central, Inc.
|   - Mode of access: World Wide Web. URL: http://www.un.org
|   - Mode of access: Internet via ftp://ftp.nevada.edu
|   - Mode of access: Gopher://gopher.peabody.yale.edu
|   - Mode of access: Computer university network
|   - Mode of access: Mikenet
`----
On the whole, MARC and ISBD are a bit clumsy when it comes to networked resources -- the records are basically electronic catalog cards.
Numbers 2, 3 and 4 are all network addresses, which have a URL pointing to the resource. I can understand putting a label indicating the type of network protocol, but the examples are all screwed up, mixing descriptive labels for the protocol with the type of network.
Better would be something like the following:
- access: Lexis [dialup network]: Note: Requires subscription to Mead Data Central, Inc.
- access: Project Gutenberg (WWW site): URL: http://projectgutenberg.org
- access: Project Gutenberg (FTP mirror): URL: ftp://ftp.ibiblio.org
- access: Internet (FTP site): URL: ftp://ftp.nevada.edu
- access: Internet (Gopher site): URL: gopher://gopher.peabody.yale.edu
- access: UCLA (university intranet): URL: http://libary.ucla.edu:2080 Note: Requires university network account.
- access: Mikenet (private local area network)
in BMF
access:
- $a[$typ:dialup $l:Lexis (dialup network)]: Note: $not[Requires subscription to Mead Data Central, Inc.]
- $a[$typ:www $l:Project Gutenberg (WWW site)]: URL: $url[http://projectgutenberg.org]
- $a[$typ:ftp $l:Project Gutenberg (FTP mirror)]: URL: $url[ftp://ftp.ibiblio.org]
- $a[$typ:ftp $l:Internet (FTP site)]: URL: $url[ftp://ftp.nevada.edu]
- $a[$typ:gopher $l:Internet (Gopher site)]: URL: $url[gopher://gopher.peabody.yale.edu]
- $a[$typ:intranet $l:UCLA (university intranet)]: URL: $url[http://libary.ucla.edu:2080] Note: Requires university network account.
- $a[$typ:lan $l:Mikenet] ($not[private local area network])
Now what about MARC? Steve's script produces:
856 40|uhttp://www.gutenberg.org/etext/9737
856 42|uhttp://www.gutenberg.org/license|3Rights
and the spec sez...
,----[ MARC: 856 Electronic Location and Access ]
| Field 856 contains the information needed to locate and access an
| electronic resource. The field may be used in a bibliographic record
| for a resource when that resource or a subset of it is available
| electronically. In addition, it may be used to locate and access an
| electronic version of a non-electronic resource described in the
| bibliographic record or a related electronic resource.
`----
This breaks down to:
*** Indicators
First:  4 HTTP
Second: 0 Resource
        2 Related Resource
*** Subfields
$u URI (do they make a distinction between URI and URL?)
$3 Materials specified.
-- Brad Collins <brad@chenla.org>, Bangkok, Thailand
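Editorial sketch: the breakdown of the 856 field above can be checked mechanically against the two lines Steve's script produces. This Python parser handles only the pipe-delimited display used in this thread, not real MARC transmission format, and its lookup tables cover only the indicator and subfield values listed in the message. All names are hypothetical.

```python
FIRST_INDICATOR = {"4": "HTTP"}
SECOND_INDICATOR = {"0": "Resource", "2": "Related resource"}
SUBFIELDS = {"u": "URI", "3": "Materials specified"}


def parse_856(line):
    """Parse one 856 line written as '856 40|u...|3...'."""
    head, *raw_subfields = line.split("|")
    parts = head.split()
    if len(parts) != 2 or parts[0] != "856" or len(parts[1]) != 2:
        raise ValueError(f"not an 856 line with two indicators: {line!r}")
    indicators = parts[1]
    return {
        "access_method": FIRST_INDICATOR.get(indicators[0], "unknown"),
        "relationship": SECOND_INDICATOR.get(indicators[1], "unknown"),
        # Each subfield is a one-character code followed by its value.
        "subfields": [(s[0], s[1:]) for s in raw_subfields if s],
    }


for text in ("856 40|uhttp://www.gutenberg.org/etext/9737",
             "856 42|uhttp://www.gutenberg.org/license|3Rights"):
    parsed = parse_856(text)
    print(parsed["access_method"], "/", parsed["relationship"])
    for code, value in parsed["subfields"]:
        print("  $" + code, SUBFIELDS.get(code, "?"), "=", value)
```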

I question if the use of a series number as suggested below is an ideal approach. I believe it is intended for a smaller number of items which are intentionally published as a series.
I'd suggest that the closest thing to PG etexts numbers in a traditional research library, would be accession numbers (as commonly used for microforms).
Andrew
On Fri, 19 Nov 2004, Brad Collins wrote:
** Series
,----[ ISBD(ER) 6.6.1 ]
| 6.6.1 The numbering of the item within a series or sub-series is
| given in the terms in which it appears in the item. Standard
| abbreviations may be used. Arabic numerals are substituted for other
| numerals or spelled-out numbers. e.g.
|
|   - (Multimedia learning series ; vol. 2)
|   - (Visit Canada series ; vol. C)
|   - (Computer simulation games ; module 5)
|   - (BTS research report ; 2)
`----
Steve's script gives us:
830 0|aProject Gutenberg|v9737
But the ISBD suggests something like this:
(Project Gutenberg etext ; no. 8654)
830 0|a(Project Gutenberg etext ; |vno. 9737
or BMF:
series : ($a[Project Gutenberg etext] ; no. $vol[9737])

Andrew Sly <sly@victoria.tc.ca> writes:
I question if the use of a series number as suggested below is an ideal approach.
I believe it is intended for a smaller number of items which are intentionally published as a series.
I'd suggest that the closest thing to PG etexts numbers in a traditional research library, would be accession numbers (as commonly used for microforms)
I used Series because The Early English Text Society publications are cataloged as a series and this was the closest thing I have found to the PG etext numbers.
This is from the LOC:
Series: Early English Text Society (Series). Original series ; 10, [etc.]
830 _0 |a Early English Text Society (Series). |p Original series ; |v 10, [etc.]
I understand that the PG etext numbers are not a conscious planned series but I still think it works....
b/
-- Brad Collins <brad@chenla.org>, Bangkok, Thailand

This is far from my area of expertise, but I do know that we are putting books into PG that come from several types of what I think of as "series". One kind is a group of books by one author (eg The Bobbsey Twins Series) and the other kind is a group of books, each by different authors, that are intended to go together (eg the English Men of Letters biographies). I'd think that we would want to have a way to represent each of these in the PG catalog.
JulietS
Brad Collins wrote:
Andrew Sly <sly@victoria.tc.ca> writes:
I question if the use of a series number as suggested below is an ideal approach.
I believe it is intended for a smaller number of items which are intentionally published as a series.
I'd suggest that the closest thing to PG etexts numbers in a traditional research library, would be accession numbers (as commonly used for microforms)
I used Series because The Early English Text Society publications are cataloged as a series and this was the closest thing I have found to the PG etext numbers.
This is from the LOC:
Series: Early English Text Society (Series). Original series ; 10, [etc.]
830 _0 |a Early English Text Society (Series). |p Original series ; |v 10, [etc.]
I understand that the PG etext numbers are not a conscious planned series but I still think it works....
b/

Juliet Sutherland <vze3rknp@verizon.net> writes:
This is far from my area of expertise, but I do know that we are putting books into PG that come from several types of what I think of as "series". One kind is a group of books by one author (eg The Bobbsey Twins Series) and the other kind is a group of books, each by different authors, that are intended to go together (eg the English Men of Letters biographies). I'd think that we would want to have a way to represent each of these in the PG catalog.
And you are correct -- and this is why MARC has a number of different ways of dealing with the issue (and I am not the person to explain them) but as far as I can see they are not mutually exclusive. Fields can be repeated (MARC 830 is repeatable) and there is no reason why there can't be series within series.
Is there a better way to do this? Was the LOC example I used wrong?
b/
-- Brad Collins <brad@chenla.org>, Bangkok, Thailand

Hi Brad.
I looked for an example of an accession number for microfiche, as used in a MARC record, and found the following example. (The accession number is 05000, found in fields 490 and 830, pretty much as you had suggested.)
000 00858nam 2200181 a 450
001 571327
008 810528c19801898enka b 00011 eng 0
020 __ |a 0665050003 (Positive copy)
035 __ |a (CaOOCIHM)81603284X
035 __ |9 ACN8054TS
040 __ |a CaOOCIHM |b eng
100 10 |a Allen, Grant, |d 1848-1899
245 13 |a An African millionaire |h [microform] : |b episodes in the life of the illustrious Colonel Clay / |c by Grant Allen.
260 0_ |a London : |b G. Richards, |c 1898.
300 __ |a 4 microfiches (183 fr.) : |b ill.
490 1_ |a CIHM/ICMH Microfiche series = CIHM/ICMH collection de microfiches ; |v no. 05000
533 __ |a Filmed from a copy of the original publication held by the Izaak Walton Killam Memorial Library, Dalhousie University. |b Ottawa : |c Canadian Institute for Historical Microreproductions, |d 1980.
830 _0 |a CIHM/ICMH Microfiche series ; |v no. 05000
Thanks,
Andrew
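Editorial sketch: following the pattern of the CIHM record above, a traced series statement (490) and its matching added entry (830) for a PG etext could be generated as a pair. This Python function is hypothetical; the series wording "Project Gutenberg" and the "no." prefix are assumptions, since the thread has not settled on either.

```python
def series_fields(series, number, prefix="no."):
    """Return a 490/830 pair in the display style of the record above.

    The underscore stands for a blank indicator, as in Andrew's example.
    """
    numbering = f"{prefix} {number}"
    return [
        f"490 1_ |a {series} ; |v {numbering}",
        f"830 _0 |a {series} ; |v {numbering}",
    ]


for line in series_fields("Project Gutenberg", 9737):
    print(line)
```

Generating both fields from one call keeps the numbering identical in the two places, which is what makes the accession-style number searchable through the series index.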
participants (5)
- Andrew Sly
- Brad Collins
- Greg Newby
- Juliet Sutherland
- Steve Thomas