Re: [posted] Language detection error? (Re: Posted (#40466, Sifferath) !)
When a copyright clearance is submitted, the only allowed language entry on the clearance form is a language code, e.g. "en". When the finished ebook files are later uploaded, that "en" is translated into "English", which is what comes to the WWers. I suspect that if this translation can't be done, as apparently with "oj", either because the copyright submitter or uploader entered an incorrect or unknown code, the code is passed as-is to the WWers, unless the uploader intervenes with the full name of the language. Al
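The translation step Al describes can be sketched roughly as follows. This is only an illustration of the suspected behaviour, not the actual Project Gutenberg upload software: the table KNOWN_LANGUAGES and the function language_for_wwer are hypothetical names, and the table holds just a few entries for the example.

```python
# Hypothetical sketch of the suspected behaviour; not the real PG software.
# A small, made-up table of codes the system is assumed to know.
KNOWN_LANGUAGES = {
    "en": "English",
    "fr": "French",
    "de": "German",
}


def language_for_wwer(entry, uploader_override=None):
    """Return the language string that would reach the WWers.

    entry             -- what was typed on the form (a code or a full name)
    uploader_override -- full language name supplied by the uploader, if any
    """
    if uploader_override:
        # The uploader intervened with the full name of the language.
        return uploader_override
    cleaned = entry.strip()
    # Known codes are translated; anything else is passed through as-is.
    return KNOWN_LANGUAGES.get(cleaned.lower(), cleaned)
```

Under these assumptions "en" comes out as "English", while "oj" is passed through unchanged unless a full name such as "Ojibwa" is supplied alongside it.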
-----Original Message----- From: Blower Nigel [mailto:NBlower@Queenedith.cambs.sch.uk] Sent: Saturday, August 11, 2012 12:22 PM To: Al Haines; gbnewby@pglaf.org Cc: 'Project Gutenberg Postings Announcements'; 'Peter Podgoršek'; dp-post@pgdp.net; 'Andrew Sly'; 'Marcello Perathoner' Subject: RE: Language detection error? (Re: [posted] Posted (#40466, Sifferath) !)
Thanks for clarifying Al.
On the upload form, where you enter the language, it actually says "Main language (two-letter code or single-word language name):". The reason I put "oj" instead of "Ojibwa" is that I had seen it spelt Ojibwa and Ojibwe, and thought if I entered the ISO code there wouldn't be any confusion, and the code would match whichever spelling had been decided upon for cataloguing at PG.
It appears as though I inadvertently caused confusion - sorry. What I could have done, in retrospect, is also add the alternative full language names in the note to the WWer. I hadn't realised that the language part of it needed manual intervention by Al.
Perhaps if the 2 letter code is not good for the WWers, the wording on the upload form should be changed.
Thanks to all for sorting this out Nigel
________________________________________ From: Al Haines [ajhaines@shaw.ca] Sent: 11 August 2012 19:42 To: gbnewby@pglaf.org; Blower Nigel Cc: 'Project Gutenberg Postings Announcements'; 'Peter Podgoršek'; dp-post@pgdp.net; 'Andrew Sly'; 'Marcello Perathoner' Subject: RE: Language detection error? (Re: [posted] Posted (#40466, Sifferath) !)
A couple of things happened with this one's language.
Nigel says he entered "oj" on the upload form, which I changed to "Ojibwa". (I had to figure this out from the uploaded text file.) BUT, I forgot to save the change, which left "oj". I'm guessing (and *only* guessing) that the posting software saw "oj", didn't understand it, and put "English", which is what ended up in the catalog. If I had saved the change, "Ojibwa" would have been added to PG's list of languages, and the catalog page correct.
When Nigel informed me of this book's bibrec page showing English, I tried to correct 40466's language to "Ojibwa", but there's no mechanism (that I could see) in the catalog back-end software to manually add a new language. (You *can* add new authors.) When I did a wild-card search (*) to get the full list of available languages, I found "Ojibwa, Western", and used it.
Hint to DP uploaders: language names should always be written in full. It saves the WWers from having to figure out, or ask the uploader, what a code means.
Al
-----Original Message----- From: Greg Newby [mailto:gbnewby@pglaf.org] Sent: Saturday, August 11, 2012 11:22 AM To: Blower Nigel Cc: Project Gutenberg Postings Announcements; Peter Podgoršek; dp-post@pgdp.net; Andrew Sly; Marcello Perathoner; Al Haines Subject: Language detection error? (Re: [posted] Posted (#40466, Sifferath) !)
Thanks for this closer look, Nigel. In response, I also just took a closer look, and now wonder whether there was a glitch in the Web cataloging or human cataloging. I don't think it was Al that entered "Ojibwa, Western," but the automatic post-processing & cataloging that happens when new files are posted.
Within the text (HTML and .txt) you can see the language is Ojibwa, as you submitted: Language: Ojibwa
But the bibrec page lists "Ojibwa, Western:" http://www.gutenberg.org/ebooks/40466
It might be that Marcello's automatic cataloging somehow matched on a more specific language code (perhaps simply selecting the latest sorted code with a matching string).
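To illustrate the kind of matching guessed at here, the following sketch picks the last entry, in sorted order, whose name contains the submitted string. It is purely hypothetical: guess_catalog_language is an invented name, the catalog list contains only language names mentioned in this thread, and nothing here is taken from the real cataloging code.

```python
# Hypothetical illustration of "latest sorted code with a matching string".
# This is a guess at a possible mechanism, not the real cataloging software.

def guess_catalog_language(submitted, catalog_names):
    """Return the last catalog name (in sorted order) containing `submitted`.

    Returns None when no catalog name contains the submitted string.
    """
    needle = submitted.lower()
    matches = sorted(name for name in catalog_names if needle in name.lower())
    return matches[-1] if matches else None


# Assumed catalog state before "Ojibwa" itself was a known language.
catalog = ["English", "North American Indian", "Ojibwa, Western"]
print(guess_catalog_language("Ojibwa", catalog))
```

With that assumed catalog, a book submitted as "Ojibwa" would land on "Ojibwa, Western". The same rule would still prefer "Ojibwa, Western" after a plain "Ojibwa" entry is added, because the shorter name sorts first, so a matcher of this kind would need to try an exact match before falling back to substrings.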
Based on your input, and the fact that the books do indicate Ojibwa withIN them, I think we should recode to ISO 639-3 code "oji", as you indicated below.
I'm cc'ing Andrew Sly, who (along with Marcello and me, and a few others) can "make it so" in the bibrec. But we can see whether others have different opinions or diagnostics.
Even if it's "Ojibwa," rather than "Ojibwa, Western", it's a new language for Project Gutenberg. Thanks again, -- Greg
Hi all
I'm not sure it is *Western* Ojibwa.
The project at DP was labelled as Ojibwa, and after I PPVed it, when I uploaded to PG, I entered the 2 character language code "oj" which is the ISO 639-1 code for Ojibwa. The WWer, Al Haines, entered "Ojibwa, Western" in the Bibrec, which is ISO 639-3 code "ojw".
In my ignorance, I assumed that "Western Ojibwa" was the full name for Ojibwa. Since Greg's email, I've investigated a bit more, and there are several Ojibwa dialects. Since on the title page Sifferath is described as Missionary of the Ottawa and Otchipwe Indians, and this page (http://home.kpn.nl/cvkolmes/ojibwe/Siff/Sifferath.htm) describes the book as Sifferath's Odaawaa Catechism, perhaps the language would be better described as "Ottawa", which is ISO 639-3 code "otw", or maybe just Ojibwa, ISO 639-3 code "oji" which is an inclusive code, would be sufficient.
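The codes discussed in this message can be laid out as a small lookup. This is a sketch only: the dictionaries and the describe function are hypothetical names, the labels are the ones used in this thread rather than official reference names, and only the codes mentioned here are included.

```python
# Hypothetical lookup built only from the codes mentioned in this thread.
ISO_639_1_TO_3 = {"oj": "oji"}          # two-letter code -> three-letter code

ISO_639_3_LABELS = {
    "oji": "Ojibwa",                    # inclusive code
    "ojw": "Ojibwa, Western",
    "otw": "Ottawa",
}

# More specific codes that fall under the inclusive code "oji".
INCLUSIVE_CODE = {"ojw": "oji", "otw": "oji"}


def describe(code):
    """Return (three-letter code, label, inclusive code), or None if unknown."""
    code3 = ISO_639_1_TO_3.get(code, code)
    label = ISO_639_3_LABELS.get(code3)
    if label is None:
        return None
    return code3, label, INCLUSIVE_CODE.get(code3, code3)
```

For example, describe("oj") resolves to the inclusive code "oji" with the label "Ojibwa", while describe("otw") gives "Ottawa" and reports "oji" as the broader code it falls under.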
If you search for Ojibwa on the gutenberg site, some books do come up which are labelled North American Indian "nai".
Sorry if any of this confusion is my fault - do let me know if you need me to do anything about it.
Regards Nigel
________________________________________ From: Greg Newby [gbnewby@pglaf.org] Sent: 11 August 2012 15:52 To: Project Gutenberg Postings Announcements Cc: Blower Nigel; Peter Podgoršek; dp-post@pgdp.net Subject: Re: [posted] Posted (#40466, Sifferath) !
This is our first eBook in the language of Western Ojibwa! -- Greg
On Thu, Aug 09, 2012 at 01:57:51PM -0700, Al Haines wrote:
A Short Compendium of the Catechism for the Indians, by
N. L. Sifferath 40466
[Subtitle: With the Approbation of the Rt. Rev. Frederic Baraga, Bishop of Saut Sainte Marie] [Other: Frederic Baraga] [Language: Ojibwa] [Link: http://www.gutenberg.org/4/0/4/6/40466 ] [Files: 40466.txt; 40466-h.htm]
Thanks to Peter Podgoršek, Heiko Evermann and the Online Distributed Proofreading Team at http://www.pgdp.net (This book was produced from scanned images of public domain material from the Google Print project and from Canadiana.org)
On Sat, Aug 11, 2012 at 05:39:52PM +0100, Blower Nigel wrote:
Regards, Al
Dr. Gregory B. Newby Chief Executive and Director Project Gutenberg Literary Archive Foundation www.gutenberg.org A 501(c)(3) not-for-profit organization with EIN 64-6221541 gbnewby@pglaf.org
The information in this email is confidential and may be legally privileged. It is intended solely for the addressee. If you receive this email by mistake please notify the sender and delete it immediately. Opinions expressed are those of the individual and do not necessarily represent the opinion of The Queens' Federation. All sent and received emails from The Queens' Federation are automatically scanned for the presence of computer viruses and security issues.
On 08/11/2012 10:21 PM, Al Haines wrote:
When a copyright clearance is submitted, the only allowed language entry on the clearance form is a language code, e.g. "en". When the finished ebook files are later uploaded, that "en" is translated into "English", which is what comes to the WWers.
I suspect that if this translation can't be done, as apparently with "oj", either because the copyright submitter or uploader entered an incorrect or unknown code, the code is passed as-is to the WWers, unless the uploader intervenes with the full name of the language.
According to Ethnologue, "Ojibwa" is a macrolanguage encompassing 7 languages. http://www.ethnologue.com/show_language.asp?code=oji
I have added Ojibwa to the known languages and updated #40466 to Ojibwa. If somebody wants to provide a more specific language attribution, I can add that one too.
It would be useful to get a heads-up before new obscure languages get posted. That way I can add the language to the catalog before the book gets posted and avoid the mess.
-- Marcello Perathoner webmaster@gutenberg.org
participants (2)
- Al Haines
- Marcello Perathoner