Update on Harvesting of the Internet Archive's Canadian Libraries collection

A week ago, DP started systematically working through the Internet Archive's Canadian Libraries collection. In that week, we have looked at 208 of the 798 books in the archive. Of these, 22 books are identical to books which have previously been through DP, so will not need to be looked at further, and 18 books have errors (missing or blurred pages) -- the remaining 168 are being processed and should eventually all move through DP. The aim is to eventually process *every* book in this collection, and then move on to others.

You can monitor the current progress of our harvesting effort here:
http://homepage.ntlworld.com/jenjonliz/jon/tia/canadianlibraries.html

And (if you are a DP project manager) claim texts using the following thread in the DP forum:
http://www.pgdp.net/phpBB2/viewtopic.php?t=14768

-- Jon Ingram

Hi Jonathan,

I have done some work developing scripts to re-process the page image sets from the Toronto archive. If you're interested, maybe we should compare notes. I've found the images to be quite high quality.

You didn't mention reconciling your list with the cleared/books in progress list, and looking at your web page, I see that you intend to process at least one of those books in progress - by me - namely:

Macdonald, (Captain) John A
Troublous Times in Canada: A History of the Fenian Raids of 1866 and 1870
Clearance OK key=20041231142201macdonald

I've always found it a let down when a book I've laboured over pops up in PG 3 days before my year-long effort is finally finished. I have dibs on this: hands off.

Also, your list also has a duplicate entry for this title.

On 13:25:22 Jonathan Ingram wrote:
A week ago, DP started systematically working through the Internet Archive's Canadian Libraries collection. In that week, we have looked at 208 of the 798 books in the archive. Of these, 22 books are identical to books which have previously been through DP, so will not need to be looked at further, and 18 books have errors (missing or blurred pages) -- the remaining 168 are being processed and should eventually all move through DP. The aim is to eventually process *every* book in this collection, and then move on to others.
You can monitor the current progress of our harvesting effort here:
http://homepage.ntlworld.com/jenjonliz/jon/tia/canadianlibraries.html
And (if you are a DP project manager) claim texts using the following thread in the DP forum:
http://www.pgdp.net/phpBB2/viewtopic.php?t=14768

*Interspersed* On 4/24/05, Gardner Buchanan <gbuchana@rogers.com> wrote:
Hi Jonathan,
You didn't mention reconciling your list with the cleared/books in progress list, and looking at your web page, I see that you intend to process at least one of those books in progress - by me - namely:
Macdonald, (Captain) John A Troublous Times in Canada: A History of the Fenian Raids of 1866 and 1870
Apparently you didn't read the header at all; this is a list of _all_ of the books in the Canadian Library section of IA that are in their catalog. The PM making a claim is responsible for checking against the In-Progress list; Jon is simply providing a faster method of reducing duplicate claims than Mr. Price's invaluable monthly updates. Similar setups have happened around holidays, where a number of Christmas or Halloween books may be cleared within days of each other.
I've always found it a let down when a book I've laboured over pops up in PG 3 days before my year-long effort is finally finished. I have dibs on this: hands off.
Your tone is a bit harsh. If you bothered to read the header, you would have seen (by the lack of a color code) that those books are currently unclaimed by anyone. A simple note to the thread (I see that you have a DP account) or privately to Jon would get them claimed in your name. Actually, while I am typing this you have done so, although it would have been better to claim both.
Also, your list also has a duplicate entry for this title.
That is because the IA has two copies of this title: one taken on their original setup, and another taken later with a different camera.
From a post by Molly at IA:
We tagged all of the books with what kind of scanner scanned them. Here's the key:
Kirtas APT 1200 #1 - prototype robot with 8 megapixel camera, originals shot at around 250 DPI, processed images interpolated to 300 DPI (processed will be bigger, around 3 MB each).
Kirtas APT 1200 #2 - production robot, same kind of camera as #1.
Kirtas APT 1200 #2.5 - production robot, new 16 megapixel camera. Originals are 500ish DPI (9-10 MB each), most of the time processed images are 300 DPI (~2 MB each).
R C
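
The check against the In-Progress list discussed above, and the duplicate catalog entries, could be sketched roughly as follows. This is only an illustration: the function names, the plain Python lists, and the placeholder title are assumptions, and the real formats of the harvest page and the clearance list are not described in this thread.

```python
import re

def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace,
    so small formatting differences do not hide a match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def find_conflicts(harvest_titles, in_progress_titles):
    """Return harvest titles whose normalized form is already in progress."""
    in_progress = {normalize_title(t) for t in in_progress_titles}
    return [t for t in harvest_titles if normalize_title(t) in in_progress]

def find_duplicates(titles):
    """Return titles whose normalized form appeared earlier in the list."""
    seen = set()
    duplicates = []
    for title in titles:
        key = normalize_title(title)
        if key in seen:
            duplicates.append(title)
        seen.add(key)
    return duplicates

# Sample data: the first title is taken from this thread; the second is a
# made-up placeholder. The in-progress spelling is a hypothetical variant.
TROUBLOUS = ("Troublous Times in Canada: "
             "A History of the Fenian Raids of 1866 and 1870")
harvest = [TROUBLOUS, "A Hypothetical Placeholder Title", TROUBLOUS]
in_progress = ["Troublous times in Canada; "
               "a history of the Fenian raids of 1866 and 1870"]

conflicts = find_conflicts(harvest, in_progress)
duplicates = find_duplicates(harvest)
print(len(conflicts), "harvest entries are already in progress")
print(len(duplicates), "harvest entries repeat an earlier title")
```

Exact matching on normalized titles will miss different editions or retitled works, so a hit or a miss here is only a prompt for a human check.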

I've always found it a let down when a book I've laboured over pops up in PG 3 days before my year-long effort is finally finished. I have dibs on this: hands off.
Despair not, but instead Rejoice! Having two independent transcriptions is a great way to catch errors, since two transcribers working from different copies of the book, by different methods, are unlikely to make the same error. Assuming there aren't major textual differences, diffing two independent transcriptions and checking the discrepancies is one of the best ways to get our quality up to five or even six nines.

-- RS
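
The cross-check described above, diffing two independent transcriptions and reviewing only the discrepancies, could be sketched roughly like this. It is a minimal illustration using Python's standard difflib module, not a tool that DP or PG is known to use; the function name and the two sample sentences are hypothetical.

```python
import difflib

def transcription_discrepancies(text_a, text_b):
    """Return word-level differences between two transcriptions.

    Each result is a tuple (tag, words_a, words_b), where tag is
    'replace', 'delete', or 'insert' as reported by difflib.
    """
    words_a = text_a.split()
    words_b = text_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    discrepancies = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            discrepancies.append((tag, words_a[i1:i2], words_b[j1:j2]))
    return discrepancies

# Hypothetical sample lines, invented for illustration only.
copy_a = "The raiders crossed the river at dawn and rnarched north."
copy_b = "The raiders crossed the river at dawn and marched north."

for tag, a_words, b_words in transcription_discrepancies(copy_a, copy_b):
    print(tag, a_words, b_words)
```

Comparing word by word rather than line by line keeps differences in line wrapping between the two copies from showing up as discrepancies; each reported difference still needs checking against the page image.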
participants (4)
- Gardner Buchanan
- Jonathan Ingram
- Robert Cicconetti
- Robert Shimmin