Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries

23 May 2006

My success with Google PD books is about 30%. Some books are so dark they
are unreadable, let alone OCR-able; these seem to appear as JPEGs. Others have
whole sides of pages clipped off throughout the entire book. When the images
are pretty good, they seem to appear as PNGs. I have found most of the books
with the PNG extension are pretty good. All seemed to have the occasional
missing page. I have sent many error reports to Google and get a nice canned
reply, but I see no visible improvement in the output and receive no further
feedback. I have found these books most useful when I already have a copy of
the book and can use the Google scan to help speed up the scanning/OCR process.

In fact, I don't see how DP is coping with these Google texts, given their
now stricter requirement that a perfect scan of every page and illustration
must be provided before the book can even get into their processing queue. A
missing part of a page or an illegible word cannot be corrected from another
edition, due to their high standard of perfection. With the average book now
requiring 2 years to go through their four levels of proofreading, one does
wonder.

nwolcott2@post.harvard.edu
----- Original Message -----
From: "Frank van Drogen" <fvandrog@scripps.edu>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@pglaf.org>
Sent: Monday, May 22, 2006 5:19 PM
Subject: re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
...
...
it's clear that google has gotten their legs under them
in regard to doing the scanning.  let's hope that they'll
get their quality-control under control very soon too...
I have found fewer missing pages and other problems in books from Google
than in those from the MBP and Canadian/IA. They are, however, still far
from perfect. When they get a report regarding a missing or wrongly
scanned page in a PD book, it is apparently up to the providing library to
get the problem sorted out. I've heard reports of complete books being
rescanned (with the risk of having another page missing in the end ;) ).
I've also heard somebody mention that the fully rescanned book was stuck
behind the existing one (rather space-consuming, but for DP purposes a lot
safer).
What worries me in this is that Google doesn't seem to care whether pages
are missing or not... as long as they get 99% of the pages from a book
stored, chances are most search terms pointing to the particular book will
be identified. Their interest lies in people purchasing the book via
Amazon, Abe, etc. after identifying it via book.google.com.
The best quality control I have encountered so far is on Gallica, where,
apart from pages missing because they were already missing in the original
scanned manuscript, I've not encountered incomplete books. It would actually
be interesting to see how they perform their quality control.
Frank
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d

Norm Wolcott