
On Mon, 22 May 2006, Frank van Drogen wrote:
it's clear that google has gotten their legs under them in regard to doing the scanning. let's hope that they'll get their quality-control under control very soon too...
After about 25% of their 6 year schedule to 10 million books, it would appear they are approaching 1% or 100,000 total books, with perhaps half of those easily downloadable, but in varying states of completion and accuracy.

If you presume they keep up with Moore's Law, 6 years looks like:

Totals        Dates           Doublings   Years
00            Dec 14, 2004    0           0
50,000        Jun 14, 2006    1           1.5
100,000       Dec 14, 2007    2           3
200,000       Jun 14, 2009    3           4.5
400,000       Dec 14, 2010    4           6

which continues as

800,000       Jun 14, 2012    5           7.5
1,600,000     Dec 14, 2013    6           9
3,200,000     Jun 14, 2015    7           10.5
6,400,000     Dec 14, 2016    8           12
12,800,000    Jun 14, 2018    9           13.5

which would put them at over 12 years to their 10 million books in terms of downloadable eBooks.

However, if you presume they have 100,000 by June 14, 2006, this would take 18 months off their total time, by counting non-downloadable and non-readable books.
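For anyone who wants to rerun the arithmetic with different starting figures, here is a rough Python sketch of the same projection. It assumes only what the table above assumes: the count doubles every 18 months and the schedule started on Dec 14, 2004. The function and variable names are made up for illustration and are not taken from anything Google has published.

```python
from datetime import date

SCHEDULE_START = date(2004, 12, 14)   # start date used in the table above
MONTHS_PER_DOUBLING = 18              # the "Moore's Law" assumption


def add_months(d, months):
    """Return d shifted forward by a whole number of months (same day of month)."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, d.day)


def months_between(a, b):
    """Whole months from a to b, assuming both fall on the same day of month."""
    return (b.year - a.year) * 12 + (b.month - a.month)


def projection(start_total, start_date, target=10_000_000):
    """List (total, date, years since schedule start) until target is reached."""
    rows = []
    total, when = start_total, start_date
    while True:
        years = months_between(SCHEDULE_START, when) / 12
        rows.append((total, when, years))
        if total >= target:
            return rows
        total *= 2
        when = add_months(when, MONTHS_PER_DOUBLING)


# Scenario 1: 50,000 downloadable books by Jun 14, 2006.
downloadable = projection(50_000, date(2006, 6, 14))

# Scenario 2: 100,000 books of any kind by Jun 14, 2006.
all_books = projection(100_000, date(2006, 6, 14))

for label, rows in (("downloadable", downloadable), ("all books", all_books)):
    total, when, years = rows[-1]
    print(f"{label}: {total:,} on {when:%b %d, %Y} ({years} years)")
```

The last row of each list is the first point at which the total passes 10 million, so comparing the two shows the 18 month difference mentioned above.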
I have found fewer missing pages and other problems in books from Google than in those from the MBP and Canadian/IA. They are, however, still far from perfect. When they get a report regarding a missing or wrongly scanned page in a PD book, it is apparently up to the providing library to get the problem sorted out. I've heard reports of complete books being rescanned (with the risk of having another page missing in the end ;) ). I've also heard somebody mention that the full rescanned book was stuck behind the existing one (rather space consuming, but for DP purposes a lot safer).
What worries me in this is that Google doesn't seem to care whether pages are missing or not... as long as they get 99% of the pages from a book stored, chances are most search terms pointing to the particular book will be identified. Their interest lies in people purchasing the books via Amazon, Abe etc. after identifying them via book.google.com.
When your goal is simply the appearance of having a lot of books, 99% is a perfectly good business plan. And if your goal is to get people to BUY the books from your other business partners, then there is even less reason for moving to 99+%.
The best quality control I have encountered so far is on Gallica, where, apart from pages that were already missing in the original scanned manuscript, I've not encountered incomplete books. It would actually be interesting to see how they perform their quality control.
If you can give me any contact info on Gallica, I will see if I can find out for you.

Thanks!!!

Give the world eBooks in 2006!!!

Michael S. Hart
Founder
Project Gutenberg

Blog at http://hart.pglaf.org