re: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks

michael said:
2 Months To 1/3 Million eBooks
last i remember reading -- six months back? -- the million book project said they had 600,000 books already scanned.
but they conceded that not all were online yet...
ok, here's the reference:
-bowerbird

My recollection of the latest word from Brewster is that they were just about to pass 10,000 full text eBooks, though not all had been proofread and edited to a 99.95% level of accuracy.
So I am presuming they did actually pass 10,000 in the last month or so, though I haven't seen any official announcements.
It would appear that one of the hardest things to find from Yahoo or Google eLibraries is the number of well finished eBooks.
mh
On Fri, 5 May 2006 Bowerbird@aol.com wrote:
michael said:
2 Months To 1/3 Million eBooks
last i remember reading -- six months back? -- the million book project said they had 600,000 books already scanned.
but they conceded that not all were online yet...
ok, here's the reference:
-bowerbird

Michael Hart wrote:
My recollection of the latest word from Brewster is that they were just about to pass 10,000 full text eBooks, though not all had been proofread and edited to a 99.95% level of accuracy.
So I am presuming they did actually pass 10,000 in the last month or so, though I haven't seen any official announcements.
Is Brewster's effort (I assume you mean OCA?) doing actual proofing (which I always interpret to mean human proofing)? I thought all they were doing is scanning books and producing raw, unproofed text by OCR.
Jon

On Mon, May 08, 2006 at 02:40:20PM -0600, Jon Noring wrote:
Michael Hart wrote:
My recollection of the latest word from Brewster is that they were just about to pass 10,000 full text eBooks, though not all had been proofread and edited to a 99.95% level of accuracy.
So I am presuming they did actually pass 10,000 in the last month or so, though I haven't seen any official announcements.
Is Brewster's effort (I assume you mean OCA?) doing actual proofing (which I always interpret to mean human proofing)? I thought all they were doing is scanning books and producing raw, unproofed text by OCR.
Jon
You could probably call them and ask for details. When I was there in January, they were talking about doing some automated and semi-automated quality control (like looking for missing pages, and aligning pages that didn't scan straight).
I don't think they're doing any human proofreading or markup at all -- instead, they are looking to Distributed Proofreaders to take that step (or anyone else interested).
-- Greg

Michael Hart writes:
It would appear that one of the hardest things to find from Yahoo or Google eLibraries is the number of well finished eBooks.
My searches at Google have found about 50K books, with another 42k+ books that ought to be available but are not because Google appears to be not making books available if there's a publication date and no copyright date. I don't know how to find books from the OCA, or Yahoo.

On Mon, 8 May 2006, Bruce Albrecht wrote:
Michael Hart writes:
It would appear that one of the hardest things to find from Yahoo or Google eLibraries is the number of well finished eBooks.
My searches at Google have found about 50K books, with another 42k+ books that ought to be available but are not because Google appears to be not making books available if there's a publication date and no copyright date. I don't know how to find books from the OCA, or Yahoo.
Is this the kind of search we were discussing before, searching for commonplace words in the Google Book Search area, or have you found a better way? Perhaps you would be willing to post a list, or send it to me privately?
Thanks!!!
Give the world eBooks in 2006!!!
Michael S. Hart
Founder
Project Gutenberg

Michael Hart writes:
On Mon, 8 May 2006, Bruce Albrecht wrote:
My searches at Google have found about 50K books, with another 42k+ books that ought to be available but are not because Google appears to be not making books available if there's a publication date and no copyright date. I don't know how to find books from the OCA, or Yahoo.
Is this the kind of search we were discussing before, searching for commonplace words in the Google Book Search area, or have you found a better way? Perhaps you would be willing to post a list, or send it to me privately?
They were found by doing keyword searches. Google Books now makes it easier to determine whether the book can be viewed in full. From the Google Book search page, they now indicate whether the book is either full view, snippet view, or no view.
I'm not making a full list available anymore, at least not as a single download, because it took several minutes to download it from my site, and I was getting too many download requests from people who were downloading it because it was showing up at web search engines.
I am currently working on loading MARC entries from several libraries for the books I've found, and will be supporting standard typical MARC tag searches (subject, author, publisher, language, etc.). My long term goal is to do the same for as many of the public domain image archives as I can.
I still need to clean up the MARC entries, and the searches have not been implemented, but the website and displays of the Google Book entries and the MARC entries are at http://pdbooks.zuhause.org/

On Tue, 9 May 2006, Bruce Albrecht wrote:
Michael Hart writes:
On Mon, 8 May 2006, Bruce Albrecht wrote:
My searches at Google have found about 50K books, with another 42k+ books that ought to be available but are not because Google appears to be not making books available if there's a publication date and no copyright date. I don't know how to find books from the OCA, or Yahoo.
Is this the kind of search we were discussing before, searching for commonplace words in the Google Book Search area, or have you found a better way? Perhaps you would be willing to post a list, or send it to me privately?
They were found by doing keyword searches. Google Books now makes it easier to determine whether the book can be viewed in full. From the Google Book search page, they now indicate whether the book is either full view, snippet view, or no view.
I'm not making a full list available anymore, at least not as a single download, because it took several minutes to download it from my site, and I was getting too many download requests from people who were downloading it because it was showing up at web search engines.
Would you be willing to let pglaf.org handle those download problems?
I am currently working on loading MARC entries from several libraries for the books I've found, and will be supporting standard typical MARC tag searches (subject, author, publisher, language, etc.). My long term goal is to do the same for as many of the public domain image archives as I can.
Wonderful!!!
I still need to clean up the MARC entries, and the searches have not been implemented, but the website and displays of the Google Book entries and the MARC entries are at http://pdbooks.zuhause.org/
MARC listings for eBooks are obviously going to be one of the "next big things" for eLibraries!
Thanks!!!
Give the world eBooks in 2006!!!
Michael S. Hart
Founder
Project Gutenberg

One more question, did you figure out any estimate of how many of those 50,000 books your search found could actually be downloaded?
More thanks!
Michael
participants (5)
- Bowerbird@aol.com
- Bruce Albrecht
- Greg Newby
- Jon Noring
- Michael Hart