re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries

michael said:
Obviously the press coverage about "Google library scanning" has done more "as the main impetus for resurg[ent] interest in a cyberlibrary" than the actual scanning itself.
well, d'uh, of course. that's how it always is.
And the latest estimates I have received show that Google's total number of books has just recently passed 50,000.
i do believe you misread that. 50,000 public-domain titles, with another 42,000 under copyright, for a total of 92,000. but even if it is just 50,000 total, they're still on my schedule: i predicted 10,000 after one year, 100,000 after two years, 1 million after three years, and 10 million after four years...
Similar reports say that 88% are neither downloadable nor proofread to any particular level of accuracy.
except it's not google's job to make them downloadable in convenient form, nor to proofread the digitized text. it is _our_ job to grab the scans (as nicely and neatly as possible, courteous and respectful of the cost google incurred by scanning them), and to make them available in a convenient format for reading, as well as to formulate automatic procedures to digitize the text and take it to a very high degree of accuracy. even if google did do these jobs for us, i would still replicate them, because i don't want to have to be dependent on google forever.
Somehow I don't think this was accidental. . . .
the point is, if your books were _already_ "reading each other", people would have been talking about it long before this article.

-bowerbird

p.s. i see you're one of those old-fashioned people who refuse to recognize "resurging" as an adjective. it's ok. hopefully, if i keep using it that way, i'll win. (i'm trying to change the usage of "hopefully" with the same strategy.) :+)

And the latest estimates I have received show that Google's total number of books has just recently passed 50,000.
i do believe you misread that. 50,000 public-domain titles, with another 42,000 under copyright, for a total of 92,000.
Even that number is a misinterpretation. There are at the moment 92,000 pre-1923 books available from Google Print. The 50,000 that Google has made fully downloadable have a clear pre-1923 copyright statement; the 42,000 don't have a clear-cut copyright statement, and thus Google only gives the snippet option. I've never read numbers about the post-1923 books available; Bruce doesn't look for those in his various searches, as far as I am aware.

Frank

Bowerbird@aol.com writes:
And the latest estimates I have received show that Google's total number of books has just recently passed 50,000.
i do believe you misread that. 50,000 public-domain titles, with another 42,000 under copyright, for a total of 92,000.
My searching found 50,000 public domain titles available as complete books, and another 42,000 that should have been available as complete books because they were published prior to 1923, but were only visible in snippet view. I have no idea how many books Google scanned that were published after 1922 but are probably PD because the copyright was apparently not renewed, nor how many books were scanned even though they are still under copyright.

On Mon, 22 May 2006, Bruce Albrecht wrote:
Bowerbird@aol.com writes:
And the latest estimates I have received show that Google's total number of books has just recently passed 50,000.
i do believe you misread that. 50,000 public-domain titles, with another 42,000 under copyright, for a total of 92,000.
Then I was probably right to count Google's total as ~100,000 in my own public estimates, though I would prefer counts of downloadable books, to avoid Google's new policy of: "Google Book Search is a means for helping users discover books, not to read them online and/or download them."
My searching found 50,000 public domain titles available as complete books, and another 42,000 that should have been available as complete books because they were published prior to 1923, but were only visible in snippet view. I have no idea how many books Google scanned that were published after 1922 but are probably PD because the copyright was apparently not renewed, nor how many books were scanned even though they are still under copyright.
Are you saying that there are actually 50,000 downloadable full-text Google eBooks?

Any idea of their level of accuracy?

Please allow me to renew the request from myself and LIS PhD Greg Newby, CEO of Project Gutenberg, for a copy of the list we can look over, even if we cannot make it public.

Thanks!!!

Give the world eBooks in 2006!!!

Michael S. Hart Founder Project Gutenberg
Blog at http://hart.pglaf.org

There is no doubt that the Open Book Project of Brewster Kahle has the most accurate books online. However, they have only 1,000 books. The Million Book Project I would rate just barely above Google Books in quality and completeness. It also helps if the page is not turned before the scan is complete. The US is doing very well in providing a large number of useless images online.

Does anyone know how to get a book into the Open Book Project? Do you have to be a library?

nwolcott2@post.harvard.edu

----- Original Message -----
From: "Michael Hart" <hart@pglaf.org>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Sent: Tuesday, May 23, 2006 10:03 AM
Subject: !@!re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
On Mon, 22 May 2006, Bruce Albrecht wrote:
Bowerbird@aol.com writes:
And the latest estimates I have received show that Google's total number of books has just recently passed 50,000.
i do believe you misread that. 50,000 public-domain titles, with another 42,000 under copyright, for a total of 92,000.
Then I was probably right to count Google's total as ~100,000 in my own public estimates, though I would prefer counts of downloadable books, to avoid Google's new policy of:
"Google Book Search is a means for helping users discover books, not to read them online and/or download them."
My searching found 50,000 public domain titles available as complete books, and another 42,000 that should have been available as complete books because they were published prior to 1923, but were only visible in snippet view. I have no idea how many books Google scanned that were published after 1922 but are probably PD because the copyright was apparently not renewed, nor how many books were scanned even though they are still under copyright.
Are you saying that there are actually 50,000 downloadable full-text Google eBooks?
Any idea of their level of accuracy?
Please allow me to renew the request from myself and LIS PhD Greg Newby, CEO of Project Gutenberg, for a copy of the list we can look over, even if we cannot make it public.
Thanks!!!
Give the world eBooks in 2006!!!
Michael S. Hart Founder Project Gutenberg
Blog at http://hart.pglaf.org
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d

On 5/23/06, Norm Wolcott <nwolcott2ster@gmail.com> wrote:
There is no doubt that the Open Book Project of Brewster Kahle has the most accurate books online. However, they have only 1,000 books. The Million Book Project I would rate just barely above Google Books in quality and completeness. It also helps if the page is not turned before the scan is complete. The US is doing very well in providing a large number of useless images online.
Does anyone know how to get a book into the Open Book Project? Do you have to be a library?
I wish I knew. I've scanned almost a thousand books for Distributed Proofreaders, and the Internet Archive would be a great place to permanently store the images. Every time I've asked them on their website, however, they either haven't replied, or have said that letting outside people contribute material is something they're planning to set up, but with no firm date.

-- Jon Ingram

Norm Wolcott writes:
There is no doubt that the Open Book Project of Brewster Kahle has the most accurate books online. However, they have only 1,000 books. The Million Book Project I would rate just barely above Google Books in quality and completeness. It also helps if the page is not turned before the scan is complete. The US is doing very well in providing a large number of useless images online.
What is the URL for this archive? When I searched Google, I found an "Open Book Project" at ibiblio, but it seemed to have almost nothing there, and it looked like it wasn't from scans anyway. If you're referring to the "Open Content Alliance", I'd love to see a URL to find its archives. The Internet Archive doesn't seem to have a category for OCA texts yet.
participants (6)
-
Bowerbird@aol.com
-
Bruce Albrecht
-
Frank van Drogen
-
Jon Ingram
-
Michael Hart
-
Norm Wolcott