
Bruce writes:
I personally find Michael Hart's counts of his World eBook Fair to be quite inflated, counting many Project Gutenberg books two or three
I've seen this also. There seems to be the main PG site, PGCC, and the former blackmask.com, which all seem to have very similar content. I would expect at least three copies of each PG book, which adds up to around 50,000 to 60,000 ebooks. That's an impressive number but very much inflated. And that doesn't count other overlapping collections, such as the Widger Library, or any that I'm not aware of.
On the plus side, Google now makes PDFs of most of the full view books with high resolution images and often has links to Open WorldCat. On the down side, they still seem to be skipping illustrations that don't have page numbers, and often lose a page or two around the illustrations. The PDFs are images only, so they're not really usable in low resolution devices like PDAs and cell phones.
Yes, and there you point out the major difference between Google and PG. While PG has about 20,000 books and PGCC has more than that, the Google books are PDF page images and the PG books are plain text and HTML. I am blind and really have no way to use PDF images. I can print them to a virtual printer, which in turn converts them to text with an OCR engine, but this is a pain at the least and often locks up my computer. I recently had a major drive crash and had almost nothing left. Thanks to PG, I at least had reading material. Downloading, unzipping and reading would be impossible with Google. I can extract text from PDF files, but only if they are saved as text. Also, PG books have a very low error rate, while Google apparently skips and duplicates pages for no reason. For those reasons, I still think that PG is more impressive.
It also comes down to how you define an ebook. If an ebook is just a book scanned and turned into page images, your figure is correct: Google has far more than PG. If an ebook is supposed to be useful to the masses and have the same or better accuracy than a printed book, it sounds like PG has more than Google. Anyone can scan a book and put it online, but it's much harder to proofread and fix errors. I've read lots of bad scans in my time because that's all I could get.
There is another site, Bookshare.org, primarily for the blind. They also offer full texts of books, but those texts always have scanning errors. For legal reasons, they can't go through rounds of proofreading like Distributed Proofreaders (DP) does. I'll take PG any day, except that almost all of the PG books date from before 1923 because of the copyright laws. I download books from Bookshare because I have no other way to read them.
On the other hand, I've seen some Google book excerpts with reasonably good text quality, but it looked like I would have to read the text online, which I didn't want to do.