
On 12/5/05, N Wolcott <nwolcott@dsdial.net> wrote:
> Now that the dust has cleared, can us proles have the final info? Can one download scans from Google Print, what is the best way, are they holding back, what is the resolution, can one search (other than for the DjVu image), etc.? Are there P2P networks to share stuff cribbed from Google? Talking about PD images, of course.
Google does not use DjVu; that's the Internet Archive. Google presents a fairly small JPEG image for each page of the book. There's no fixed resolution, but instead a fixed width. This means that the images generated from small books are relatively easy to OCR (and equate to around 100 dpi), while the images from books with large pages are hard for even humans to read.

Those of you who can get access to Google Print should be able to download these 'web resolution' images just by right-clicking and saving. As far as I know there's no way to access the higher resolution images they must have made when they originally scanned the material, nor is there any way to access the OCRed text they use for searching purposes.

Google provides no mechanism to download all the images for a book. You'll have to roll your own download script, or use one of the scripts written by others, such as the Perl script gharvest, available from
http://www.zuhause.org/dp/gharvest

Google also provides no index to the material they have scanned. Several people have generated one by the crude means of searching for many different phrases and storing the results. The most extensive list is probably also Bruce's, available from
http://www.zuhause.org/dp/gfound1.html

I've used this as a basis for a page showing the DP harvesting status of the material:
http://homepage.ntlworld.com/jenjonliz/jon/tia/google.html

-- Jon Ingram
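To make the fixed-width point concrete, here is a rough sketch of the arithmetic: if every page image has the same pixel width, the effective resolution is simply that width divided by the physical page width. The 500-pixel width and the two page widths below are assumptions picked to match the "around 100 dpi" figure for small books; they are not measurements of Google's images.

```python
def effective_dpi(image_width_px: float, page_width_in: float) -> float:
    """Effective resolution of a page image, in dots per inch."""
    if page_width_in <= 0:
        raise ValueError("page width must be positive")
    return image_width_px / page_width_in


# ASSUMPTION: placeholder pixel width, not Google's actual image width.
ASSUMED_IMAGE_WIDTH_PX = 500

# ASSUMPTION: illustrative page widths in inches for a small and a large book.
for label, page_width_in in [("small book", 5.0), ("large book", 10.0)]:
    dpi = effective_dpi(ASSUMED_IMAGE_WIDTH_PX, page_width_in)
    print(f"{label}: about {dpi:.0f} dpi")
```

Under these assumed numbers the large-format page comes out at half the resolution of the small one, which is why the same image width that is fine for OCR on a small book can be barely legible on a large one.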