Re: [gutvol-d] Google print

5 Dec 2005

      On 12/5/05, N Wolcott <nwolcott@dsdial.net> wrote:
...
Now that the dust has cleared, can us proles have the final info -- can one
download scans from Google Print, what is the best way, are they holding
back, what is the resolution, can one search (other than for the DjVu
image), etc etc. Are there P2P networks to share stuff cribbed from Google.
Talking about PD images of course.

Google does not use DjVu -- that's the Internet Archive. Google
presents a fairly small jpeg image for each page of the book. There's
no fixed resolution, but instead a fixed width. This means that the
images generated from small books are relatively easy to OCR (and
equate to around 100 dpi), while the images from books with large
pages are hard for even humans to read.  Those of you who can get
access to Google Print should be able to download these 'web
resolution' images from them just by right-clicking and saving. As far
as I know there's no way to access the higher resolution images they
must have made when they originally scanned the material; nor is there
any way to access the OCRed text they use for searching purposes.
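
To make the fixed-width point concrete, here is a rough sketch of the arithmetic in Python. The 500-pixel width and the page sizes are made-up numbers for illustration, not measurements of Google's images; the only thing taken from the above is that a fixed pixel width gives fewer dots per inch as the physical page gets wider.

```python
def effective_dpi(image_width_px: int, page_width_in: float) -> float:
    """Dots per inch implied by a fixed pixel width spread over a physical page width."""
    if page_width_in <= 0:
        raise ValueError("page width must be positive")
    return image_width_px / page_width_in


# Hypothetical figure: an assumed fixed image width, not a value taken from Google.
FIXED_WIDTH_PX = 500

# Hypothetical page widths in inches for a small book and a large-format one.
for label, width_in in [("small book", 5.0), ("large folio", 12.0)]:
    print(f"{label}: {effective_dpi(FIXED_WIDTH_PX, width_in):.0f} dpi")
```

With these assumed numbers the small book comes out at the "around 100 dpi" mentioned above, and the large page at well under half that.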

Google provides no mechanism to download all the images for a book.
You'll have to roll your own download script, or use one of the
scripts written by others, such as the Perl script gharvest, available
from
  http://www.zuhause.org/dp/gharvest
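
For anyone rolling their own, a minimal sketch of such a script might look like the following. It is in Python rather than Perl and is not a description of how gharvest works. The URL template, the function names and the delay are all assumptions for illustration: the real address of each page image has to be read off the site by hand, and nothing here reflects an actual Google interface.

```python
import os
import time
import urllib.request


def fetch_bytes(url: str) -> bytes:
    """Fetch one URL and return the response body."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def harvest_pages(url_template, page_numbers, out_dir, fetch=fetch_bytes, delay=1.0):
    """Save one image per page number and return the list of files written.

    url_template is a hypothetical pattern such as
    "http://example.org/book123/page-{page}.jpg".
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for page in page_numbers:
        path = os.path.join(out_dir, f"page-{page:04d}.jpg")
        if os.path.exists(path):
            continue  # already saved on an earlier run, so skip it
        data = fetch(url_template.format(page=page))
        with open(path, "wb") as handle:
            handle.write(data)
        written.append(path)
        time.sleep(delay)  # pause between requests to avoid hammering the server
    return written
```

The fetch function is passed in as a parameter so the loop can be tried out against a stand-in before pointing it at a real server, and files already on disk are skipped so an interrupted run can be resumed.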
Google also provides no index to the material they have scanned.
Several people have generated one by the crude means of searching for
many different phrases, and storing the results. The most extensive
list is probably also Bruce's, available from
  http://www.zuhause.org/dp/gfound1.html
I've used this as a basis for a page showing the DP harvesting status
of the material:
  http://homepage.ntlworld.com/jenjonliz/jon/tia/google.html
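
The crude indexing approach mentioned above (search for many phrases, store the results) can be sketched roughly as follows. This is only an illustration of the idea, not the method used to build the list linked above: the search function is a toy stand-in over a made-up in-memory catalogue, and all names here are hypothetical.

```python
def build_index(phrases, search):
    """Run every phrase through search() and merge the hits, keyed by book id."""
    found = {}
    for phrase in phrases:
        for book_id, title in search(phrase):
            found.setdefault(book_id, title)  # keep the first title seen per id
    return found


# Made-up catalogue standing in for whatever the real search would return.
CATALOGUE = {
    "b1": "A History of England",
    "b2": "The History of Rome",
    "b3": "Poems",
}


def toy_search(phrase):
    """Return (id, title) pairs whose title contains the phrase, ignoring case."""
    return [(i, t) for i, t in CATALOGUE.items() if phrase.lower() in t.lower()]


index = build_index(["history", "england", "rome"], toy_search)
print(sorted(index))
```

Note that "b3" never turns up because none of the phrases match it, which is exactly the weakness of building an index this way: you only find what you think to search for.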

--
Jon Ingram