
On Tue, 14 Dec 2004, Tony Kline wrote:
Bowerbird@aol.com wrote:
tony said:
That's very good, though image files hardly meet the needs of those users who want digital text and the ability to download, cut and paste, etc.
well, since google _is_ a search engine, they'll obviously o.c.r. the text. and clean up the text, because errors would muck up their search engine.
Did they say OCR, or did you deduce that? I got the impression they are imaging pages, and maybe adding some identifying keywords for each page. That is, you'll be able to Google to a title, chapter, and page maybe, but you won't be able to Google within pages. Try OCR'ing some of the stuff in the Bodleian...there ain't no such fonts!! Does anyone know what they mean by digitizing?
Here's what I have gleaned from 5 TV network news shows and the various NYT, SF Chron, etc., articles:

There will be one "full text" repository at Google, but users won't be able to access more than a "snippet" around any quotation they look up, much as with general Google searches today. Then, if they want more, they will have to click on the item and will arrive at a second database, this one provided by one of the five libraries [NYCPL, Harvard, Michigan, Stanford, Oxford], where they will get a graphical representation of the non-printable page that contains the quotation.

Why they chose to call it "Google Print" when printing is outlawed, I have no idea.

Michael