[gutvol-d] Re: Kindle DX, etc.

27 Jun 2009

jimad said:
"Google simply digitizing"

Bowerbird said:
you seem to think that google is just scanning the books, and displaying
those scans to people.  that's not the case.
google is doing o.c.r., and is using the results of that o.c.r.

Sorry, I know the broad strokes of what Google is doing.  Rather, I was
pandering to my PG audience to soften the point I was trying to make [which
is also somewhat the point you're trying to make] -- which is that,
perhaps, at some point in the near future, using human beans to
make txt files will no longer be the best technological approach to
making PD books available to the public -- and that, with the DX and Google
"Page Image" PDFs as examples, maybe that day is getting pretty close.  Google
is still making the page image primary, and making the OCR -- however
cleaned up or not -- secondary.  I.e., Google is using the OCR to make the book
more-or-less searchable -- wonder why Google would bother to do that?  Some
Google books' OCR is very good, others' OCR is very bad, and some Google books
have only page images and no OCR at all.

Which raises the question: what IS the bottom-line goal of PG, and/or of DP?
What IS IT we are really trying to accomplish here?

Bowerbird said:
one doesn't have to "imagine" a technology that will slice and re-dice a
page-image to fit it onto a certain display-size...google is currently using
its own variant of that.

Sorry, where does Google do a "slice and dice" -- can you provide a pointer?
I know they do pan and scan.  I also know some OCR packages take a mixed
approach to digitizing a page: they emit OCR text for a word when they are
confident in the recognition, and fall back to a word-image or char-image
when they are not -- as in "paperless office" systems.
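
To make the mixed text/image idea concrete, here is a rough sketch of how
such a per-word decision might look.  This is only an illustration under my
own assumptions, not how any particular OCR product or Google actually does
it: the Word record, the 0.0-1.0 confidence score, the image_ref handle, and
the 0.90 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str          # the OCR engine's best guess for this word
    confidence: float  # hypothetical 0.0-1.0 recognition score
    image_ref: str     # hypothetical handle to the cropped word image


def mixed_render(words, threshold=0.90):
    """Keep OCR text for confident words; otherwise fall back to the image crop."""
    out = []
    for w in words:
        if w.confidence >= threshold:
            out.append(("text", w.text))
        else:
            out.append(("image", w.image_ref))
    return out


# Made-up sample data: the middle word was recognized poorly.
page = [
    Word("Moby", 0.99, "crop-001"),
    Word("Dicl<", 0.41, "crop-002"),
    Word("by", 0.97, "crop-003"),
]
result = mixed_render(page)
```

With the made-up data above, the badly recognized middle word comes out as
an image reference while its neighbors stay as searchable text.  Raising the
threshold trades searchability for fidelity to the page.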

Jim Adcock