
jimad said: "Google simply digitizing"

Bowerbird said: you seem to think that google is just scanning the books, and displaying those scans to people. that's not the case. google is doing o.c.r., and is using the results of that o.c.r.

Sorry, I know the broad strokes of what Google is doing. Rather, I was pandering to my PG audience to soften the point I was trying to make [which is also somewhat the point you're trying to make] -- which is that, perhaps, at some point in the near future, using human beans to make txt files will no longer be the best technological approach to making PD books available to the public -- and that, with the DX and Google "Page Image" PDFs as examples, maybe that day is getting pretty close.

Google is still making the page image primary, and making the OCR -- however cleaned up or not -- secondary. I.e., Google is using the OCR to make the book more-or-less searchable -- wonder why Google would bother to do that? Some Google books' OCR is very good, others' OCR is very bad, and some Google books have only page images and no OCR at all.

Which raises the question: what IS the bottom-line goal of PG, and/or of DP? What IS IT we are really trying to accomplish here?

Bowerbird said: one doesn't have to "imagine" a technology that will slice and re-dice a page-image to fit it onto a certain display-size...google is currently using its own variant of that.

Sorry, where does Google do a "slice and dice" -- can you provide a pointer? I know they do pan and scan. I also know some OCRs will take a mixed OCR-text / word-image or char-image approach to digitizing a page, based on how confident they are in a recognized word -- as in "paperless offices".