
jim said:
they would lose that bet!
then jim said:
(Not that a lot can't be done to get rid of 90% of the errors "automagically!")
so you won't grant 100%, but you will grant 90%. well, google is probably betting they can get rid of 97% of the errors automatically. do you want to bet against google? because i'll take that bet against you.
-bowerbird

do you want to bet against google?
because i'll take that bet against you.
Sure, I'd be happy to take that bet, if I am allowed to win it or lose it in a finite amount of time - such as a decade. What I think is much more likely in a decade is that Google either gives up or figures out how to post much more attractive page images. I actually don't think they have much of any interest in posting higher-quality automatic OCR transcriptions.

Google's plan from the outset, a year before we ever heard about it via the media, was to create the most "eBooks" at the lowest cost and to generate as big a media blitz as they could; it really had very little to do with creating high-quality eBooks, though even I must admit some came out better than I expected. Compared with PG/DP, Google is quite literally a paper tiger when it comes to quality, but when it comes to quantity it is PG/DP that is the dead-tree big stripey cat. All in all, it won't hurt either way, and the two ends will meet in the middle, with greater numbers of eBooks and greater quality. Don't forget the Internet Archive, etc.

On Wed, 3 Mar 2010, James Adcock wrote:
do you want to bet against google?
because i'll take that bet against you.
Sure, I’d be happy to take that bet, if I am allowed to win it or lose it in a finite amount of time – such as a decade. What I think is much more likely in a decade is that Google either gives up or figures out how to post much more attractive page images. I actually don’t think they have much of any interest in posting higher-quality automatic OCR transcriptions.

On 3/3/2010 6:03 PM, James Adcock wrote:
do you want to bet against google?
because i'll take that bet against you.
Sure, I'd be happy to take that bet, if I am allowed to win it or lose it in a finite amount of time -- such as a decade. What I think is much more likely in a decade is that Google either gives up or figures out how to post much more attractive page images. I actually don't think they have much of any interest in posting higher-quality automatic OCR transcriptions.
Wrong again. Google is funding development of open source OCR software via a project called OCRopus; I believe a beta version is due out shortly. Further, Google bought reCAPTCHA. That's the company and software that make you prove you are human on many websites. They show two scanned words, one whose text is known and one whose text is not, and the human types in both. This works well because what is hard for OCR software (i.e., for a computer) is often easy for a human. Over millions of comparisons they are able to build up a pretty good version of the text. Because they don't address punctuation, and because capital versus lowercase letters, and some blobs, can be hard to recognize out of context, they won't get the text perfect. But they can turn total gibberish into readable text.

I believe that there will always be a place for humans in preparing etext versions of some books. But just as OCR eventually became good enough to start from, technology will eventually improve enough that humans will add value only on very difficult texts, or by contributing semantic information. I don't know when that will happen, but it is certainly coming.

Juliet Sutherland
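The reCAPTCHA scheme described above (one known word, one unknown word, and many human answers combined) can be illustrated with a toy model. This Python sketch is a simplified, hypothetical illustration, not reCAPTCHA's actual implementation. It assumes that a reading of the unknown word is trusted only when the known "control" word is typed correctly, and that a word counts as resolved once one reading has at least MIN_VOTES agreeing answers making up at least MIN_SHARE of all accepted answers. The class WordVoter, the thresholds, the case-folding normalize() step and the "page12-word7" id are all invented for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional

# Hypothetical acceptance rule: a reading needs at least MIN_VOTES agreeing
# answers that also make up at least MIN_SHARE of all accepted answers.
MIN_VOTES = 3
MIN_SHARE = 0.7


def normalize(answer: str) -> str:
    # Simplification: ignore case and surrounding whitespace, since
    # capitalization is hard to judge from a word seen out of context.
    return answer.strip().lower()


class WordVoter:
    """Toy aggregator for human readings of words OCR could not read."""

    def __init__(self) -> None:
        # unknown-word id -> Counter of normalized human readings
        self.votes: Dict[str, Counter] = defaultdict(Counter)

    def submit(self, control_truth: str, control_answer: str,
               unknown_id: str, unknown_answer: str) -> bool:
        """Record one challenge; return False if the control word was wrong."""
        # The control word has a known answer, so a wrong answer there means
        # we do not trust this person's reading of the unknown word either.
        if normalize(control_answer) != normalize(control_truth):
            return False
        self.votes[unknown_id][normalize(unknown_answer)] += 1
        return True

    def resolved(self, unknown_id: str) -> Optional[str]:
        """Return the accepted reading, or None if more answers are needed."""
        counts = self.votes.get(unknown_id)
        if not counts:
            return None
        word, n = counts.most_common(1)[0]
        if n >= MIN_VOTES and n / sum(counts.values()) >= MIN_SHARE:
            return word
        return None


# Usage: five people see the control word "morning" next to an unknown word.
voter = WordVoter()
answers = [("morning", "Gutenberg"), ("morning", "gutenberg"),
           ("mornin", "Gutenberg"),   # control typed wrong: ignored
           ("morning", "Gutenbcrg"), ("morning", "GUTENBERG")]
for control_answer, unknown_answer in answers:
    voter.submit("morning", control_answer, "page12-word7", unknown_answer)
print(voter.resolved("page12-word7"))  # prints: gutenberg
```

Requiring a clear majority rather than a single match is what lets an occasional mistyped answer, like "Gutenbcrg" here, be outvoted. Because the sketch folds case, it also cannot recover capitalization, which matches the limitation mentioned in the message.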
participants (4)
- Bowerbird@aol.com
- James Adcock
- Juliet Sutherland
- Michael S. Hart