Re: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries

2 Jun 2006

Michael Hart wrote:
...
> Google's monster speciality is SEARCH ENGINES!!!
> They are MUCH more interested in writing a search engine that will
> read fuzzy OCR text than in increasing the accuracy of the text.

You mean a search engine that finds "I)arwin" when I search for "Darwin"?

That search engine would have to automagically decide that "I)" looks
quite a bit the same as "D".
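To make that concrete, here is a rough sketch of what such a query-time match might look like. The confusion table and the function names are made up for illustration; nothing here reflects how Google or any real OCR package actually works.

```python
from itertools import product

# Hypothetical confusion table: a correct character mapped to the ways
# OCR might have rendered it. Illustrative only, not real OCR statistics.
CONFUSIONS = {
    "D": ["D", "I)"],
    "m": ["m", "rn"],
}

def query_variants(word, confusions=CONFUSIONS):
    """Return every spelling of `word` the table says OCR could have produced."""
    # For each character, list its possible renderings (itself if unknown).
    options = [confusions.get(ch, [ch]) for ch in word]
    # Combine one rendering per character in every possible way.
    return ["".join(parts) for parts in product(*options)]

def fuzzy_find(word, text, confusions=CONFUSIONS):
    """Return the variants of `word` that actually occur in `text`."""
    return [v for v in query_variants(word, confusions) if v in text]
```

Note that the number of variants multiplies with every confusable character in the query, and this work is repeated on each search.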

But that's the same thing OCR software already does: match
characters against ink stains. If they come up with some better
algorithm to do that, they would be foolish not to use it directly on
the scanned texts.

Somewhere they have to keep the OCRed text of their books. It would take
far fewer cycles to clean up the text once than to have the search
engine do a fuzzy match every time a user runs a search.
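A minimal sketch of the clean-once alternative, under the assumption that a word list is available to check repairs against. The repair table, the lexicon, and the function name are hypothetical, and the whitespace tokenizing is deliberately naive (it ignores punctuation and collapses runs of spaces).

```python
# Hypothetical repair table: OCR misreading -> intended character.
REPAIRS = {"I)": "D", "rn": "m"}

def clean_once(text, lexicon, repairs=REPAIRS):
    """Repair a token only if it is unknown and the repaired form is a known word."""
    def fix(token):
        if token in lexicon:
            return token  # already a real word, e.g. "modern" keeps its "rn"
        for bad, good in repairs.items():
            candidate = token.replace(bad, good)
            if candidate in lexicon:
                return candidate
        return token  # no confident repair, leave the token alone
    return " ".join(fix(t) for t in text.split())
```

After one such pass over the stored text, every later search for "Darwin" is a plain exact match, with no variant expansion per query.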

-- 
Marcello Perathoner
webmaster@gutenberg.org