Re: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries

2 Jun 2006


      On Fri, 2 Jun 2006, Marcello Perathoner wrote:
...
Michael Hart wrote:
...
Google's monster speciality is SEARCH ENGINES!!!
They are MUCH more interested in writing a search engine that will
read fuzzy OCR text than in increasing the accuracy of the text.
You mean a search engine that finds "I)arwin" when I search for "Darwin"?
That search engine would have to automagically decide that "I)" looks
quite a bit the same as "D".
Someone posted a number of such examples they found a while back,
and it appeared as if that was the general idea.
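A minimal sketch of what such a match might look like, assuming the engine simply collapses known OCR confusions to one canonical form before comparing. Only the "I)" for "D" case comes from this thread; the "vv" for "w" rule, the table, and the function names are hypothetical, and nothing here is claimed to be how Google actually does it.

```python
# Hypothetical table of OCR confusions: what the scanner emitted -> what
# was probably printed. Only "I)" -> "D" is taken from the thread; the
# "vv" -> "w" rule is an illustrative assumption.
CONFUSIONS = {
    "I)": "D",
    "vv": "w",
}

def normalize(text):
    """Collapse known confusable sequences, then lowercase."""
    for garbled, intended in CONFUSIONS.items():
        text = text.replace(garbled, intended)
    return text.lower()

def fuzzy_find(query, words):
    """Return the words whose normalized form equals the normalized query."""
    target = normalize(query)
    return [w for w in words if normalize(w) == target]

ocr_words = ["I)arwin", "Darwin", "Darvvin", "Dawkins"]
matches = fuzzy_find("Darwin", ocr_words)
```

Here matches would hold the first three words and skip "Dawkins". The catch is that every rule added to the table also merges some words that really are different, so the table cannot grow without limit.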
...
But that's the same thing OCR software already does: match
characters against ink stains. If they come up with some better
algorithm to do that, they would be foolish not to use it directly on
the scanned texts.
I think they will probably wait several iterations of improvement
before it becomes obvious to them that they should improve the text.
...
Somewhere they have to keep the OCRed text of their books. It would take
far fewer cycles to clean up the text (once) instead of having the
search engine do a fuzzy match every time a user does a search.
They probably have enough computing power not to be worried about that,
but perhaps eventually they will have a large enough collection for the
thought to occur to them.
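The trade-off can be sketched roughly as follows, assuming a single confusion rule (the "I)" for "D" case above) and a word list standing in for the stored OCR text. All names are hypothetical and this is only an illustration of the two approaches, not anyone's actual system.

```python
def normalize(word):
    # Assumption: one confusion rule only, the "I)" -> "D" case from the thread.
    return word.replace("I)", "D").lower()

def search_fuzzy(query, raw_words):
    """Per-query fuzzy match: re-normalizes every stored word on each search."""
    target = normalize(query)
    return [i for i, w in enumerate(raw_words) if normalize(w) == target]

def build_index(raw_words):
    """One-time cleanup: normalize each word once and record its positions."""
    index = {}
    for i, w in enumerate(raw_words):
        index.setdefault(normalize(w), []).append(i)
    return index

def search_index(query, index):
    """Lookup against the cleaned index: one normalization per query."""
    return index.get(normalize(query), [])

raw_words = ["I)arwin", "wrote", "Darwin"]
index = build_index(raw_words)            # paid once
slow = search_fuzzy("Darwin", raw_words)  # paid on every search
fast = search_index("Darwin", index)
```

Both searches return the same positions; the difference is that the fuzzy version touches every stored word on every query, while the cleaned index pays that cost a single time.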


Michael