
On Fri, 2 Jun 2006, Marcello Perathoner wrote:
> Michael Hart wrote:
> > Google's monster speciality is SEARCH ENGINES!!!
> > They are MUCH more interested in writing a search engine that will read fuzzy OCR text than in increasing the accuracy of the text.
>
> You mean a search engine that finds "I)arwin" when I search for "Darwin"?
> That search engine would have to automagically decide that "I)" looks a lot like "D".

Someone posted a number of such examples they had found a while back, and that appeared to be the general idea.

> But that's the same thing OCR software already does: match characters against ink stains. If they come up with a better algorithm for that, they would be foolish not to apply it directly to the scanned texts.

I think they will probably wait through several iterations of improvement before it becomes obvious to them that they should improve the text itself.

> Somewhere they have to store the OCRed text of their books. It would take far fewer cycles to clean up the text once than to have the search engine do a fuzzy match every time a user searches.

They probably have enough computing power not to worry about that, but perhaps eventually their collection will grow large enough for the thought to occur to them.

Michael
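The "I)arwin" example suggests one simple way such a search engine could work: rewrite known OCR confusions to a canonical form on both the query and the stored text, then compare. The following is a minimal sketch, assuming a small hand-written confusion table (`OCR_CONFUSIONS`, `fold`, `build_index` and `search` are all hypothetical names; this is not a description of Google's actual method). Folding the tokens once at indexing time also reflects the point above that doing the work once is cheaper than fuzzy matching on every query.

```python
# Hypothetical table of OCR confusions: sequence the OCR engine might emit -> intended glyphs.
# A real system would need a far larger, data-driven table; this one is illustrative only.
OCR_CONFUSIONS = [
    ("I)", "D"),
    ("rn", "m"),
    ("vv", "w"),
]


def fold(word: str) -> str:
    """Map a word to a canonical form by rewriting every confusable sequence, then lowercasing."""
    for wrong, right in OCR_CONFUSIONS:
        word = word.replace(wrong, right)
    return word.lower()


def build_index(ocr_text: str) -> dict[str, set[int]]:
    """Fold each whitespace-separated token once, at indexing time, and record its positions."""
    index: dict[str, set[int]] = {}
    for pos, token in enumerate(ocr_text.split()):
        index.setdefault(fold(token), set()).add(pos)
    return index


def search(index: dict[str, set[int]], query: str) -> list[int]:
    """Fold the query the same way and return the matching token positions in order."""
    return sorted(index.get(fold(query), set()))


if __name__ == "__main__":
    idx = build_index("On the Origin of Species by Charles I)arwin")
    print(search(idx, "Darwin"))  # -> [7]
```

The trade-off is that the same folding makes genuinely different words collide (with this table, "modern" and "modem" both fold to "modem"), so every search inherits the ambiguity, whereas correcting the stored text removes it for good.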