
On Fri, 2006-03-10 at 10:32 +0100, Keith J. Schultz wrote:
> text. Today, dictionaries are used to guess which words are to be recognised. That is why the OCR systems today give us better results if the original has DECENT quality!!!
> The pattern recognition systems have not gotten better and the dictionary trick takes the motivation away to develop better OCR algorithms.
I'm going to have to call bullshit here. As a researcher working in the field of document recognition, I've noticed tremendous improvements in OCR quality even just in the past five years. The fact is, OCR and document recognition as a whole are fields of tremendous ongoing research.

It's no secret that the problem of OCR is not "solved" yet, but for some types of document (particularly clean ones using Latin characters), results are already damn good. In other areas, particularly regarding degraded documents, results aren't as good but are steadily improving.

You state that the so-called "dictionary trick" takes away all motivation to do research in the field. This is not what I observe going on in the research community. Dictionary-based lookups are one tool in the arsenal, but that's well understood. Some of my colleagues are currently researching novel image processing and feature extraction techniques with the goal of improving raw OCR results.

OCR is improving. We're working on it.

Cheers,
Holden