
Hello, On 10.03.2006 at 11:24, Holden McGroin wrote:
On Fri, 2006-03-10 at 10:32 +0100, Keith J. Schultz wrote:
text. Today, dictionaries are used to guess which words are to be recognised. That is why the OCR systems today give us better results if the original has DECENT quality!!!
The pattern recognition systems have not gotten better, and the dictionary trick takes away the motivation to develop better OCR algorithms.
I'm going to have to call bullshit here. As a researcher working in the field of document recognition, I've noticed tremendous improvements in OCR quality even just in the past five years.
Before you start to swear, read and understand! Maybe in the development labs, but not for the non-high-end user!!!!
The fact is, OCR and document recognition as a whole is a field of tremendous ongoing research. It's no secret that the problem of OCR is not "solved" yet, but for some types of document (particularly clean ones using Latin characters), results are already damn good. In other areas, particularly regarding degraded documents, results aren't as good but are steadily improving.
You state that the so-called "dictionary trick" takes away all motivation to research in the field. This is not what I observe going on in the research community. Dictionary-based lookups are one tool in the arsenal but that's something that's well understood. Some of my colleagues are currently researching novel image processing and feature extraction techniques with the goal of improving raw OCR results.
We have not seen any improvements in the field for the past five years!!! The improvements are mainly due to the use of dictionaries!! Not the improvement of character recognition!! Most systems in the field get their performance out of word recognition!!!
OCR is improving. We're working on it.
I did not mean to say there is no improvement in Optical Character Recognition, but the improvement over the past 10 years is minimal at most. When I see an OCR system that just uses raw results, then I will bow my head in recognition of true achievement. Furthermore, when the image processing gets that far, it will open up new possibilities in all kinds of sciences.
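
For anyone following the thread who is unsure what the "dictionary trick" amounts to, here is a minimal sketch in Python. It is an illustration only, not how any particular OCR product works: the function names, the tiny LEXICON and the 0.75 cutoff are all made up for the example, and the fuzzy matching comes from the standard library's difflib rather than from a real OCR engine. The point is that the raw character recognition can stay exactly as bad as it was while the output still looks better, because each token is snapped to the nearest known word.

```python
import difflib

# Hypothetical word list; a real system would use a full dictionary.
LEXICON = ["the", "quality", "of", "recognition", "is", "improving"]


def correct_word(raw, lexicon, cutoff=0.75):
    """Return the closest lexicon entry for a raw OCR token.

    If the token is already a known word, or nothing in the lexicon is
    similar enough (cutoff is an assumed threshold), the raw token is
    returned unchanged.
    """
    token = raw.lower()
    if token in lexicon:
        return raw
    matches = difflib.get_close_matches(token, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else raw


def correct_line(raw_line, lexicon):
    """Apply correct_word to every whitespace-separated token."""
    return " ".join(correct_word(tok, lexicon) for tok in raw_line.split())


if __name__ == "__main__":
    # "1" for "l" and "0" for "o" are typical raw recognition errors.
    print(correct_line("the qua1ity of recogniti0n", LEXICON))
```

Note that a token with no close match in the word list (a name, a rare word, or text in another language) passes through uncorrected, which is where the quality of the raw character recognition still shows.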