[gutvol-d] Re: improve scanning on color docs?

17 May 2010

Looking at the details of their paper, they seem to be dealing with simple
"modern" digitizations of simple "modern" documents, which ought to be
duck soup for any modern OCR -- except that they deliberately corrupt the
image by applying very lossy JPEG compression to the digitization and then
setting the binary threshold "wrong", so that the resulting characters lose
important parts.

Suggest that instead of buying their software, you just don't do that!

Do not store your digitizations as JPEG; use a lossless format such as PNG
instead.
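
The reason PNG is safe is that its compression is DEFLATE (zlib), which
gives back every pixel exactly, unlike JPEG. Below is a minimal Python
sketch using only the standard library; the function names are
hypothetical, and it handles only 8-bit grayscale with no row filtering,
so in practice you would let the scanner software or an image library
write the file.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(tag: bytes, data: bytes) -> bytes:
    """One PNG chunk: length, type, data, then CRC-32 over type + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def encode_gray_png(pixels: list[list[int]]) -> bytes:
    """Encode rows of 0..255 gray values as an 8-bit grayscale PNG."""
    height, width = len(pixels), len(pixels[0])
    # IHDR: width, height, bit depth 8, color type 0 (gray),
    # compression 0, filter method 0, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Every scanline starts with filter type 0 ("None").
    raw = b"".join(b"\x00" + bytes(row) for row in pixels)
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw, 9)) + _chunk(b"IEND", b""))


def decode_gray_png(png: bytes) -> list[list[int]]:
    """Read back a PNG written by encode_gray_png (filter 0 on every row)."""
    pos, width, idat = len(PNG_SIGNATURE), 0, b""
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        tag = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        if tag == b"IHDR":
            (width,) = struct.unpack(">I", data[:4])
        elif tag == b"IDAT":
            idat += data
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    raw = zlib.decompress(idat)
    stride = width + 1  # one filter-type byte per scanline
    return [list(raw[i + 1:i + stride]) for i in range(0, len(raw), stride)]
```

Round-tripping any gray image through these two functions returns
identical pixel values, which is exactly what a JPEG round trip does not
promise.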

If your OCR requires binary images, spend some time playing with the
thresholding in software like Photoshop; otherwise, send the OCR a
grayscale digitization to begin with and let the OCR pick its own levels.
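
As an illustration of picking a level automatically rather than by hand,
here is a Python sketch of Otsu's method, one common technique that
chooses the gray level maximizing the between-class variance of "ink"
and "paper" pixels. This is not a claim about what any particular OCR
engine does internally; the function names are hypothetical.

```python
def otsu_threshold(gray: list[list[int]]) -> int:
    """Return the 0..255 level that maximizes between-class variance."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below t
    sum_bg = 0  # sum of intensities at or below t
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0 or w_fg == 0:
            continue  # one class is empty; no split to score
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(gray: list[list[int]], t: int) -> list[list[int]]:
    """Pixels at or below t become 0 (ink); the rest become 255 (paper)."""
    return [[0 if v <= t else 255 for v in row] for row in gray]
```

For a toy image with ink around 20-35 and paper around 200-215, this
picks 35, the lowest level that cleanly separates the two groups.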

A little unsharp masking can go a long way too -- as does setting your dpi
appropriately in the first place: 300 dpi for 12pt text "equals" 600 dpi
for 6pt text!
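
Both points above fit in a short Python sketch. Unsharp masking adds back
a scaled copy of (original minus blurred), which steepens stroke edges;
this version uses a 3x3 box blur with clamped borders rather than the
Gaussian blur usually used, and the "amount" default is an assumption.
The dpi rule of thumb is plain proportionality, taking the 300 dpi at
12pt baseline from above. Function names are hypothetical.

```python
def box_blur(gray: list[list[int]]) -> list[list[float]]:
    """3x3 mean filter; out-of-range neighbors are clamped to the border."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += gray[yy][xx]
            row.append(total / 9)
        out.append(row)
    return out


def unsharp_mask(gray: list[list[int]], amount: float = 1.0) -> list[list[int]]:
    """sharpened = original + amount * (original - blurred), clamped to 0..255."""
    blurred = box_blur(gray)
    return [[min(255, max(0, round(v + amount * (v - b))))
             for v, b in zip(row, brow)]
            for row, brow in zip(gray, blurred)]


def required_dpi(point_size: float, ref_dpi: float = 300, ref_pt: float = 12) -> float:
    """Scan resolution that keeps characters as many pixels tall as ref_pt at ref_dpi."""
    return ref_dpi * ref_pt / point_size
```

On a one-row edge such as [50, 50, 200, 200] the mask gives
[50, 0, 250, 200]: the dark side of the edge gets darker and the light
side lighter, while flat areas are left alone.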

Playing around a bit to figure out what works best can easily move your
error rates by ±20% -- which is a lot more than this software claims!
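
One way to put numbers on those error rates while experimenting is a
character error rate: the edit distance between the OCR output and a
hand-checked transcription, divided by the transcription's length. A
Python sketch follows; comparing raw strings without normalizing
whitespace or line breaks is a simplifying assumption, and the function
names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def char_error_rate(ocr: str, truth: str) -> float:
    """Edit distance per character of ground truth."""
    return levenshtein(ocr, truth) / max(len(truth), 1)
```

For example, the classic "rn" for "m" confusion, "rnodern" against
"modern", costs two edits, a CER of about 0.33. Running the same pages
through each scan setting and comparing CERs shows which changes actually
help.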

http://www.accusoft.com/Improve_OCR_Accuracy_on_Color_Documents.pdf

Jim Adcock