
From: "Felix E. Klee" <felix.klee@inka.de>
How did OCR'ing go? I wonder because the resolution of cheap digital cameras is quite low for scanning.
Well, I did not test OCR'ing at all. :-) I store the digitizations only as images, which are also used for reading. Please test it yourself and report the results to the list.

ftp://ftp.funet.fi/pub/sci/audio/devel/books/

The first few images are various tests. The digitization sequence test starts at image 1438. Remember, it is a tourist camera with lens distortion and poor focus control. I used a plain ceiling light rather than better, movable lights. The book is on a chair and the photographed page faces straight up, which is wrong.

Yes, one page per image is better, because the pages bend when the book is laid wide open. The book and camera stand could be designed so that the book rests in a V-shaped holder and the camera faces the page perpendicularly. That is, the camera would not be above the book facing down. (A scanner that lets the book rest on the edge of the scanning glass solves the same bending-page problem. So does a scanning glass wedge.)

Juhana
--
http://music.columbia.edu/mailman/listinfo/linux-graphics-dev
for developers of open source graphics software

"Juhana" == Juhana Sadeharju <kouhia@nic.funet.fi> writes:
>> From: "Felix E. Klee" <felix.klee@inka.de>
>>
>> How did OCR'ing go? I wonder because the resolution of cheap
>> digital cameras is quite low for scanning.

Juhana> Well, I did not test OCR'ing at all. :-) I store
Juhana> digitizations only as images which also are used for
Juhana> reading.

Juhana> Please test it yourself and tell the results in the list.
Juhana> ftp://ftp.funet.fi/pub/sci/audio/devel/books/ A few first
Juhana> images are various testings. The digitization sequence
Juhana> test starts at the image 1438.

Please, instead of putting a single big 72MB tar.gz file there, could you put up some individual images? Downloading a couple would probably be enough to show that they are unsuitable for OCR.

Indeed, my own attempts with a good digital camera (5Mpixels, manual focus, uncompressed TIFF output, a special mode for text, a professional tripod, etc.) have been poor.

Carlo Traverso

Carlo Traverso wrote on 3/31/2005, 6:52 AM:
Indeed, my attempts with a good digital camera (5Mpixels, manual focus, uncompressed tiff output, a special mode for text, a professional tripod, etc) have been poor.
I am surprised to hear this. I use a Canon S230, a relatively simple 3.2Mpixel pocket camera, and with ABBYY FineReader 5.0 the OCR results are as good as those from my scanner.

The one thing that took some real work was lighting the book well. I now use 2 lights mounted on each side of the camera (currently 13 watt fluorescent task lights, but normal incandescent lights worked as well). I had no luck at all using the flash.

I use automatic focus, no flash, and close-up mode, with a long exposure time. I use a copy stand modified from a hand drill press to position the camera about 9" above the book. I take each page separately, a 2k x 1.5k JPEG for each 7" by 4.5" page, or almost 300 DPI. The OCR results at 600 DPI, from taking a picture of 1/2 the page, were no better than the full-page results.

Clearly, especially for our purposes, the quality of the original makes some difference. How do the pictures you took look to you? It has been my experience that if they look like faithful reproductions, they OCR well. It may be that your expectations of results are higher than mine.

If you are interested, I could send you a picture to see what I get.

Kent Fielden
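The "almost 300 DPI" figure above can be checked with a little arithmetic. The sketch below is only an illustration, not something posted in the thread. It assumes that "2k x 1.5k" means 2048 x 1536 pixels, that the long side of the frame runs along the 7" edge, and that the page exactly fills the frame (in practice there is a margin, so the real figure is somewhat lower). The function names are made up for this example.

```python
def effective_dpi(pixels_long, pixels_short, page_long_in, page_short_in):
    """Pixels per inch along each page edge, assuming the page fills the frame."""
    return pixels_long / page_long_in, pixels_short / page_short_in


def pixels_needed(page_long_in, page_short_in, target_dpi):
    """Image size in pixels needed to cover the page at target_dpi."""
    return round(page_long_in * target_dpi), round(page_short_in * target_dpi)


if __name__ == "__main__":
    # Assumed values: 2048 x 1536 image, 7" x 4.5" page (from the message above).
    along_long, along_short = effective_dpi(2048, 1536, 7.0, 4.5)
    print(f"long edge:  {along_long:.0f} DPI")
    print(f"short edge: {along_short:.0f} DPI")

    # The two figures differ because the page (7:4.5) is narrower than the
    # 4:3 frame; the long edge is the limiting one.
    print("pixels for 300 DPI:", pixels_needed(7.0, 4.5, 300))
```

Under these assumptions the long edge works out to about 293 DPI, which matches the "almost 300 DPI" estimate. The short edge comes out higher only because the frame is wider than the page.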

Hi,

I have to see what my Canon 20D will do in B/W mode.

As far as resolution and OCR go, you do not want to go any higher than 300 dpi, because above 300 dpi the OCR starts to see the structure of the paper and makes mistakes. This is even more important with older books. Try using just 144 dpi; this should give you the same results as 300 dpi.

Keith.

On 04.04.2005 at 22:22, Kent Fielden wrote:
> [...] I take each page separately, a 2k x 1.5k JPEG for each 7" by 4.5" page, or almost 300 DPI. The OCR results for 600 DPI, taking a picture of 1/2 the page were no better than the full page results. [...]
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d
participants (4)
- Carlo Traverso
- Juhana Sadeharju
- Keith J. Schultz
- Kent Fielden