re: [gutvol-d] Copyright Verification?

brent said:
Currently in america there is a decline in scanner use.
given the popularity of all-in-one printer/scanner/copier/fax machines, i don't see how this statement can be justified. and i _most_certainly_ do not see an increase in the numbers of "type-in" projects, not at all! but if _you_ personally wanna type in some books, be my guest! it's a wonderful way to become thoroughly immersed in a text...

***

geoff said:
The last, oh, five or six books I've submitted to PGDP have been photographed (on a 4 MP camera), not scanned. It's a lot less rough on books, and the results were as good as scanning once I figured out what I was doing.
i'm skeptical of that claim. i think if we tested the quality of the o.c.r. recognition results, using the best-in-class o.c.r. app, we would find a significant difference between images from a 4-m.p. camera and the best-in-class scanners, such as the opticbook3600. and dollar-for-dollar scanners give better images than cameras, though i'm not disputing your assertion that cameras might be "less rough" on the books.

just because most of the proofers over at d.p. are not aware that the images they are getting are less-than-the-best doesn't mean it isn't so. but as long as they don't mind correcting errors that wouldn't even exist if the images were better, or if a better o.c.r. program had been used, i guess it doesn't matter all that much...

-bowerbird

On Mon, 11 Jul 2005 13:50:03 EDT, Bowerbird@aol.com wrote:
| brent said:
| > Currently in america there is a decline in scanner use.
|
| given the popularity of all-in-one printer/scanner/copier/fax machines,
| i don't see how this statement can be justified. and i _most_certainly_
| do not see an increase in the numbers of "type-in" projects, not at all!

I have one; it was cheap and works a treat.

--
Dave Fawthrop <dave hyphenologist co uk>
In Case of Emergency: store the word "ICE" in your mobile phone address book, and against it enter the number of the person you would want to be contacted "In Case of Emergency". http://tinyurl.com/79lz9

Bowerbird said:
geoff said:
The last, oh, five or six books I've submitted to PGDP have been photographed (on a 4 MP camera), not scanned. It's a lot less rough on books, and the results were as good as scanning once I figured out what I was doing.
i'm skeptical of that claim.
I do think it is possible to get good results using a digital camera for scanning books (especially if one is not concerned with the archival quality of the scans.) HOWEVER, there are issues that have to be dealt with, and only a few people have the necessary aptitude to deal with them. Thus Bowerbird's skepticism is valid if we look at "scanning by the masses".

Here are some of the technical issues as I see them:

1) Focus. It is important that one gets a very good focus. Focus is affected by the focal length of the lens, the aperture f-stop, and the lens quality. A higher f-stop helps improve focus (and improves focal plane depth), but it increases the needed lighting and/or requires a longer shutter time (and longer shutter times are much more susceptible to camera and tripod vibration.) With even a low-end consumer scanner, none of the above are issues.

2) Minimizing optical distortion. Optical distortion is caused by two factors: a) astigmatism due to the poor optics found on cheaper cameras, and b) barrel distortion due to using too short a lens focal length (meaning the lens is closer to the page to be photographed.) Poor optics is fixed by $$$ (a better camera), and barrel distortion is fixed either by using a longer focal length lens or by digital post-processing to remove the barrel distortion (using a calibration grid helps with correction.) The downside with a longer focal length lens is that the camera must be further from the page, possibly making the frame that supports the camera bigger (leading to greater vibration issues). Lighting needs are also increased (unless one gets a *big* aperture lens, which is $$$ and not available for most non-SLR cameras -- in fact, the optimum lens for page scanning is probably not available for most consumer-level, non-SLR digital cameras.)

Anyone who has experience in photography knows all of these issues. Those who don't are probably beginning to realize the non-triviality of "do-it-yourself scanning" using a digital camera. Of course, reasonable quality scanners don't have any of these issues.

3) Lighting. As should be obvious now, it is important to have good lighting so one can increase the f-stop, shorten the shutter time, etc. However, it is also important to have the right kind of lighting (mostly diffuse) so one does not have errant reflections off the page and varying intensity across the page. Achieving good lighting is not a trivial exercise. Lighting is not an issue with scanners.

4) General complexity of the process. Unless one uses a well-designed frame specially designed for a higher-quality digital camera (preferably a professional or "prosumer" level SLR, which is minimum $1000), using a digital camera to achieve good to excellent-quality page scan results is simply out of reach except for the very mechanically-adept DIY kind of people. Some DIY solutions are likely to be very kludgy -- unstable and requiring "4 hands" to operate.

Overall, scanners are simpler for the average Joe to run. If minimizing harm to the book is required, as Bowerbird recommends, get a Plustek OpticBook or similar scanner designed to be easy on book bindings.

Now, this should not dissuade people from using digital cameras for book scanning, but one does not just run down to Wal-Mart, buy the $100 4-megapixel camera, and then get perfect scans right out of the box. It requires not only a better quality camera, but well-done lighting, some kind of custom frame or tripod to hold the camera in the right position (plus a means to assure the page of the book is within the focal plane of the lens, another issue not mentioned above), lots of futzing to get the system and settings right (and to overcome engineering issues), etc., etc.

The Internet Archive, for example, has been working on just such a gentle-on-books scanner setup using digital cameras. But they are engineering the system to overcome the deficiencies mentioned above, resulting in a system which is better and cheaper for the high-volume and reasonably high-quality scanning they want to do (this is compared to the ultra-expensive $100,000 "page turning" commercial scanners they have been using.) I'm hoping they will "open source" their engineering effort to share with the world and allow other engineers to continue to improve upon it.
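To put a rough number on the aperture trade-off described under 1) and 3), here is a minimal Python sketch. It assumes the same lighting and the same sensor sensitivity before and after, in which case exposure is proportional to shutter time divided by the square of the f-number. The function name and the starting point of 1/60 s at f/2.8 are illustrative assumptions, not figures from this thread.

```python
def shutter_time_for_same_exposure(old_time_s, old_f_number, new_f_number):
    """Return the shutter time that keeps exposure constant after an aperture change.

    With lighting and sensitivity held fixed, exposure is proportional to
    time / f_number**2, so the time scales with the square of the f-number ratio.
    """
    return old_time_s * (new_f_number / old_f_number) ** 2


# Hypothetical starting point: 1/60 s at f/2.8, then stopping down for sharpness.
for f_number in (2.8, 4, 5.6, 8, 11):
    seconds = shutter_time_for_same_exposure(1 / 60, 2.8, f_number)
    print(f"f/{f_number}: {seconds:.3f} s")
```

Each doubling of the f-number quadruples the shutter time, or equivalently calls for four times the light, which is why stopping down for sharper pages makes vibration and lighting harder to manage.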
i think if we tested the quality of the o.c.r. recognition results, using the best-in-class o.c.r. app, we would find a significant difference between images from a 4-m.p. camera and the best-in-class scanners, such as the opticbook3600. and dollar-for-dollar scanners give better images than cameras, though i'm not disputing your assertion that cameras might be "less rough" on the books.
Not sure about your first point *if* the person using the digital camera does things right. But as noted above, using a digital camera for page scanning and getting good results is not a trivial exercise, and it is out of reach of the average Joe who just wants to scan books and does not have a Ph.D. in photography or mechanical engineering. You are right that, dollar-for-dollar, and for more consistent and easier-to-obtain results, it is much better for the average Joe to use a scanner rather than a digital camera for book scanning.
just because most of the proofers over at d.p. are not aware that the images they are getting are less-than-the-best doesn't mean it isn't so..
My focus on scanning goes beyond just OCR purposes -- I think if substantial work is being expended to acquire and scan a book, it takes only a little extra effort to scan at archival quality, which is at least 600 dpi optical (and 256-level greyscale for bitonal material, or even better, 24-bit color.) It is also wise to scan a calibration color/greyscale chart before each book is scanned so it is possible to post-process the images should the scanner calibration be off some.

I am saddened and frustrated when I see all this scanning activity of Public Domain materials going on, but being done haphazardly and with the needless throttling down of the scan quality.

Jon Noring
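As one illustration of what post-processing against a scanned greyscale chart could look like, here is a minimal Python sketch of a linear levels correction. It assumes 8-bit grey values and that the darkest and lightest chart patches have already been measured in the scan; the function name and the sample readings of 12 and 240 are hypothetical, and a real workflow would likely use more patches and a non-linear curve.

```python
def make_levels_correction(measured_black, measured_white,
                           target_black=0, target_white=255):
    """Build a function that linearly remaps 8-bit grey values.

    measured_black / measured_white are the values the scanner actually
    produced for the chart's darkest and lightest patches (assumed inputs).
    """
    if measured_white <= measured_black:
        raise ValueError("white patch must read brighter than black patch")
    scale = (target_white - target_black) / (measured_white - measured_black)

    def correct(value):
        corrected = target_black + (value - measured_black) * scale
        # Clamp to the valid 8-bit range after rounding.
        return max(0, min(255, round(corrected)))

    return correct


# Hypothetical readings: the black patch scanned as 12, the white patch as 240.
correct = make_levels_correction(12, 240)
print([correct(v) for v in (0, 12, 100, 240, 255)])
```

The same mapping can be applied to every pixel of every page from that session, which is the reason for scanning the chart before each book.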

I invite any and all skeptics to view the page images currently in P1 for _The Arian Controversy_. It's not nearly as hard as Jon makes it out to be. http://www.geoffhorton.com/pictureocr/instructions.html has an explanation of what I do.

Hi Geoff,
I invite any and all skeptics to view the page images currently in P1 for _The Arian Controversy_. It's not nearly as hard as Jon makes it out to be.
http://www.geoffhorton.com/pictureocr/instructions.html has an explanation of what I do.
Very interesting!

Of course, let's now look at the issue from:

1) Cost viewpoint
2) Convenience for the average Joe who is not interested in DIY.
3) Archival quality (not of interest to everyone).

Regarding 1), a person can buy the PlusTek OpticBook 3600 for about $250 or so. The web site about this scanner is:

http://www.plustek.com/products/book.htm

How much in supplies, parts, etc., not to mention the camera, does it cost for your solution? (Of course, the digital camera can be used for non-scanning purposes, so that is a benefit. I am quite familiar with photography and digital cameras. I currently own an Olympus C-50 with 5.0 megapixel resolution, and I have my eyes on a prosumer-level SLR when I can afford it.)

Regarding 2). What if a person is not interested in DIY stuff? There are a lot of people like that out there! <smile/>

Regarding 3), can you achieve effective 600 dpi results (including factoring in camera focus issues) for a college textbook-size page? That would require a 14 megapixel camera. In contrast, the OpticBook can scan an 8.5" by 11.7" page which, at even 300 dpi requires a 9 megapixel camera (and note the OpticBook can scan at 1200 dpi optical, thus an 8.5" by 11.7" page scanned at 1200 dpi would be equivalent to a 143 megapixel camera which does not yet exist, as far as I know.)

*****

To conclude, what I am NOT saying is that it is impossible to get good page scans using a digital camera -- what I am saying is that it is not easy nor convenient to do (especially for the average Joe), and not that cost effective, either. There is a big difference.

I'm a DIY kind of guy (helping my son now to build audiophile grade speakers and tube amps), so I understand the zeal you have in getting a digital camera to do page scanning, but I also recognize that for most people, those who aren't like us, ready-made solutions which are optimized and engineered for the job at hand are better.

Jon Noring
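The megapixel equivalents quoted for the 8.5" by 11.7" page can be checked with a few lines of Python. This is only a sketch of the arithmetic (pixels = inches times dpi along each side); the function name is an assumption, and the 14 megapixel textbook figure is not reproduced because the message does not state the page size it assumes.

```python
def megapixels_needed(width_in, height_in, dpi):
    """Megapixels required to cover a page of the given size at the given dpi."""
    return (width_in * dpi) * (height_in * dpi) / 1_000_000


# The 8.5" x 11.7" scan area mentioned for the OpticBook.
for dpi in (300, 600, 1200):
    print(f"{dpi} dpi: {megapixels_needed(8.5, 11.7, dpi):.1f} MP")
```

Because both sides scale with dpi, doubling the resolution quadruples the pixel count, which is why the requirement climbs so quickly between 300 and 1200 dpi.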

A followup comment: Geoff wrote:
I invite any and all skeptics to view the page images currently in P1 for _The Arian Controversy_. It's not nearly as hard as Jon makes it out to be.
http://www.geoffhorton.com/pictureocr/instructions.html has an explanation of what I do.
Another factor to consider is 4) Time to scan a book. Compare your system with using a Plustek OpticBook. I would surmise an OpticBook would be slightly faster. Jon

Of course, let's now look at the issue from:
OK, let's. :)
1) Cost viewpoint
2) Convenience for the average Joe who is not interested in DIY.
3) Archival quality (not of interest to everyone).
Regarding 1), a person can buy the PlusTek OpticBook 3600 for about $250 or so. The web site about this scanner is:
I was about this close (holds fingers close together) to buying one, but I'm on an extremely limited budget and already had the camera. An additional benefit (as you note) is that the camera has other uses, whereas taking pictures of scenic spots with an OpticBook is not going to work very well.
How much in supplies, parts, etc., not to mention the camera, does it cost for your solution?
$15, assuming you don't burn out lightbulbs too fast. I'm using a $5 gooseneck lamp from Target and a fairly large acrylic frame that was around $10. The Gimp is free. The OCR program is a cost either way. I don't use the rubber bands to hold pages anymore, but they're cheap, too.
Regarding 2). What if a person is not interested in DIY stuff? There are a lot of people like that out there! <smile/>
Then, by all means, they should get a scanner. (BTW, I'd like to lower the DIY factor by creating a Cygwin-free executable of my image-processing program. Anyone who can walk me through it will earn my gratitude.)
Regarding 3), can you achieve effective 600 dpi results (including factoring in camera focus issues) for a college textbook-size page?
I doubt it--but how often does a task really call for 600 dpi anyhow? But I agree, it's more problematic for large books. Fortunately, everything I've wanted to do so far has been small enough that it really hasn't been an issue. Vol. 1 of Jefferson Davis's history of the Confederate government was about the largest I've done, and I wouldn't want to go much larger.
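Going the other way, the resolution a given camera can deliver on a given page is bounded by its pixel count along each side. The Python sketch below assumes the page exactly fills the frame; the 2272 x 1704 pixel dimensions are an assumed typical size for a 4 megapixel sensor and the 5" by 8" page is a hypothetical small book, neither taken from this thread.

```python
def effective_dpi(sensor_px_long, sensor_px_short, page_long_in, page_short_in):
    """Upper bound on dpi when the page exactly fills the frame.

    The limiting side is whichever spreads its pixels over more inches.
    Margins, focus error, and lens softness lower the real figure.
    """
    return min(sensor_px_long / page_long_in, sensor_px_short / page_short_in)


# Assumed 4 MP sensor (2272 x 1704 px) on a hypothetical 5" x 8" page
# and on the 8.5" x 11.7" page size discussed earlier.
print(round(effective_dpi(2272, 1704, 8, 5)))
print(round(effective_dpi(2272, 1704, 11.7, 8.5)))
```

Under these assumptions a small book lands near the 300 dpi commonly used for OCR, while a large page falls well short, which matches the observation that larger books are more problematic.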
To conclude, what I am NOT saying is that it is impossible to get good page scans using a digital camera -- what I am saying is that it is not easy nor convenient to do (especially for the average Joe), and not that cost effective, either. There is a big difference.
I guess I'm stuck differing with you, or would be if I had a compiled executable to distribute. Given the camera, the computer, and the OCR software, the additional investment is minimal. And given practice, I can go through pages as quickly, or maybe even more quickly, with the camera than I could with the scanner. (Note that it's a fairly old USB 1 scanner, so newer ones might take some of that time off.)

As noted, I do have a scanner, and I have used it on a recent project to scan the illustrations. It's no big deal if the text is photographed with the camera not quite parallel to the page, but it's a problem for pictures. (The scanner also has built-in Moire reduction in the software.)

I'm not asking the world to give up on scanning, but I am pointing out that this alternative is not nearly as hard as I thought it would be, and as I suspect others still think.

Geoff

--- Jon Noring <jon@noring.name> wrote:
Regarding 1), a person can buy the PlusTek OpticBook 3600 for about $250 or so.
As long as one is counting all the costs, throw in the price of a Windows license too. Geoff's point is that, for what we do at PG, you can get pretty good results with a little fussing if you already own a camera or are interested in buying one anyhow. That provides more options for very fragile books. Your desire for archival-quality scans doesn't affect the usability of the rig for good-enough-to-OCR images. (Incidentally, I might as well mention here since it's tangentially related: I haven't forgotten about those Kama Sutra scans :) . Preparing them for dp took me a couple of hours; cleaning them up for direct online viewing is what's taking the time right now.)

participants (5)
- Bowerbird@aol.com
- Dave Fawthrop
- Geoff Horton
- Jon Niehof
- Jon Noring