
Juhana wrote:
Hello. The master format should be the digitized images of the original book pages. No font, footnote, or math problems, nor any problems in readability or in representing the original text.
I find the digitized images more pleasant than any ascii, html, word or TeX text. I don't know the reason but perhaps the art of typesetting and printing was better then than it is now!...
So, keep archiving the digitized images!! 200 dpi with 32 grey levels starts looking ok but 300 dpi with 256 levels should be enough even for math texts. Forget 1-bit digitizations completely!!!
If the only purpose of scanning books is OCR, after which the scans are either discarded or kept simply to "prove" provenance, then 300 dpi is *usually* sufficient: 8-bit greyscale for black-and-white pages and 24-bit color for color pages. (If some of the type is very small, 5 point or less, then 600 dpi is usually required.)
However, based on my consultations with experts in the field and on personal experimentation (My Antonia at http://www.openreader.org/myantonia/ ), if the scans are to serve purposes besides OCR, such as direct reading and other uses where sharpness is aesthetically important, then it is recommended to scan at 600 dpi (optical) -- and at 1200 dpi (optical) if the print is *very* small. Unfortunately, the resulting scan images become quite large (unless one uses lossy compression, such as DjVu, which is not recommended for master archiving but is all right for end-user delivery). But if a job is worth doing, it is worth doing right.
If there is one area where DP seems to fall short (let me know if I'm wrong here), it is page scan resolution and archiving (or the lack thereof). This is understandable considering the disk space and bandwidth required to move the scans around, but IA is a place to donate page scans once proofing is done (maybe this is already being done), and I'm sure others can be found who will gladly set up a terabyte storage box to store DP's 600 dpi page scans -- just post a plea to Slashdot and there will probably be several volunteers who step forward with spare terabytes.
Btw, if anyone here has made, or plans to make, 600 dpi (optical) greyscale or color scans of any public domain books, including the book covers (and this includes books printed between 1923 and 1963 which may be public domain), I'll gladly accept donations of them on CD-ROM and DVD-ROM. I will also gladly accept the source books themselves, even if they've been chopped.
I will eventually build a multi-terabyte hard disk storage system to support various activities, including Distributed Scanners. Of course, the scans should be donated to IA as well so they can immediately be made available to the world.
Jon Noring
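For a rough sense of why 600 dpi masters add up to terabytes, here is a minimal Python sketch of the uncompressed size arithmetic. The 6 x 9 inch page and the 300-page book are assumptions chosen purely for illustration, and the function names `page_bytes` and `book_bytes` are hypothetical; actual files will be smaller once lossless compression is applied.

```python
def page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size, in bytes, of one scanned page."""
    # Pixel count is (width * dpi) x (height * dpi).
    pixels = round(width_in * dpi) * round(height_in * dpi)
    # Round up to whole bytes so odd bit depths (e.g. 1-bit) still work.
    return (pixels * bits_per_pixel + 7) // 8


def book_bytes(pages, width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size, in bytes, of a whole book of identical pages."""
    return pages * page_bytes(width_in, height_in, dpi, bits_per_pixel)


if __name__ == "__main__":
    # Assumed example: a 300-page book with 6 x 9 inch pages.
    for dpi in (300, 600, 1200):
        for label, bits in (("8-bit grey", 8), ("24-bit color", 24)):
            size = book_bytes(300, 6, 9, dpi, bits)
            print(f"{dpi:>4} dpi, {label:<12}: {size / 1e9:.2f} GB uncompressed")
```

Doubling the resolution quadruples the raw size, and 24-bit color triples it relative to 8-bit greyscale, which is why the jump from 300 dpi to 600 dpi matters so much for storage and bandwidth.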