Juhana wrote:
Hello. The master format should be the digitized images of the
original book pages. No font, nor footnote, nor math, nor any
problems in readability, nor in representing the original text.
I find the digitized images more pleasant than any ASCII, HTML,
Word or TeX text. I don't know the reason but perhaps the art of
typesetting and printing was better then than it is now!...
So, keep archiving the digitized images!! 200 dpi with 32 grey levels
starts looking ok but 300 dpi with 256 levels should be enough
even for math texts. Forget 1-bit digitizations completely!!!
If the only purpose of scanning books is OCR, after which the scans
are either discarded or kept simply to "prove" provenance, then
300 dpi is *usually* sufficient: 8-bit greyscale for black and white
pages, and 24-bit color for color pages. (If some of the type is very
small, such as 5 point or less, then 600 dpi is usually required.)
However, based on my consultations with experts in the field and my
own experimentation (My Antonia at http://www.openreader.org/myantonia/ ),
if the scans are to be used for purposes besides OCR, such as direct
reading and other uses where sharpness is aesthetically important,
then it is recommended to scan them at 600 dpi (optical) -- and
1200 dpi (optical) if the print is *very* small. Unfortunately, the
resulting scan images become quite large (unless one uses lossy
compression, such as DjVu, which is not recommended for master
archiving but is all right for end-user delivery). But if a job is
worth doing, it is worth doing right.
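
To put rough numbers on "quite large", here is a small Python sketch
of the uncompressed size of one page scan at the resolutions and bit
depths mentioned above. The 6 x 9 inch page size and the function name
are my own assumptions for illustration only; real pages vary, and
lossless compression will shrink these figures by an amount that
depends on the page.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw size in bytes of one scanned page, before any compression."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed page size in inches (hypothetical; measure your own book).
PAGE_W_IN, PAGE_H_IN = 6.0, 9.0

for dpi in (300, 600, 1200):
    for label, bits in (("8-bit greyscale", 8), ("24-bit color", 24)):
        size = uncompressed_bytes(PAGE_W_IN, PAGE_H_IN, dpi, bits)
        print(f"{dpi:>4} dpi, {label}: {size / 1e6:.1f} MB per page")
```

Doubling the resolution quadruples the pixel count, so under these
assumptions a 600 dpi greyscale page is about 19.4 MB raw versus about
4.9 MB at 300 dpi.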
If there is one area where DP seems to fall short (let me know if I'm
wrong here), it is page scan resolution and archiving (or lack
thereof). That is understandable considering the disk space and
bandwidth required (to move the scans around), but IA is a place to
donate page scans once proofing is done (maybe this is already being
done), and I'm sure others can be found who will gladly set up a
terabyte storage box to store DP's 600 dpi page scans -- just post a
plea to Slashdot and there'll probably be several volunteers who will
step forward with spare terabytes available.
Btw, if anyone here has made, or plans to make, 600 dpi (optical)
greyscale or color scans of any public domain books, including the
book covers (and this includes books printed between 1923 and 1963,
which may be public domain), I'll gladly accept donations of them on
CD-ROM and DVD-ROM. I will also gladly accept the source books
themselves, even if they've been chopped. I will eventually build a
multi-terabyte hard disk storage system to support various activities,
including Distributed Scanners. Of course, the scans should be
donated to IA as well so they can immediately be made available to the
world.
Jon Noring
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d