
On 7/22/05, Jon Noring <jon@noring.name> wrote:
> One question of Robert and the other DPers: how were blank pages handled?
If it has a page number (or fits into the page sequence), I scan it. If it is the verso or obverse of an unnumbered illustration, I skip it. The goal is to capture the content; grabbing an unnumbered blank page doesn't accomplish much.
> As an aside, I've always thought the best system would be to separate the DP proofing system from the scanning portion. In essence, to set up a separate (autonomous) "Distributed Scanners" (DS) project which would encourage the scanning of older books, set minimum quality requirements, handle QC and standardized cataloging (possibly MARC-XML), clean up the scans to form working sets (deskewing, cropping, color depth reduction, etc.), and do so in a semi-distributed environment akin to DP. The work product would then be archived at IA (with public access to some of the derivative scan sets, if not the masters). And of course DS would generate a derivative scan set optimized for DP's process. If the system works well, DP could encourage submitters to go through the DS system for submitting scans.
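For anyone wondering what the "working set" cleanup might look like in practice, here is a minimal sketch of two of the steps named above, color depth reduction and cropping. It is only an illustration under stated assumptions: a page is modeled as a plain list of grayscale rows (0 = black, 255 = white), the threshold of 128 and the one-pixel margin are arbitrary, and the function names are hypothetical. A real pipeline would use an imaging library and would also need deskewing, which is not shown.

```python
from typing import List

# Assumed layout: a page is a list of rows of gray values, 0 = black, 255 = white.
Page = List[List[int]]


def reduce_depth(page: Page, threshold: int = 128) -> Page:
    """Color depth reduction: map every gray pixel to pure black or pure white."""
    return [[0 if px < threshold else 255 for px in row] for row in page]


def crop_to_ink(page: Page, margin: int = 1) -> Page:
    """Crop a bilevel page to the bounding box of its black pixels plus a margin."""
    ink_rows = [y for y, row in enumerate(page) if 0 in row]
    ink_cols = [x for row in page for x, px in enumerate(row) if px == 0]
    if not ink_rows:
        return page  # blank page: no ink to crop against, so leave it alone
    top = max(min(ink_rows) - margin, 0)
    bottom = min(max(ink_rows) + margin, len(page) - 1)
    left = max(min(ink_cols) - margin, 0)
    right = min(max(ink_cols) + margin, len(page[0]) - 1)
    return [row[left:right + 1] for row in page[top:bottom + 1]]


def make_working_page(master: Page) -> Page:
    """Derive a working page from a master scan; the master itself is not modified."""
    return crop_to_ink(reduce_depth(master))
```

Both functions return new lists, so the master scan stays untouched and more than one derivative set can be generated from it with different settings.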
Frankly, I think I lean more towards the other DPers. The scan images are a means to an end, the end being a correct, well-formatted eBook. HTML, especially, is very flexible in output formats. I can view it on anything from a graphing calculator to a Sun workstation, or run it through a TTS engine and listen to it. Page images are a useful reference to compare the etext against, but are not very flexible.

Now, in general, I _would_ like to see an improvement in the average image quality of included illustrations. I personally preprocessed all of the Potter illustrations before uploading them (the main reason I haven't finished is that I don't have the time right now to do it right), but it is a lot of work, and a skill that takes practice to get right. I still consider myself only an intermediate Photoshop user.

R C