Re: [gutvol-d] 600 dpi vs. 300 dpi for text (a quickie visual experiment)

22 Jul 2005


On 22 Jul 2005, at 8:31, Jon Noring wrote:
...
Branko wrote:
...
Jon Noring wrote:
...
...
But I do believe that those who are submitting scans to DP should
seriously consider doing all scans at 600 dpi full color
...
DP should be as accessible as possible to content providers (those
who provide scans) and every roadblock we put in their way is A Bad
Thing, period.
Note carefully what I said above. I am not suggesting that DP
increase their scan submission requirements, but *suggest* that those
who provide scans should scan them at higher resolution and color
depth.
Unfortunately, people might take that to heart and start providing 
high-quality scans in the time that they could have provided four 
times as many low-quality scans. Good for you, bad for PG.
...
...
If you have use for our waste product (the scans), then more power
to you! But as long as our main product serves a higher goal than
the waste product, I think we should squarely focus on producing the
main product, i.e. plain vanilla etexts of as many books as possible
for as many people as possible for as long a time as possible.
But this begs the question -- are book scans a "waste product"?
To PG/DP: yes, most of the time. Don't take that as a negative thing: 
one man's waste product can be another man's gold.
...
This is the crux of the issue: the value of the book scans 
themselves. I believe they are not a waste product, while 
others in the PG universe consider them solely as a necessary 
evil to get to the final structured digital text.
I think it goes deeper than that, even to or near the core of PG's 
philosophy. If I had been Michael Hart, I might have set up a scan 
archive first, reasoning that once OCR quality had improved to the 
point that it would yield 99.8% perfect texts, I could always 
convert images to text. But I am not. Of course, I am always free to 
start my own project, one that works exactly on the basis I just 
outlined, but I personally think that is not worth the bother. I 
would rather create value now at PG than in the distant future at my 
own project.
...
...
The one exception would be if you could somehow provide us with
scans (as many projects already do) in as troublefree a manner as is
humanly possible. But such a thing need not be done in the context
of PG or DP, and I doubt it even needs to be discussed here
(although, of course, here is where you will find like-minded
people).
Definitely! There are only two communities really interested in
scanning old books: PG and DP. (There are also some academic
communities, but by and large they are either interested only in a
very small subset, or take a closed and proprietary position on the
availability of the scans to the public.)
There's archive.org, the Million Books project, the Canadian 
Libraries, several PG-like projects (Runeberg, Project Madurai), CCEL, 
Blackmask, Sacred Texts, and I am sure there are dozens of others 
(just think of all the author-related associations that scan books!). 
PG is just one of the biggest (and certainly oldest) fish in the pond, 
but by no means the only one.

-- 
branko collin
collin@xs4all.nl
