
On 22 Jul 2005, at 8:31, Jon Noring wrote:
Branko wrote:
Jon Noring wrote:
But I do believe that those who are submitting scans to DP should seriously consider doing all scans at 600 dpi full color.
DP should be as accessible as possible to content providers (those who provide scans) and every roadblock we put in their way is A Bad Thing, period.
Note carefully what I said above. I am not suggesting that DP increase their scan submission requirements, but *suggest* that those who provide scans should scan them at higher resolution and color depth.
Unfortunately, people might take that to heart and start providing high-quality scans in the time that they could have provided four times as many low-quality scans. Good for you, bad for PG.
If you have use for our waste product (the scans), then more power to you! But as long as our main product serves a higher goal than the waste product, I think we should squarely focus on producing the main product, i.e. plain vanilla etexts of as many books as possible for as many people as possible for as long a time as possible.
But this begs the question -- are book scans a "waste product"?
To PG/DP: yes, most of the time. Don't take that as a negative thing: one man's waste product can be another man's gold.
This is the crux of the issue: the value of the book scans themselves. I believe they are not a waste product, while others in the PG universe consider them solely as a necessary evil to get to the final structured digital text.
I think it goes deeper than that, even to or near the core of PG's philosophy. If I had been Michael Hart, I might have set up a scan archive first, reasoning that once OCR quality had improved to the point that it would yield 99.8% perfect texts, I could always convert images to text. But I am not. Of course, I am always free to start my own project, one that works exactly on the basis I just outlined, but I personally think that is not worth the bother. I prefer to create value now at PG rather than in the distant future at my own project.
The one exception would be if you could somehow provide us with scans (as many projects already do) in as trouble-free a manner as is humanly possible. But such a thing need not be done in the context of PG or DP, and I doubt it even needs to be discussed here (although, of course, here is where you will find like-minded people).
Definitely! There are only two communities really interested in scanning old books: PG and DP. (There are also some academic communities, but by and large they are either interested only in a very small subset, or take a closed and proprietary position on the availability of the scans to the public.)
There's archive.org, the Million Books project, the Canadian Libraries, several PG-like projects (Runeberg, Project Madura), CCEL, Blackmask, Sacred Texts, and I am sure there are dozens of others (just think of all the author-related associations that scan books!). PG is just one of the biggest (and certainly oldest) fishes in the pond, but by no means the only one.

-- 
branko collin
collin@xs4all.nl