
Bowerbird wrote:
greg said:
An automation process, to pull all images from DP at a time an eBook is posted, is very much non-trivial.
copying images from one place to another seems "trivial" enough to me.
the non-trivial part will be setting up the ground-rules for the page-scans, and then making the current set of scans conform to those ground-rules...
[snip]

All the points brought up by Bowerbird are excellent and cut to the heart of the issues involved in both archiving and making available to the public the scans that are submitted to PG/DP for conversion to SDT.

My prior message this morning, giving a few of my initial observations on the scan repository project, shows that it will be quite laborious to build a *publicly-useful* page scan archive from PG/DP activities, because of the lack of standardization and other related factors.

One suggestion is likely to be controversial, but I offer it anyway for discussion purposes: DP and PG should set up minimal scan submission requirements. These could include page image naming requirements, metadata requirements, etc. They would also standardize where scan sets are submitted, so that it will be easier to move the scans over to their final resting place.

This way, at least all new submissions will be easier to integrate into a publicly-useful repository. In the meantime, the backlog of older non-standardized material can be sifted through and fixed (such as renaming page scan images, which both Bowerbird and I agree is important to do right). How fast the older material gets fixed depends upon the extent of the work required to normalize the old scan sets (normalized to whatever standards are established), and upon the number of volunteers available to help with both the machine- and human-processing that normalization requires.

At least this way we make sure the problem won't continue to grow over time while we take more time to study what to do with the present set of scans.

Thoughts?

Jon
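
P.S. To make the renaming point concrete, here is a minimal sketch of what one machine-processing step might look like. Everything in it is an assumption for illustration only: the function names (natural_key, plan_renames), the four-digit zero-padded target names, and the idea of numbering by sorted order are all hypothetical, not an agreed PG/DP standard. It only builds a rename plan and touches no files.

```python
import re

def natural_key(name):
    # Split into digit and non-digit runs so "page10" sorts after "page9".
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]

def plan_renames(filenames, width=4):
    """Map each original scan filename to a sequential, zero-padded name.

    Hypothetical scheme: 0001.png, 0002.png, ... in natural sort order.
    """
    plan = {}
    for index, old in enumerate(sorted(filenames, key=natural_key), start=1):
        # Keep the original extension, lowercased; tolerate names without one.
        ext = old.rsplit(".", 1)[-1].lower() if "." in old else ""
        new = f"{index:0{width}d}" + (f".{ext}" if ext else "")
        plan[old] = new
    return plan

if __name__ == "__main__":
    for old, new in plan_renames(["page10.PNG", "page2.png", "page1.png"]).items():
        print(old, "->", new)
```

Note that sequence numbers of this kind say nothing about the page numbers printed in the book, so a human pass would still be needed wherever the naming is supposed to reflect the actual pagination.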