Re: [gutvol-d] Scans and Texts (Re: Copyright Verification?)

15 Jul 2005

      Bowerbird wrote:
...
greg said:
...
...
 An automation process, to pull all images from DP
 at a time an eBook is posted, is very much non-trivial.
...
copying images from one place to another seems "trivial" enough to me.
the non-trivial part will be setting up the ground-rules for the page-scans,
 and then making the current set of scans conform to those ground-rules...
[snip]

All the points brought up by Bowerbird are excellent and cut to the
heart of the issues involved in both archiving, and making available
to the public, the scans that are submitted to PG/DP for conversion to
SDT.

My prior message this morning, providing a few of my initial
observations on the scan repository project, shows that it will be
quite laborious to build a *publicly-useful* page scan archive from
PG/DP activities because of the lack of standardization and other
related factors.

One suggestion is likely to be controversial, but I offer it anyway
for discussion purposes:

DP and PG should set up minimal scan submission requirements. These
could cover page image naming, metadata, etc. They would also
standardize how and where scan sets are submitted, so it will be
easier to move the scans over to their final resting place.
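
To make the idea concrete, here is a minimal sketch in Python of what
checking a submission against such requirements might look like.
Everything specific in it is an assumption for illustration only, not
an existing PG or DP rule: the four-digit zero-padded naming pattern,
the metadata fields (title, author, source), and the function name
check_submission.

```python
import re

# Hypothetical naming rule: four-digit, zero-padded page number plus
# a lowercase image extension, e.g. "0001.png".
PAGE_NAME = re.compile(r"\d{4}\.(png|tif|jpg)")

# Hypothetical minimal metadata fields every scan set must carry.
REQUIRED_FIELDS = ("title", "author", "source")


def check_submission(filenames, metadata):
    """Return a list of problems; an empty list means the set passes."""
    problems = []
    for name in filenames:
        # fullmatch rejects names with anything before or after the pattern.
        if not PAGE_NAME.fullmatch(name):
            problems.append(f"bad page image name: {name}")
    for field in REQUIRED_FIELDS:
        # A missing key and an empty value are both treated as missing.
        if not metadata.get(field):
            problems.append(f"missing metadata field: {field}")
    return problems
```

A check like this could run at submission time, so that a scan set is
either accepted as-is or returned with a list of what to fix.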

This way, at least all new submissions will be easier to integrate
into a publicly-useful repository. In the meantime, the backlog
of older non-standardized stuff can be sifted through and fixed (such
as renaming page scan images, which both Bowerbird and I agree is
important to do right). How fast this fixing of the older stuff will
happen depends upon the extent of the work required to normalize the
old scan sets (normalized to whatever standards are established), and
the number of volunteers to help out with both the machine- and
human-processing required for normalization.
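
The machine-processing half of that normalization could start with
something as simple as the sketch below, which proposes new names for
an old scan set without touching any files. The target naming scheme
(sequential four-digit numbers) and the function names natural_key and
plan_renames are assumptions for illustration; no such standard has
been set.

```python
import re


def natural_key(name):
    # Split into alternating non-digit and digit runs so that "p10"
    # sorts after "p9". Odd positions are always the digit runs.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def plan_renames(old_names):
    """Map each old file name to a sequential four-digit name.

    The original extension is kept, lowercased. Nothing is renamed
    here; the result is only a proposal for a human to review.
    """
    plan = {}
    ordered = sorted(old_names, key=natural_key)
    for index, old in enumerate(ordered, start=1):
        ext = old.rsplit(".", 1)[-1].lower() if "." in old else ""
        plan[old] = f"{index:04d}.{ext}" if ext else f"{index:04d}"
    return plan
```

Sorting by file name does not guarantee true page order (front matter,
plates, and inserted pages often break it), which is where the
human-processing comes in: someone still has to review the proposed
mapping before any files are renamed.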

At least this way we make sure the problem won't continue to grow over
time while we take more time to study what to do with the present set
of scans.

Thoughts?

Jon

Jon Noring