New subject: Scans and Texts (Re: Copyright Verification?)

15 Jul 2005

greg said:
...
> An automation process, to pull all images from DP
> at a time an eBook is posted, is very much non-trivial.

copying images from one place to another seems "trivial" enough to me.

the non-trivial part will be setting up the ground-rules for the page-scans,
and then making the current set of scans conform to those ground-rules...

although i would imagine (or hope, anyway) that they are smarter now,
as recently as last year, i encountered _extremely_ stiff resistance
at d.p. for suggesting even such _basics_ as file-naming conventions
(e.g., that the scan for page 87 should be named "rootname087.png"
not something like "rootname094.png", as was typical.)   in a nutshell,
if page-scans are maintained in the chaotic way that many other files
are being handled in the library, you'll have a nightmare on your hands.
one e-text can have literally hundreds of page-scans associated with it;
thus, with 17,000 e-texts, we're talking now about 2-4 _million_ files.
if you approach a task like that haphazardly, it will bite you back _bad_.
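
to make the naming point concrete, here is a minimal sketch in python. it assumes a convention of rootname plus a zero-padded three-digit page number plus ".png", as in the "rootname087.png" example above. the function names and the page-to-file mapping are hypothetical, made up for illustration; they are not anything d.p. actually uses.

```python
def scan_name(rootname: str, page: int, ext: str = "png", width: int = 3) -> str:
    """Build the conventional filename for a page scan, e.g. rootname087.png."""
    return f"{rootname}{page:0{width}d}.{ext}"


def misnamed(rootname: str, page_to_file: dict) -> dict:
    """Return {page: (actual, expected)} for every scan that breaks the convention.

    page_to_file maps a printed page number to the filename currently in use
    (a hypothetical inventory, assumed to exist for this sketch).
    """
    problems = {}
    for page, actual in sorted(page_to_file.items()):
        expected = scan_name(rootname, page)
        if actual != expected:
            problems[page] = (actual, expected)
    return problems


# example inventory: page 87 is stored under the wrong number
inventory = {87: "rootname094.png", 88: "rootname088.png"}
for page, (actual, expected) in misnamed("rootname", inventory).items():
    print(f"page {page}: {actual} should be {expected}")
```

a check like this is cheap for one book; the hard part, across millions of files, is agreeing on the convention and building the page-to-file inventory in the first place.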

simply put, the scans weren't created and haven't been maintained
with an eye toward making them publicly available.   that's not the
_fault_ of the d.p. people, because that was not their concern, but
it is a reality that needs to be faced if we want to make them public.
it's going to be a _lot_ of work to mold them into something useable.
juliet recognizes that -- that's precisely what she was telling people.
...
> But just doing one or two titles as a sample would help.

well, that would depend on whether wise conventions are adopted first,
or develop out of those samples.   if these samples instead merely boost
the "do it however you want" idea that permeates the rest of the library,
they will do more harm than good.

it's also very important to understand that the stimulus here is all wrong.
rather than driven by some vague notion that scans "should" be available,
a "needs analysis" has to be done to determine _who_ will use the scans,
and _how_, so that the policies that are put into place are _wise_ ones.

for instance, i might be wrong about this, but i think the current policy is
to wrap all the page-scans up into one zip file.   there are merits to that,
but it's also the case that a zip file is not the best thing for many useful
applications, not the least of which is checking the scan of a single page
(e.g., to see if an error-report is supported or negated by the page-scan).
(in regard to this specific point, i think scans should be stored both ways
-- as individual files and as a single zip-file -- perhaps on different
servers.)
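
as a rough illustration of the trade-off, here is a sketch using python's standard zipfile module. it shows that a single page can be read out of a zip, but only by code that has the whole archive to hand, and it shows one way to produce the individual-file layout from the same zip. the function names and the fake scan contents are assumptions made up for this example.

```python
import io
import zipfile


def read_page_from_zip(zip_source, member: str) -> bytes:
    """Return the bytes of one page-scan stored inside a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        return archive.read(member)


def unpack_all(zip_source, dest_dir: str) -> list:
    """Extract every scan to dest_dir as individual files; return their names."""
    with zipfile.ZipFile(zip_source) as archive:
        archive.extractall(dest_dir)
        return sorted(archive.namelist())


# build a tiny in-memory zip with placeholder "scans" to exercise the functions
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("rootname087.png", b"fake scan of page 87")
    archive.writestr("rootname088.png", b"fake scan of page 88")

page = read_page_from_zip(demo, "rootname087.png")
print(len(page), "bytes read for page 87")
```

someone checking an error-report against one page still has to fetch the entire zip before the first function can run, which is the argument for keeping individual files available as well.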

as there is, at present, no user clamor for the scans, we _do_not_know_
how the end-users might want to use the scans, so we're _in_the_dark_
about the factors that we should apply in the development of any policies.

so if i were a decision-maker, i would wait until a clamor actually developed
before i moved forward on this.   perhaps the absence of many volunteers
who are willing to actually expend their energy on this project will serve as
the brake necessary to keep it from lurching ahead prematurely.
...
> As Juliet & others mentioned, the *archiving* is already being done.
> The next step is distribution.

well, i would imagine (or hope anyway) that a formally-trained librarian
would use the term "archiving" with a little more sensitivity.   the scans 
are being _saved_, but there is a huge gulf to cross before they can be
considered to have been "archived".   the page-scans as they are now
are a _very_ long way from being ready for the "distribution" step...

maybe jon can mobilize some volunteers to do all the work necessary.
but otherwise, i don't see any people stepping forward at this time...

-bowerbird
