
marcello said:
Are these scans online and accessible at DP? If so, linking them from the PG catalog would be a matter of a few hours' work, assuming I can get a list of etext-no => page-scan-url
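(for the record, the software side of what marcello describes really is trivial -- here's a sketch in python, assuming a hypothetical tab-separated file of etext-no => page-scan-url pairs, which is precisely the list that nobody has produced. the file format, urls, and function names are all invented for illustration:)

```python
def parse_mapping(lines):
    """Turn hypothetical 'etext_no<TAB>url' lines into an {etext_no: url} dict."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        etext_no, url = line.split("\t", 1)
        mapping[int(etext_no)] = url
    return mapping

def catalog_link(etext_no, mapping):
    """Return an HTML link to the page scans, or None if no scans are listed."""
    url = mapping.get(etext_no)
    if url is None:
        return None
    return '<a href="%s">page scans for etext #%d</a>' % (url, etext_no)

# usage -- the urls are made-up placeholders:
sample = ["11\thttp://example.org/scans/11/",
          "1342\thttp://example.org/scans/1342/"]
mapping = parse_mapping(sample)
print(catalog_link(1342, mapping))
```

(which is why the reply below is right that the code was never the problem.)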
would that it were that easy... *** not to pick on marcello, since i wouldn't do that, because i know he does a lot of work on the p.g. site, so he's busy with other things... but... this thread would be a lot more productive if people would familiarize themselves with the actuality of the scans on the d.p. site... some of them are not of very high quality. many of them are not very well-organized. (although it might not appear that way to the naked eye, that last sentence is the very model of understatement.) *** juliet said:
Aside from not having the development resources to set up some kind of system for accessing and using the scans, we also have not yet found someone who will wade through all the archived material to sort it out so that it can actually be used.
i heard jon noring volunteering! just three posts back! really! :+) *** david said:
I've looked at a similar system on Project Runeberg, but it doesn't look very successful at outputting a number of accurate and complete etexts. DP does a very good job at keeping attention focused on a few texts and keeping them moving forward page by page in the system, whereas those systems seem to disperse the effort over a lot of books and a lot of pages that are corrected more or less at random.
that's a good analysis. of course, it's also important to note that neither of those projects has volunteers anywhere near the numbers that d.p. has. if they did, their performance would be better; they'd be more focused, and get better results.

note that the general problem here -- which is "how do you know when each page is _done_?" -- is one that distributed proofreaders has too. but because of its numbers, and the dedication of its proofers, d.p. has the luxury of answering "when it has gone through x number of rounds." even though that's not the right answer to the question -- the right answer is "when no more changes are being made" -- by the time you've hit 4 rounds (plus post-processing), the odds are much better that all the errors have been found.

another difference between all of these systems is that the d.p. interface is remarkably better than the others. furthermore, the post-processing apps over at d.p. are some of the best around right now, so that helps too... but if you built a system of continuous proofreading that had an interface as good as the one over at the d.p. site, with as many volunteers spending as much time, and with as much care, it would do as well as d.p. does. or better...

still, d.p. _has_ the volunteers, right now, working hard, so there's little need to channel them to another method. they've got a nice little cult there, i mean "community", :+) and as long as they're happy, we should just cheer 'em on.

and since there are very few calls -- outside of jon -- for the kind of "transparency with the source materials" that putting scans online would serve, i don't see much point in doing it yet. eventually, diskspace and bandwidth will be plentiful enough to do that without any reservation; but that day ain't here yet. indeed, distributed proofreaders doesn't even leave its scans on brewster's internet archive after a book has gone through proofing, simply because they don't have enough diskspace to do that. it's true.
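(the "no more changes" criterion is easy to sketch, by the way. here's a toy model -- the proofers are just functions here, of course, and real proofing is neither ordered nor deterministic, so take it as an illustration only -- where a page is "done" when a full round comes back with zero diffs:)

```python
def proof_until_stable(page_text, proofers, max_rounds=10):
    """Run proofing rounds until a complete pass makes no changes.

    proofers: list of functions text -> text, a toy stand-in for
    human proofers. Returns (final_text, rounds_used).
    """
    for rounds in range(1, max_rounds + 1):
        before = page_text
        for proofer in proofers:
            page_text = proofer(page_text)
        if page_text == before:      # a whole round with zero diffs: done
            return page_text, rounds
    return page_text, max_rounds     # gave up -- page is still churning

# toy proofers: each one fixes a single kind of scanno
fix_rn = lambda t: t.replace("rnodern", "modern")
fix_tbe = lambda t: t.replace("tbe", "the")

text, rounds = proof_until_stable("tbe rnodern age", [fix_rn, fix_tbe])
print(text, rounds)  # converges in round 1, confirmed stable in round 2
```

(the fixed-x-rounds policy is just this loop with the stopping test replaced by a counter -- cheaper to schedule, but it can stop too early or too late.)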
the more important issue here is one that jon is trying to slip under the door while he makes a big noisy distraction -- his emphasis on a complex and heavy form of markup... jon wants you to believe this heavy markup is necessary to deliver a whole bunch of benefits that he promises. but he has no way to deliver on those promises. and i suspect that when he gets deep enough into the fertilizer he's pushing, he'll find his complex systems start tripping over themselves. for instance, even some of the most diehard of the x.m.l. people are now saying that x.s.l.t. is too complex for many purposes.

or let's look at inter-document linking, which is one of the things that jon likes to say is on his agenda, while he's waving his hands. this is an arena that many good people have wasted a lot of time on. i say "wasted" because one of two conditions must apply here: 1. documents are permanently available at a u.r.l., or 2. they are not. if condition one is the case, anyone can build an inter-linking system that works perfectly. and if condition two is the case, _nobody_ can. oh, they'll make you _promises_. and they'll be able to build something that _kinda_ works, _most_ of the time anyway. but anyone can do that.

so think long, hard, and very carefully about adopting heavy markup... (but hey, if you _are_ gonna do heavy markup, _when_will_you_start_? i came to this listserv over a year-and-a-half ago, asking you that, and you _still_ haven't gotten the ball rolling yet. what is the delay, folks?)

-bowerbird
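(to make the two-condition point concrete: under condition one, "inter-document linking" is nothing but a lookup table, and anybody can write it. under condition two, the lookup fails and no cleverness in the resolver can bring the document back -- the best it can do is report the break. identifiers and urls below are invented for the example:)

```python
# a registry mapping permanent document ids to their urls --
# this *is* condition one, when somebody actually maintains it.
registry = {
    "etext:1342": "http://example.org/etext/1342",
}

def resolve(link_id, registry):
    """Resolve a document id to a url.

    Condition 1: the id is in the registry -- resolution is trivial
    and works perfectly. Condition 2: the id is gone -- we return
    None, because reporting the broken link is all anyone can do.
    """
    return registry.get(link_id)

print(resolve("etext:1342", registry))  # the registered url
print(resolve("etext:9999", registry))  # None -- a dead link, full stop
```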