greg said:
>   At that point, the files are accessible
>   if you know how to navigate to them. 
>   For example, #17392 could be accessed as
>   http://www.gutenberg.org/dirs/1/7/3/9/17392

oh right, crap, i should have remembered that...
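
for bystanders, the pattern in greg's example seems to be that every digit
of the e-text number except the last one becomes its own directory level.
here's a minimal python sketch of that rule. the function name is mine, and
the handling of single-digit numbers (filing them under "0") is my own
assumption, not something stated in greg's message...

```python
def pg_dir_url(ebook_number, base="http://www.gutenberg.org/dirs"):
    """Build the directory url for an e-text number (hypothetical helper)."""
    digits = str(ebook_number)
    # every digit except the last becomes one directory level
    # assumption: a single-digit number has no such digits, so use "0"
    levels = list(digits[:-1]) or ["0"]
    return "/".join([base, *levels, digits])


print(pg_dir_url(17392))
```

that prints greg's example path for #17392.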

so i looked at the version of "the secret garden"
that was just created by distributed proofreaders,
comparing it to the distillation copy i had prepared.

(which itself had resulted from the original e-text
of this book that is already in p.g., as e-text #113,
being compared with the smoothreading version, so
now i was comparing a comparison to a comparison.
just so you don't get confused...)            ;+)

and d.p. did a _very_ good job on it.  congratulations...

however, i still show 5 outright errors in your version,
the details of which i have appended to this message...
(the top line in each pair is your line, with the error
sitting just before the large gap; the bottom line is mine.)

plus there are also 5 special cases outlined below them,
where i chose to make edits.  your mileage might vary...
(but you _did_ make highly similar edits, for consistency.)

as a point of reference for bystanders, there were
approximately 250-300 points of difference between
the d.p.-processed text and the original p.g. e-text,
so 5-10 errors in boiling them down isn't _that_ many.

5-10 errors on a book would be _excellent_ on a first-pass.
it's respectable even on a comparison project like this one.
although perfection _is_ within the realm of obtainability,
you've attained a very high accuracy-mark here, especially
since your comparison tools probably aren't up to snuff yet.

but the one thing that _does_ give me pause is that some
of these errors _should_ have been caught by your tools.

any missing end-paragraph terminating-punctuation (page 31)
should be caught.  ditto a continuation-quote (page 108, but
perhaps you were "matching the scan" and passed on that).
and a good tool will detect inconsistent and/or irregular usage,
of the type that is represented in the additional edits i made...
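
to show how little it takes, here's a rough python sketch of the first
check, a paragraph that ends without terminating punctuation. it's not
the tool i use, just an illustration. the set of characters that count as
"terminators" and "closers" is my assumption, and it presumes paragraphs
are separated by blank lines...

```python
import re

CLOSERS = "\"'_)]"     # assumption: may legitimately follow the final punctuation
TERMINATORS = ".?!:"   # assumption: these properly end a paragraph


def unterminated_paragraphs(text):
    """Return (paragraph number, last line) for each suspect paragraph."""
    suspects = []
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    for number, para in enumerate(paragraphs, start=1):
        # drop trailing whitespace, then any closing quotes or markup
        core = para.rstrip().rstrip(CLOSERS)
        if not core or core[-1] not in TERMINATORS:
            suspects.append((number, para.strip().splitlines()[-1]))
    return suspects


sample = 'also. "That there?" she said\n\n"Yes." "That\'s the one."\n'
for number, line in unterminated_paragraphs(sample):
    print(number, line)
```

a check like this will also flag chapter headings and the like, so a human
still has to look over the list, but the list is short.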

so a comparison tool should eventually find all of your errors here
(mine did), but even your _regular_ tools should've caught _some_!
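
by "comparison tool" i mean nothing fancy. as a minimal sketch (this is
_not_ my own tool, just python's standard difflib, and the function name
is made up for the example), a word-level comparison of two versions of
the same text could look like this...

```python
import difflib


def word_differences(text_a, text_b):
    """Return (words in a, words in b) for every spot where the texts differ."""
    a, b = text_a.split(), text_b.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


yours = 'told her anything," said Colin, "She heard'
mine = 'told her anything," said Colin. "She heard'
print(word_differences(yours, mine))
```

splitting on whitespace means line-wrapping differences between the two
versions don't show up as differences, which is what you want here.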

so i congratulate you for a job well-done, but recommend you
bring your tools up to speed, and then you'll realize perfection,
at least on these re-do projects, and maybe first-passes too...

anyway, perhaps for 2006, one of your resolutions should be
not to alienate any future tool-makers...

and in sum, this double-digitization shows nicely how this approach
can drive an e-text to perfection, probably better than any other...

just in passing, i will note that the postprocessor on this book (miller)
must have applied a lot of elbow-grease doing the comparison, because
the version that went out for smooth-reading had a _lot_ more errors.

there has been a lot of upheaval over at d.p. as a result of the june move
to four rounds, but still only two of those rounds are for actual proofing.
(the other two are for formatting.)  so the primary difference is that the
second proofing round is done by a proofer tested to be "more-qualified".
i don't know if this text went through the new system or not, but if it did,
it does _not_ speak well for it.  i highly recommend that d.p. use a _third_
round of proofing.  (or even better, adopt the recommendation that i made
a while back for a "roundless" system that uses _consensus_ as the factor
that promotes a page from the proofing rounds into the formatting rounds.)
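
to make the "consensus" idea concrete, here's one possible reading of it
as a python sketch. the specific rule (a page is promoted once the last
`needed` proofers in a row turned in identical text) and the names are
assumptions for illustration, not a description of anything d.p. runs...

```python
def ready_for_formatting(versions, needed=2):
    """versions: each proofer's output for one page, oldest first.

    Assumed rule: promote the page once the most recent `needed`
    outputs are identical, i.e. the proofers agree on the text.
    """
    if len(versions) < needed:
        return False
    tail = versions[-needed:]
    return all(v == tail[0] for v in tail)


print(ready_for_formatting(["teh garden", "the garden"]))
print(ready_for_formatting(["teh garden", "the garden", "the garden"]))
```

under that rule a clean page leaves proofing quickly, while a messy page
keeps getting looked at until the proofers stop finding things to change.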

also, it would be nice if -- when d.p. does a "re-do" project like this one --
d.p. would package up the various iterations of the process and make 'em
available to researchers like myself so that we could examine the output
at the different stages and do work on improving the overall results...

so how 'bout it, d.p. people, will you furnish that material on this book?

(indeed, if you could package up the results of each round into a .zip file,
for _every_ book you do, that would be a great resource for researchers.)

-bowerbird

------------------------------------------------------------------------------------------------------------

these are your errors, by the scans and/or common sense:

p. 31
also. "That there?" she said                    "Yes." "That's
also. "That there?" she said.                      "Yes." "That's

p. 129
blades. "There's lots o' 'dead                  wood as
blades. "There's lots o' dead                   wood as

p. 187
told her anything," said Colin,                 "She heard
told her anything," said Colin.                 "She heard

p. 222
out between two sobs: "Sh--show her! She--she'll see then!"
out between two sobs: "Sh-show her! She-she'll see then!"

p. 287
"It is my garden now,                           I am
"It is my garden now.                           I am

------------------------------------------------------------------------------------------------------------

and here are 5 edits, contra-scans, that i made for consistency:

p. 91
an' I was in practice."                         Mary got
an' I was in practise."                         Mary got

p. 108
to her: "_My Dear Dickon:_ This                    comes hoping
to her: _"My Dear Dickon:_ "This                             comes hoping

p. 109
bit o' mother's hot oat                         cake, an' butter,
bit o' mother's hot oat-cake,                   an' butter,

p. 254
said Mary quite seriously. "An                  tha' munnot
said Mary quite seriously. "An'                 tha' munnot

p. 326
got into my throat." "But"                      she said
got into my throat." "But,"                     she said

------------------------------------------------------------------------------------------------------------

there was also this edit, from the earlier p.g. version, that i liked,
so i kept it in my version as a tribute to the original type-in e-text.

p. 108
him?" asked Martha suddenly, she                                had looked
him?" asked Martha suddenly, for Mary                         had looked

------------------------------------------------------------------------------------------------------------