rfrank's roundless experiment is proving to be _very_ interesting...

and, as you might expect, there is good news and there is bad news.

let's talk about the good news here in post #12, the bad in post #13.

***

first of all, rfrank is showing that it's not all that difficult to set up
a proofing site.  in just 2 months, he's put together a critical mass,
and that's quite an achievement.  if he shares his code with others,
they'll be able to move even faster.  (if he doesn't, i've got a little
code that'll do the trick for people who want a bit of a head-start.)

it's another matter to pull workers to the site, of course.  however,
if project gutenberg chose to steer people to these _other_ sites,
instead of funneling all the volunteers to distributed proofreaders
(who -- truth be told -- don't even _want_ new people nowadays),
it wouldn't be hard at all for these sites to get enough volunteers.

but even with his low numbers of volunteers, what rfrank is doing is
_head_and_shoulders_ more interesting than anything d.p. is doing.
his site is dynamic, while d.p. has been too moribund for too long...

***

in the last week, rfrank added a spellcheck capability to his site.
after a mere 2 months.  d.p. went about 5 or 6 _years_ without it.

***

moreover, when d.p. finally got a programmer to code spellcheck,
the process was plagued by a forum discussion that ran 30 pages.

at 15 messages per page, that's 450 messages.  and most of 'em
were from people who didn't know what they were talking about,
and thus just added a buncha noise and confusion to the process.

which is why it's probably not surprising that it was coded wrong.
well, "wrong" is perhaps a bit strong.  but the decision was made
to do spellcheck using "aspell", because "it's open-source code".

which would be fine, if you needed a full-fledged spellcheck...

but that's not what a proofing site needs, because the object is
_not_ to have another word "suggested" (which is the hard part
about coding spellcheck), but merely to _flag_suspicious_words_
(a ridiculously easy task consisting of searching a dictionary to
ascertain whether the word you're checking is included therein)
so that all the suspicious words can be compared to the scan...

i'm guessing rfrank did his spellcheck the simple way.
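
a flag-only check really is about this small.  here's a minimal sketch
in python, assuming a plain set of lowercase words as the dictionary.
the tiny dictionary and the function name are made up for illustration;
this is not rfrank's code, and not d.p.'s either.

```python
import re

# hypothetical toy dictionary; a real site would load a full wordlist file
DICTIONARY = {"the", "cat", "sat", "on", "mat", "and", "purred"}

# a "word" here is a run of letters, optionally with internal apostrophes
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def flag_suspicious_words(text, dictionary):
    """return the words not found in the dictionary, in order of first appearance."""
    flagged = []
    seen = set()
    for word in WORD_RE.findall(text):
        key = word.lower()
        if key not in dictionary and key not in seen:
            seen.add(key)
            flagged.append(word)
    return flagged

page = "The cat sat on tbe mat arid purred."
print(flag_suspicious_words(page, DICTIONARY))  # -> ['tbe', 'arid']
```

no suggestions, no edit-distance math, just a lookup.  the flagged words
are what a proofer would then compare against the scan.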

***

rfrank also added "good" and "bad" wordlists, which are necessary
because they customize the dictionary for each book, and he
-- like d.p. -- lets the proofers suggest words to include.

unlike d.p., however, under rfrank's system, whenever a person
"suggests" a word, it's _automatically_ included _immediately_.

at d.p., a suggestion must be considered by a superior, who
might or might not agree, and might or might not be timely.
this is a signal of the disdain with which d.p. treats proofers.

it also means that rfrank's system throws far fewer false flags,
which means it provides much greater value to the proofers...
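
to make the "automatic and immediate" idea concrete, here's a rough
sketch in python of per-book wordlists where a suggestion takes effect
on the very next check.  the class and method names are hypothetical,
and how rfrank actually stores his lists is unknown to me.

```python
class BookWordlists:
    """per-book good/bad lists layered on top of a shared base dictionary."""

    def __init__(self, base_dictionary):
        self.base = {w.lower() for w in base_dictionary}
        self.good = set()   # words to accept for this book (names, dialect, etc.)
        self.bad = set()    # real words that are suspicious in this book

    def suggest_good(self, word):
        # assumption: no approval step, the word counts immediately
        w = word.lower()
        self.good.add(w)
        self.bad.discard(w)

    def suggest_bad(self, word):
        w = word.lower()
        self.bad.add(w)
        self.good.discard(w)

    def is_suspicious(self, word):
        w = word.lower()
        if w in self.bad:
            return True
        return w not in self.base and w not in self.good


lists = BookWordlists({"the", "modern", "modem", "age"})
print(lists.is_suspicious("Pemberley"))   # flagged: not in the base dictionary
lists.suggest_good("Pemberley")
print(lists.is_suspicious("Pemberley"))   # no longer flagged
lists.suggest_bad("modem")
print(lists.is_suspicious("modem"))       # flagged even though it's a real word
```

once a name is on the good list it stops being flagged on every page
where it appears, which is where the drop in false flags comes from.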

i worked very hard, in the confines of that 30-page thread,
to have d.p. give the proofers an automatic capability to add
words to the good and bad lists, but they just wouldn't do it.
rfrank did.  good for rfrank.  he's smarter than the d.p. crowd.

***

rfrank has also included reg-ex checks, and scanno checks, so
his list of helpful tools is already very impressive, 2 months in.
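
for anyone wondering what such checks look like, here's a small sketch
in python.  the patterns and the scanno table are illustrative picks of
my own, not rfrank's actual lists, and the function name is hypothetical.

```python
import re

# illustrative reg-ex checks; each one only flags, a human decides
REGEX_CHECKS = [
    ("space before punctuation", re.compile(r"\s[,;:!?]")),
    ("digit next to a letter", re.compile(r"[A-Za-z]\d|\d[A-Za-z]")),
]

# illustrative scannos: real words (or near-words) that o.c.r. often
# produces in place of the word on the right
SCANNOS = {"tbe": "the", "arid": "and", "modem": "modern"}

def check_page(text):
    """return a list of (line_number, description, matched_text) flags."""
    flags = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for description, pattern in REGEX_CHECKS:
            for match in pattern.finditer(line):
                flags.append((lineno, description, match.group()))
        for word in re.findall(r"[A-Za-z]+", line):
            likely = SCANNOS.get(word.lower())
            if likely:
                flags.append((lineno, "possible scanno for '%s'" % likely, word))
    return flags

page = "It was a modem house ,\narid he liked it."
for flag in check_page(page):
    print(flag)
```

note that "modem" and "arid" would sail through a dictionary lookup,
which is why a scanno check is worth having alongside spellcheck.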

***

rfrank has also shown he's willing to do global changes to text,
which is one of those things that d.p. has been unwilling to do,
in spite of the fact that i've pointed out the utility of it for years.

d.p. would rather have individual proofers correct every instance
of a global error -- one by one by one, painstakingly -- instead
of fixing 'em all immediately, with one global change.  shameful.
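
a global change is not much code.  a minimal sketch in python, assuming
the pages of a book are held as a dict of page-name to text and that
the error is a plain word (both are my assumptions, and the function
name is made up):

```python
import re

def global_change(pages, wrong, right):
    """replace every whole-word occurrence of `wrong` with `right` on all pages.

    returns (new_pages, counts), where counts maps each changed page
    to the number of replacements made there.
    """
    pattern = re.compile(r"\b%s\b" % re.escape(wrong))
    new_pages = {}
    counts = {}
    for name, text in pages.items():
        # the lambda keeps `right` literal (no backslash-template surprises)
        new_text, n = pattern.subn(lambda m: right, text)
        new_pages[name] = new_text
        if n:
            counts[name] = n
    return new_pages, counts

pages = {
    "001": "tbe cat and tbe dog",
    "002": "nothing here",
    "003": "tbeatre of tbe absurd",
}
fixed, counts = global_change(pages, "tbe", "the")
print(counts)
print(fixed["003"])
```

the whole-word match is deliberate: "tbeatre" on page 003 is left alone
for a human to look at, rather than being turned into "theatre" by luck.
returning the per-page counts gives someone a list of pages to spot-check.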

***

rfrank also showed considerable independence when he decided
he would have his people do proofing and formatting together...

it's unclear to me whether the d.p. split between those two tasks
is effective or not, but the _religion_ at d.p. is that it has been...

so it is quite courageous of rfrank to test that accepted "wisdom".

***

rfrank also seems committed to using diffs to train up volunteers.
this, of course, is one of the benefits a roundless system offers,
so it's natural that he'd take advantage, but it's still a good thing.
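
the diff itself can come straight from python's standard library.
a sketch, assuming the site keeps the text a proofer saved and the
text after a later pass (the function name and labels are hypothetical):

```python
import difflib

def page_diff(before, after, before_label="your version", after_label="later pass"):
    """return a unified diff of two versions of a page, or '' if they match."""
    lines = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=before_label,
        tofile=after_label,
        lineterm="",
    )
    return "\n".join(lines)

saved = "It was tbe best of times,\nit was the worst of times."
later = "It was the best of times,\nit was the worst of times."
print(page_diff(saved, later))
```

showing a volunteer exactly which lines changed after they saved a page
tells them what they missed, without anyone having to write it up.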

***

rfrank has given workers a way to make comments _about_ a page
without actually putting them _inside_ the text, which is fantastic.
(he calls this feature "page tweets".)

at d.p., they have a "project thread" in the forums (as does rfrank),
but the only way to make comments about a page is to put them
_inside_ the text.  but of course then someone later down the line
has to _remove_ them from there.  that's a sign of a bad workflow,
when someone later on must undo something that was done earlier.
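
the fix is simply to store comments _beside_ the text, not in it.
a bare-bones sketch in python (the class and field names are mine;
i haven't seen how rfrank's "page tweets" are actually stored):

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    name: str
    text: str
    # comments live next to the text, never inside it
    comments: list = field(default_factory=list)

    def add_comment(self, user, note):
        self.comments.append((user, note))

page = Page("017", "It was the best of times,")
page.add_comment("proofer1", "scan is blurry on line 1, please double-check")
print(page.text)       # unchanged, nothing to strip out later
print(page.comments)
```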

***

in looking at some of the projects, it seems that rfrank has finally
started doing more aggressive preprocessing of the o.c.r. itself...
for instance, the number of spacey quotes has dropped remarkably.
there are still some, but nowhere near the number he had before...
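
for what it's worth, here's one simple way to catch and repair spacey
quotes in python.  it is a sketch under stated assumptions: straight
double quotes only, and quotes that pair up within a paragraph.
it is not a description of whatever rfrank is doing, and the names
are made up.

```python
import re

# a straight double quote with whitespace (or a line edge) on both sides
SPACEY_QUOTE = re.compile(r'(?:^|\s)"(?=\s|$)', re.MULTILINE)

def has_spacey_quotes(text):
    return SPACEY_QUOTE.search(text) is not None

def fix_spacey_quotes(paragraph):
    """trim spaces just inside each pair of double quotes.

    if the paragraph has an odd number of quote marks, the pairing is
    unknown, so the text is returned untouched for a human to fix.
    """
    parts = paragraph.split('"')
    if len(parts) % 2 == 0:          # odd number of quote marks
        return paragraph
    for i in range(1, len(parts), 2):  # odd-numbered parts are inside quotes
        parts[i] = parts[i].strip(" \t")
    return '"'.join(parts)

line = 'he said, " hello there. " and left.'
print(has_spacey_quotes(line))
print(fix_spacey_quotes(line))
```

the repair only touches spaces on the _inside_ of a quote pair, and it
refuses to guess when the quotes don't balance.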

since this is an area that i know to be _so_ important, any progress
at all toward enlightenment is a very good development.

***

so, all in all, there are lots of positive aspects to rfrank's experiment.

-bowerbird