i've made references to "smoothreaders" lately, mostly because
that's a term commonly used over at distributed proofreaders...
but i probably need to use some other term instead, because
i'm not really talking about the same thing that they do at d.p.
over there, "smoothreading" is a download-the-text-only deal.
sometimes it can be .html, not just plain-text, but the point is
that it _does_not_ involve the actual _pagescans_ themselves...
whereas what i am talking about here _most_definitely_does_
involve an actual comparison of the text against the pagescan,
albeit only in those cases where there's doubt about the text.
but the interface that i am talking about _always_ has the text
displayed side-by-side with the pagescan, for easy comparison.
***
also, at d.p., "smoothreading" is considered to be the _end_ of
the process of preparing the text. i can see that viewpoint, but
that means that the participants are volunteers who're already
_embedded_ within the distributed proofreader infrastructure.
i envision people who would consider themselves to be outside
of such a system. and rather than the _end-point_ of the process,
i would think they'd consider themselves to be the "beta-readers"
for the e-text as it was _beginning_ its journey out to cyberspace.
in other words, it's less of a "graduation" than a "commencement".
then again, maybe the best way to think about this stage is as
_both_ a "graduation" _and_ a "commencement", so perhaps we
should be using one of those hello-goodbye words like "aloha".
so i propose we call them "aloha-readers"...
***
further, since these extra sets of eyes would be of the most value
to _independent_producers_, i'd suggest that project gutenberg
itself should recruit and train and enable these "aloha-readers"...
i don't know (or care) where it would fit in the workflow that now
has whitewashers doing (at least some of) this kind of last-minute
checking -- perhaps, in the spirit of "aloha" it could be _both_ --
i'm just saying that it will be extremely helpful to solo producers.
***
don said:
> In my own proofing the only way I can have
> any confidence at all that I'm catching all the italics
> is to scan through the entire text looking for nothing but.
i think that's true of a lot of people.
and even then, any "confidence" might be undeserved.
because most people will _still_ miss 5% to 15% of them,
and sometimes even more. i have proven it with myself.
i have stared and stared at a page, _knowing_ that
it contains one or more italicized words that i'm missing,
and _still_ been unable to spot any. it's just the way it is.
so i feel the only way to gain a good degree of certainty is
to have the pages examined by more than one person --
and i wouldn't _bet_ on 99% accuracy with fewer than three.
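
as a back-of-envelope check on that "three readers" figure, here is
a minimal sketch. it assumes (and this is a big assumption) that
every reader misses italics independently and at the same rate.
the function name `combined_catch_rate` is made up for illustration,
and the 5% and 15% miss rates are just the range quoted above.

```python
# rough arithmetic only: assumes each reader misses any given italic
# independently, at the same rate. real readers probably are not
# independent, since the hard-to-see italics are hard for everybody.

def combined_catch_rate(miss_rate: float, readers: int) -> float:
    """chance that at least one of `readers` people spots a given italic."""
    return 1.0 - miss_rate ** readers


if __name__ == "__main__":
    for miss_rate in (0.05, 0.15):
        for readers in (1, 2, 3):
            rate = combined_catch_rate(miss_rate, readers)
            print(f"miss rate {miss_rate:.0%}, {readers} reader(s): {rate:.2%} caught")
```

under those assumptions, two readers at the 15% end land a bit under
98%, and it takes the third reader to clear 99%. since the misses are
likely correlated in practice, the real numbers could well be worse.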
but of course, if/when you have independent digitizations,
you can use the _comparison_methodology_ i'm discussing,
and i can guarantee you that it will jack up your accuracy...
so far, with this copy of "huck" that i'm digitizing right now,
i've used jim's italics _and_ david's _and_ some of the o.c.r.,
and i am very confident that i'm getting fairly good coverage,
although even then you'll notice the hedge with "fairly good".
(and i can also report that both david and jim missed "some".)
further, i'm making my text available, so you are welcome to
take on the challenge of finding any italics that i've missed...
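
for anyone who wants to try the comparison idea on their own files,
here is one possible sketch, not the actual tool or workflow used on
"huck". it assumes italics are marked `_like this_` with underscores
only at the two ends of the phrase, and it compares how many times
each italicized phrase shows up in each version. the function names,
the version labels, and the sample sentences are all invented for
illustration.

```python
import re
from collections import Counter

# assumption: italics are marked _like this_, with no underscores
# inside the phrase itself. other markup conventions need another pattern.
ITALIC = re.compile(r"_([^_]+)_")


def italic_counts(text: str) -> Counter:
    """count each italicized phrase, lowercased, with whitespace collapsed."""
    phrases = (" ".join(m.split()).lower() for m in ITALIC.findall(text))
    return Counter(p for p in phrases if p)


def disagreements(versions: dict) -> dict:
    """return {phrase: {version_name: count}} wherever the counts differ.

    every phrase reported here is a spot to check against the pagescan.
    """
    counts = {name: italic_counts(text) for name, text in versions.items()}
    all_phrases = set().union(*counts.values())
    report = {}
    for phrase in sorted(all_phrases):
        per_version = {name: c[phrase] for name, c in counts.items()}
        if len(set(per_version.values())) > 1:
            report[phrase] = per_version
    return report


# invented sample text, standing in for three independent digitizations
versions = {
    "version_a": "It was _awful_ dark, and I says _now_ or never.",
    "version_b": "It was _awful_ dark, and I says now or never.",
    "version_c": "It was awful dark, and I says _now_ or never.",
}

if __name__ == "__main__":
    for phrase, per_version in disagreements(versions).items():
        print(phrase, per_version)
```

two limits worth knowing: an italic that _every_ version missed will
never show up in the report, and counting phrases ignores _where_ in
the book they occur, so a human still has to find the spot in the scan.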
> As soon as I start noticing (and marking) anything else,
> I start missing them.
i can say that's true for me too. i need the focus.
> It also destroys my accuracy when the text already has
> italics markup in it and I must check for false positives.
well, there should almost never be "false positives" in a file.
not for italics. false positives mean your workflow is flawed.
but you bring me to a good point i have been preparing...
for the proofing of italics (and other styling) we have a need
-- an _overwhelming_ need -- for tools to help us do better.
the current methodology goes like this: look at the pagescan
and note where there are italics, then check each instance
in the text, and mark it if it hasn't already been marked.
and then, just to be safe, make sure that every italicized word
that _is_ marked in the text is actually italicized in the scan.
whew! this is tedious, back-breaking work. fatiguing.
so, as you might expect, i've got some ideas for a tool.
but i'm gonna let y'all think about it for a while first...
so see if you can come up with anything, and then i'll dish...
-bowerbird