On 12/15/05, Joshua Hutchinson <joshua@hutchinson.net> wrote:
Robert Cicconetti wrote:

> Generally, proofraiding (or the new PC term, harvesting) refers to
> grabbing page images (and optionally text, but it's usually
> mediocre-to-poor raw OCR). Grabbing only text is seldom worth it..
> nothing to compare it against.
>
> Blind format conversions are discouraged unless you have access to the
> original book. And as you say, clearing a text-only is more difficult.
> IIRC, it requires access to a paper copy and doing a fairly lengthy
> comparison.. only worthwhile IMO if the text is very clean and/or OCRs
> particularly poorly. (To be honest, I had forgotten this option when I
> wrote the previous post).


Sorry to contradict you again, Robert, but not only do we do
proofraiding (and proofraiding refers to harvesting pre-existing text;
imageraiding or harvesting traditionally refers to grabbing pre-existing
images ... I've done lots and lots of both) ... not only do we do it,
I'm in the middle of a proofraid right now.  In the last 2 months I've
posted about 10 books so far from the Baha'i Reference Library.  Other
than format conversion and running GutCheck, I haven't done much else
with them.

Shrug. Okay, so I'm wrong again. I was told the term 'proofraiding' was discouraged because it is not PC, not because it refers to a different way of using preexisting resources. It certainly was used to describe grabbing previously scanned images, not text.

On another note, how do you clear these texts without a physical copy (or page images) and the relatively lengthy comparison described here:

http://www.gutenberg.org/howto/cconfirm-howto

R C