
Robert Cicconetti wrote:
Generally, proofraiding (or the new PC term, harvesting) refers to grabbing page images (and optionally text, but it's usually mediocre-to-poor raw OCR). Grabbing only text is seldom worth it, since there is nothing to compare it against.
Blind format conversions are discouraged unless you have access to the original book. And as you say, clearing a text-only source is more difficult. IIRC, it requires access to a paper copy and doing a fairly lengthy comparison, which is only worthwhile IMO if the text is very clean and/or the book OCRs particularly poorly. (To be honest, I had forgotten this option when I wrote the previous post.)
Sorry to contradict you again, Robert, but we do do proofraiding. (Proofraiding refers to harvesting pre-existing text; imageraiding or harvesting traditionally refers to grabbing pre-existing images. I've done lots and lots of both.) Not only do we do it, I'm in the middle of a proofraid right now. In the last 2 months I've posted about 10 books so far from the Baha'i Reference Library. Other than format conversion and running GutCheck, I haven't done much else with them.
Josh