Re: [gutvol-d] Fwd: Cervantes Books

----- Original Message ----- From: "Robert Cicconetti" <grythumn@gmail.com>
AFAIK, we don't simply repackage existing text-only copies available on the web.
R C
Actually we do and have, Robert. OK, DP doesn't, but PG volunteers do. We even have a name for it. Proofraiding. ;)
The hard part is making sure it is well proofed and to our formatting standards and that it is clearable by our standards.
Josh (JHutch at DP)

Generally, proofraiding (or the new PC term, harvesting) refers to grabbing page images (and optionally text, but it's usually mediocre-to-poor raw OCR). Grabbing only text is seldom worth it.. nothing to compare it against.
Blind format conversions are discouraged unless you have access to the original book. And as you say, clearing a text-only is more difficult. IIRC, it requires access to a paper copy and doing a fairly lengthy comparison.. only worthwhile IMO if the text is very clean and/or OCRs particularly poorly. (To be honest, I had forgotten this option when I wrote the previous post).
Back to the gist of my question.. Does anyone know of image archives of Spanish fiction?
On 12/14/05, Joshua Hutchinson <joshua@hutchinson.net> wrote:
----- Original Message ----- From: "Robert Cicconetti" <grythumn@gmail.com>
AFAIK, we don't simply repackage existing text-only copies available on the web.
R C
Actually we do and have, Robert. Ok, DP doesn't, but PG volunteers do. We even have a name for it. Proofraiding. ;)
The hard part is making sure it is well proofed and to our formatting standards and that it is clearable by our standards.
Josh (JHutch at DP)

Robert Cicconetti wrote:
Generally, proofraiding (or the new PC term, harvesting) refers to grabbing page images (and optionally text, but it's usually mediocre-to-poor raw OCR). Grabbing only text is seldom worth it.. nothing to compare it against.
Blind format conversions are discouraged unless you have access to the original book. And as you say, clearing a text-only is more difficult. IIRC, it requires access to a paper copy and doing a fairly lengthy comparison.. only worthwhile IMO if the text is very clean and/or OCRs particularly poorly. (To be honest, I had forgotten this option when I wrote the previous post).
Sorry to contradict you again, Robert, but not only do we do proofraiding (and proofraiding refers to harvesting pre-existing text; imageraiding or harvesting traditionally refers to grabbing pre-existing images ... I've done lots and lots of both) ... not only do we do it, I'm in the middle of a proofraid right now. In the last 2 months I've posted about 10 books so far from the Baha'i Reference Library. Other than format conversion and running GutCheck, I haven't done much else with them.
Josh

On 12/15/05, Joshua Hutchinson <joshua@hutchinson.net> wrote:
Robert Cicconetti wrote:
Generally, proofraiding (or the new PC term, harvesting) refers to grabbing page images (and optionally text, but it's usually mediocre-to-poor raw OCR). Grabbing only text is seldom worth it.. nothing to compare it against.
Blind format conversions are discouraged unless you have access to the original book. And as you say, clearing a text-only is more difficult. IIRC, it requires access to a paper copy and doing a fairly lengthy comparison.. only worthwhile IMO if the text is very clean and/or OCRs particularly poorly. (To be honest, I had forgotten this option when I wrote the previous post).
Sorry to contradict you again, Robert, but not only do we do proofraiding (and proofraiding refers to harvesting pre-existing text; imageraiding or harvesting traditionally refers to grabbing pre-existing images ... I've done lots and lots of both) ... not only do we do it, I'm in the middle of a proofraid right now. In the last 2 months I've posted about 10 books so far from the Baha'i Reference Library. Other than format conversion and running GutCheck, I haven't done much else with them.
Shrug. Okay, so I'm wrong again. I was told the term 'proofraiding' was discouraged because it is not PC, not because it refers to a different form of using preexisting resources. It certainly was used to describe grabbing previously scanned images, not text.
On another note, how do you clear these texts without a physical copy (or page images) and the relatively lengthy comparison described here: http://www.gutenberg.org/howto/cconfirm-howto
R C

On Wed, 14 Dec 2005, Robert Cicconetti wrote:
Generally, proofraiding (or the new PC term, harvesting)
Sorry, PG has been using the term "harvesting" for at least as long as the world has been becoming PC. _I_ might have been somewhat responsible for the term raiding, as I used to call the harvesters "The Raiders of the Lost Art" ;-)
Happy Holidays! Give eBooks!!!
Michael S. Hart
Founder
Project Gutenberg