
Greetings!
I'm new to Project Gutenberg, but for the moment at least mildly excited about the possibility of helping out. As one of the FAQs suggests, I will probably start out with a bit of "distributed proofreading" over at http://www.pgdp.net
Looking forward to the possibility of actually producing texts, I'm curious about the pros and cons of scanning vs typing. My impression is that scanning offers the relatively significant bonus of at least potentially having the scanned images available along with the text for proofreading and/or posterity. Scanning involves the page flipping or cutting and aligning or sheetfeeding plus all the technical wrangling involved in OCR compared to the, well, typing involved in typing. I'm guessing a decent typist is still superior to OCR from the proofreading point of view.
Is my initial preference for typing (partially as an excuse to see how high I can push my typing speed ;) enough of a reason to ignore the "added value" of having scanned pages available?
Ok, so these are largely rhetorical questions I suppose, but I'd love to hear any opinions and feedback (especially on issues I seem to have overlooked and likely haven't even imagined).
Thanks.
--Bill Landis bill.landis@gmail.com

If you can type fast and well, go for it (IMHO). The drawbacks:
* Some people do want scans for archival purposes
* Scans make it possible for a second party to check your work
* I can't type fast or accurately enough to make it worthwhile :)
Geoff

Bill Landis wrote:
Is my initial preference for typing (partially as an excuse to see how high I can push my typing speed ;) enough of a reason to ignore the "added value" of having scanned pages available?
Some books are hard to OCR because the print is very irregular or because they use funny fonts like blackletter. If you want to type, I guess you should ask the project managers at DP to give you one of those books.
-- Marcello Perathoner webmaster@gutenberg.org

--- Bill Landis <bill.landis@gmail.com> wrote:
I'm guessing a decent typist is still superior to OCR from the proofreading point of view.
Not sure what you mean by "from the proofreading point of view." If you mean that the initial output from a typist is probably better than the initial output from OCR, it's possible, but I wouldn't put money on it. But the production focus for PG has largely shifted from human typing to the DP workflow. It seems a better use of human time to fix up what the OCR didn't get right rather than starting from scratch. Plus, allowing others to proof your work practically requires scans (assuming you don't just want it proofed by your roommate/SO/dog), so once that hard part is done the OCR is pretty straightforward. I would suggest you check the type-in team at DP, which primarily works on projects where the OCR basically failed. You can work on those projects a page at a time in P1 and see how much P2 has to correct further down the line, then decide whether you want to produce an entire text by typing it in yourself.

Hi Bill. Thanks for the question. As others have mentioned, yes, the error-checking tools that have been developed by PG volunteers do focus mostly on flagging OCR-type errors. However, don't let that stop you. If you have not done so yet, you may want to start browsing through the PG FAQ: http://www.gutenberg.org/faq/ I find that typing out a whole book does give you a more "holistic" view of the book itself. These days, most items being added to PG have come through Distributed Proofreaders, but not all. Typing out a full book is a decent-sized undertaking. (The first time you try it, it generally feels like an impossible undertaking. This gets easier after you've done a few books this way.) I would strongly suggest that you spend two weeks proofing individual pages at DP before you try a whole book on your own. This will give you a taste of the "issues you have not imagined yet". Also, there is a support network of volunteers around, so don't be afraid to ask questions.
Andrew
On Wed, 28 Sep 2005, Bill Landis wrote:
Greetings!
I'm new to Project Gutenberg, but for the moment at least mildly excited about the possibility of helping out. As one of the FAQs suggests, I will probably start out with a bit of "distributed proofreading" over at http://www.pgdp.net
Looking forward to the possibility of actually producing texts, I'm curious about the pros and cons of scanning vs typing. My impression is that scanning offers the relatively significant bonus of at least potentially having the scanned images available along with the text for proofreading and/or posterity. Scanning involves the page flipping or cutting and aligning or sheetfeeding plus all the technical wrangling involved in OCR compared to the, well, typing involved in typing. I'm guessing a decent typist is still superior to OCR from the proofreading point of view.
Is my initial preference for typing (partially as an excuse to see how high I can push my typing speed ;) enough of a reason to ignore the "added value" of having scanned pages available?
Ok, so these are largely rhetorical questions I suppose, but I'd love to hear any opinions and feedback (especially on issues I seem to have overlooked and likely haven't even imagined).
Thanks.
--Bill Landis bill.landis@gmail.com
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d
participants (5)
- Andrew Sly
- Bill Landis
- Geoff Horton
- Jon Niehof
- Marcello Perathoner