
oh, this is rich.
it's a discussion over at distributed proofreaders about repurposing digitizations found elsewhere on the web into the d.p. workflow, jumpstarting the proofing process with a text that has already received a good amount of proofing. the catch? the other digitizations have linebreaks removed, making proofing more difficult for d.p. people... kind of ironic, isn't it? anyway, have a nice weekend! :+) -bowerbird

"Bowerbird" == Bowerbird <Bowerbird@aol.com> writes:
Bowerbird> it's a discussion over at distributed proofreaders
Bowerbird> about repurposing digitizations found elsewhere on the
Bowerbird> web into the d.p. workflow, jumpstarting the proofing
Bowerbird> process with a text that has already received a good
Bowerbird> amount of proofing. the catch? the other
Bowerbird> digitizations have linebreaks removed, making proofing
Bowerbird> more difficult for d.p. people...

Too easy to solve: OCR the images, preserving line breaks, and append to the end of every line a character that otherwise rarely appears, e.g. @. Then run wdiff between the two versions, replace each [-@-] with a line break, and remove the other differences with a regexp. You might miss some line breaks if the OCR is very bad, but a better regexp might help in that case.

Carlo
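A minimal Python sketch of the post-processing step, assuming wdiff was run as `wdiff ocr_marked.txt proofed.txt` with GNU wdiff's default markers (`[-...-]` for words only in the first file, `{+...+}` for words only in the second) and that the @ was added to the OCR text as a separate word. The function name `restore_linebreaks` is hypothetical. It keeps the proofed wording wherever the two texts disagree, and, as one possible "better regexp", it also recovers markers that wdiff merges into a larger deletion when an OCR error sits next to a line end (e.g. `[-tirnes, @-] {+times,+}`). Where a marker lands in the middle of such a run, the break placement is only an approximation.

```python
import re

# GNU wdiff default markers: [-only in file 1-] {+only in file 2+}.
# A changed region is printed as a deletion, optionally followed by an insertion.
CHANGE = re.compile(r"\[-(.*?)-\](?:\s*\{\+(.*?)\+\})?", re.DOTALL)
INSERTED = re.compile(r"\{\+(.*?)\+\}", re.DOTALL)

MARKER = "@"  # assumed end-of-line marker, added to the OCR text as its own word


def restore_linebreaks(wdiff_output: str) -> str:
    """Carry the OCR's line breaks over to the proofed text (hypothetical helper).

    Assumes each marker appears inside a deletion and the proofed wording
    appears as common text or as insertions.
    """
    def change(match: re.Match) -> str:
        old = match.group(1).split()   # OCR-only words, possibly including markers
        new = match.group(2) or ""     # proofed-only words, if any
        # Markers before the first real word end the previous line;
        # any remaining markers end the line containing this change.
        lead = 0
        while lead < len(old) and old[lead] == MARKER:
            lead += 1
        trail = old.count(MARKER) - lead
        return "\n" * lead + new + "\n" * trail

    text = CHANGE.sub(change, wdiff_output)
    # Insertions with no matching deletion: keep the proofed words as-is.
    text = INSERTED.sub(lambda m: m.group(1), text)
    # Tidy the spaces left around the reinserted breaks.
    text = re.sub(r"[ \t]*\n[ \t]*", "\n", text)
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()


sample = ("It was the best of [-tirnes, @-] {+times,+} it was the worst of "
          "times, [-@-] it was the age of wisdom.")
print(restore_linebreaks(sample))
```

Running it prints the proofed wording broken at the OCR's line ends ("It was the best of times," / "it was the worst of times," / "it was the age of wisdom."). Taking the proofed side of every disagreement assumes the other digitization is the better-proofed text, which is the premise of the thread.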
participants (2)
- Bowerbird@aol.com
- Carlo Traverso