Re: [gutvol-d] have a nice weekend

16 Sep 2006

"Bowerbird" == Bowerbird  <Bowerbird@aol.com> writes:
    Bowerbird> it's a discussion over at distributed proofreaders
    Bowerbird> about repurposing digitizations found elsewhere on the
    Bowerbird> web into the d.p. workflow, jumpstarting the proofing
    Bowerbird> process with a text that has already received a good
    Bowerbird> amount of proofing.  the catch?  the other
    Bowerbird> digitizations have linebreaks removed, making proofing
    Bowerbird> more difficult for d.p. people...

Too easy to solve: OCR the images, preserving line breaks; append to
every end-of-line a character that rarely appears otherwise, e.g. @;
run wdiff between the two versions (OCR text first, so that the marker
shows up as a deletion); replace [-@-] with a linebreak; and remove
the other differences with a regexp. You might miss some linebreaks
if the OCR is very bad, but a better regexp might help in that case.
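
To make the idea concrete, here is a rough sketch in Python. It is not
the wdiff-plus-regexp pipeline itself: it swaps in the standard
library's difflib.SequenceMatcher for the word-level diff, and the
function name restore_linebreaks and the choice of "@" as the marker
are only illustrative assumptions. Where the OCR differs from the
proofed text right at a line end, the break may land a word or two
off, which is the same weakness as noted above.

```python
import difflib

EOL = "@"  # assumed marker; must not occur as a word in either text


def restore_linebreaks(ocr_text, proofed_text):
    """Copy the line breaks of ocr_text onto the words of proofed_text."""
    # Word list of the OCR, with the marker as its own token at each line end.
    ocr_tokens = []
    for line in ocr_text.splitlines():
        ocr_tokens.extend(line.split())
        ocr_tokens.append(EOL)
    proofed_tokens = proofed_text.split()

    # Word-level alignment (stand-in for wdiff).
    matcher = difflib.SequenceMatcher(
        None, ocr_tokens, proofed_tokens, autojunk=False
    )

    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Always trust the proofed words, never the OCR words.
        out.extend(proofed_tokens[j1:j2])
        # Markers exist only on the OCR side, so they show up in
        # "delete" or "replace" blocks; keep them, drop everything else.
        if tag in ("delete", "replace"):
            out.extend(tok for tok in ocr_tokens[i1:i2] if tok == EOL)

    # Turn the marker tokens back into real line breaks.
    lines, current = [], []
    for tok in out:
        if tok == EOL:
            lines.append(" ".join(current))
            current = []
        else:
            current.append(tok)
    if current:
        lines.append(" ".join(current))
    return "\n".join(lines)


if __name__ == "__main__":
    ocr = "the qu1ck brown\nfox jumps over\nthe lazy dog"
    proofed = "the quick brown fox jumps over the lazy dog"
    print(restore_linebreaks(ocr, proofed))
```

The output keeps every word of the already-proofed text and takes only
the line-break positions from the OCR, so OCR errors inside a line do
not matter as long as enough surrounding words still match.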

Carlo

Carlo Traverso