Re: [gutvol-d] Double blind OCRing?

28 Sep 2012


On Fri, Sep 28, 2012 at 8:33 AM, James Adcock <jimad@msn.com> wrote:
...
...
> These appends seem to me to imply that diffing parallel/independently
> produced texts is a faster and more efficient way to correct the OCR process
> than sequential manual checking.
Independent scans and OCR with independent programs are not fast or
efficient. Nor can simple hours be counted; the whole point of DP and
similar projects is that it's easier to get many hours from many
people than to get one volunteer to put in fewer hours.
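
For readers unfamiliar with the technique under discussion, here is a minimal sketch of diffing two independently produced OCR texts so that only the disagreements need human review. It is an illustration, not a tool used by DP or anyone in this thread: the function name `disagreements` and both sample strings are made up, and it relies only on Python's standard `difflib` module.

```python
import difflib


def disagreements(text_a: str, text_b: str) -> list[tuple[str, str]]:
    """Return (words_from_a, words_from_b) pairs where the two texts differ.

    Hypothetical helper for illustration: compares word by word and
    reports every span that is not identical in both transcriptions.
    """
    words_a = text_a.split()
    words_b = text_b.split()
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
    flagged = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            flagged.append((" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])))
    return flagged


# Invented sample data: two OCR passes over the same line, each with one error.
ocr_a = "It was the best of tirnes, it was the worst of times"
ocr_b = "It was the best of times, it was the vvorst of times"

for a, b in disagreements(ocr_a, ocr_b):
    print(f"A: {a!r}  B: {b!r}")
```

A diff like this only flags places where the two texts disagree; an error that both OCR passes make identically goes unnoticed, and the cost of producing the second scan and OCR pass is not reflected in it.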
...
> And the results are more accurate.  The DP approach does not lead to
> particularly accurate texts.
Right, whatever. Instead of sneering, how about some actual evidence
and numbers? Show me texts, and let's see what you consider "not
particularly accurate".
...
> PPPS: And neither Unicode nor HTML give us good tools to transcribe what one
> actually finds in historical books in the first place.
Unicode doesn't? Whatever. Unless you're talking manuscripts and EETS
books, it's pretty solid.

-- 
Kie ekzistas vivo, ekzistas espero.

David Starner