
Somewhere in the recent spate of discussion, several contributors, including BB and/or Jim Adcock I think, described diffing two different (OCRed) editions of the same work and finding many scannos in the process. These appends seem to me to imply that diffing parallel, independently produced texts is a faster and more efficient way of correcting OCR output than sequential manual checking.

I wonder what happens if, starting from the same physical text, you do one or more of the following and then diff the results:

- scanning with two different scanners
- OCRing with two different algorithms/programs
- manual correction (PPing?) independently by two different people

Has anyone (possibly including DP) ever tried any of the above and documented the results in a scientifically valid way? Does DP work that way anyway?

It makes sense to me that if each of the above processes is roughly 95%-99% accurate, then automatically comparing two independent results to find errors might be a lot more reliable than manually refining something that is already so good that humans can't see the remaining faults.

It's not quite the same thing, but I've spent enough of my life trying to see the errors in program code, i.e. to read what the code actually says rather than what I think it says, to know that humans have amazingly good subconscious error-correction algorithms which are impossible to turn off.

Bob Gibbins
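
P.S. For concreteness, here is a minimal sketch of the sort of automated comparison I have in mind, written in Python with the standard difflib module. The file names are invented, and it is only an illustration of the idea, not anything DP actually runs:

    # Compare two independently produced transcriptions of the same work and
    # report every place where they disagree, on the theory that independent
    # errors rarely coincide, so most disagreements mark a scanno in one
    # version or the other.
    import difflib

    def read_words(path):
        """Return the file's contents as a flat list of words."""
        with open(path, encoding="utf-8") as f:
            return f.read().split()

    version_a = read_words("edition_a_ocr.txt")   # hypothetical file names
    version_b = read_words("edition_b_ocr.txt")

    matcher = difflib.SequenceMatcher(None, version_a, version_b, autojunk=False)

    # A human then only has to inspect these few spots, not the whole text.
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":
            print(tag, "A:", " ".join(version_a[a_start:a_end]),
                  "| B:", " ".join(version_b[b_start:b_end]))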