
On Fri, May 13, 2005 at 04:38:31PM -0400, Geoff Horton wrote:
I will go back and look at the source, though I'm not a C expert by any stretch.
Well, the point is that it uses a three-word, not two-word, phrasebook where possible, and (what I forgot was not in that version of the source) clues from the sentence structure and "nearby" words.
But as I said in the forums, my disappointment once I got the current scheme going was that OCR quality has improved so much, it's not as effective as it would have been 10 years ago. However, I will notch your interest up as another vote for me to finish it. :-)
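As a rough illustration of the three-word phrasebook idea mentioned above, here is a minimal Python sketch. It is not the actual program under discussion (which is in C and also uses sentence-structure and "nearby"-word clues); the names build_phrasebook and flag_suspects and the tiny sample corpus are hypothetical. It records which middle words have been seen between each pair of neighbours in trusted text, then flags a word in OCR output when its neighbours are known but the word itself was never seen between them.

```python
from collections import defaultdict


def build_phrasebook(corpus_text):
    """Map (previous, next) word pairs to the middle words seen between them."""
    words = corpus_text.lower().split()
    phrasebook = defaultdict(set)
    # Slide a three-word window over the trusted text.
    for prev, mid, nxt in zip(words, words[1:], words[2:]):
        phrasebook[(prev, nxt)].add(mid)
    return phrasebook


def flag_suspects(ocr_text, phrasebook):
    """Return (index, word, suggestions) for words the phrasebook disagrees with."""
    words = ocr_text.lower().split()
    suspects = []
    for i in range(1, len(words) - 1):
        # .get() avoids creating empty entries in the defaultdict.
        known = phrasebook.get((words[i - 1], words[i + 1]))
        if known and words[i] not in known:
            suspects.append((i, words[i], sorted(known)))
    return suspects


# Hypothetical sample data, for illustration only.
trusted = "he went to the house and she went to the garden"
book = build_phrasebook(trusted)
print(flag_suspects("he went to tbe house", book))
```

A two-word phrasebook would only look at one neighbour; using both neighbours is what lets the sketch propose "the" for "tbe" here. A real checker would need a far larger corpus and some handling for punctuation and rare but legitimate phrases.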
Please do. I think the better OCR makes the problem worse, not better, because it makes the signal-to-noise ratio (viewing errors as the signal, which admittedly is weird) so low that it's really, really easy to see what _should_ be there rather than what actually as. Is. :)
Ah! So you're a fan of Pauline's "assisi"! :-)

jim