Re: [gutvol-d] Heebee Jeebees on Gutenberg

13 May 2005

      ...
> And so it goes. Jeebies also works for other stealth scanno pairs,
> if you feed it their databases, by the way: hut / but, tom / torn,
> eat / cat and so on.
I was trying/hoping to come up with a way to catch such things without
having to build the databases (Larry Wall says laziness is a
programming virtue, after all). In particular, I was (and am) looking
for a way to deal with the scannos where both words are common--the
thought of going through a text looking at each instance of "is" to
determine whether it should be "as", and vice versa, is markedly
unappealing. I really can't see vocab lists picking that up.
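
For concreteness, the database-driven check discussed above might look roughly like the following. This is a minimal Python sketch, not Jeebies's actual C code or algorithm. It assumes a hypothetical table of confusable pairs (PAIRS) and a reference text known to be clean. It counts three-word contexts in the reference and flags a word when its partner was seen more often between the same two neighbours. The names build_counts and flag_suspects are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical table of stealth scanno pairs, listed in both directions.
PAIRS = {
    "hut": "but", "but": "hut",
    "tom": "torn", "torn": "tom",
    "eat": "cat", "cat": "eat",
}


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_counts(reference_text):
    """Count (previous, word, next) triples in a trusted reference text."""
    words = tokenize(reference_text)
    return Counter(zip(words, words[1:], words[2:]))


def flag_suspects(text, counts, pairs=PAIRS):
    """Return (position, word, suggestion) for words whose partner is
    more common than the word itself in the same three-word context."""
    words = tokenize(text)
    suspects = []
    for i in range(1, len(words) - 1):
        word = words[i]
        other = pairs.get(word)
        if other is None:
            continue  # not one of the confusable words
        prev_word, next_word = words[i - 1], words[i + 1]
        seen = counts[(prev_word, word, next_word)]
        alternative = counts[(prev_word, other, next_word)]
        if alternative > seen:
            suspects.append((i, word, other))
    return suspects
```

With a reference text of "He lived in a hut by the sea but she did not." and a scanned line of "She lived in a but by the sea", flag_suspects reports [(4, 'but', 'hut')]. It still needs the reference counts, so it does nothing to remove the database-building step for pairs like is / as; it only shows the shape of the check.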

I will go back and look at the source, though I'm not a C expert by any stretch.
...
> But as I said in the forums, my disappointment
> once I got the current scheme going was that OCR quality has
> improved so much, it's not as effective as it would have been
> 10 years ago. However, I will notch your interest up as another
> vote for me to finish it. :-)
Please do. I think the better OCR makes the problem worse, not better,
because it makes the signal-to-noise ratio (viewing errors as the
signal, which admittedly is weird) so low that it's really, really
easy to see what _should_ be there rather than what actually as. Is.
:)

Geoff

Geoff Horton