
> Geoff, I find your approach fascinating because I usually think of error-catching in posted texts as a smooth-reading sort of task.
That's how I do it when I'm working for PG (or PGDP). But I get frustrated when gutcheck finds things I should have caught, and that makes me worry that I've missed something else of the same sort.

My original plan (which is probably what put me onto this idea in the first place) was inspired by a he/be-finding program mentioned in the PGDP forum. That led me to wonder whether I could let a work proof itself (so to speak) by building a list of all adjacent word pairs in the book and then looking at the ones that appear only once or twice (see the sketch at the end of this note). Unfortunately, applying this to a text I just post-processed (_The King's Achievement_, not yet posted to PG) produced 61,920 distinct pairs, 46,354 of which appeared only once. So much for that theory. I could cut the totals down a bit by being smarter about sentence boundaries, but I don't think it would make enough of a difference to render the idea workable.

The key to the original program, and to what I was doing while playing with the archives, is coming up with phrases that will pinpoint the problem. For example, just searching for "clone" (the usual scanno for "done") turns up 107 hits, many of which are legit. Searching for "be clone" turns up 19, including repeats, none of which are legit. So I think the basic idea has merit, but darned if I know how to move it into a more practical stage.
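In case it helps, here is a minimal Python sketch of what I mean by building the pair list. The file name and the word-splitting pattern are just illustrative assumptions, not the actual code I ran:

    import re
    from collections import Counter

    def word_pairs(path):
        """Count adjacent word pairs in a plain-text book."""
        with open(path) as f:
            # Lowercase and keep letters/apostrophes; a smarter version
            # would refuse to form pairs across sentence boundaries.
            words = re.findall(r"[a-z']+", f.read().lower())
        return Counter(zip(words, words[1:]))

    pairs = word_pairs("kings_achievement.txt")   # hypothetical file name
    rare = [pair for pair, count in pairs.items() if count <= 2]
    print(len(pairs), "distinct pairs;", len(rare), "appear once or twice")

With numbers like mine (46,354 singletons), that rare list is far too long to review by hand, which is why targeted phrases like "be clone" look like the more promising angle.

Geoff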