
> Geoff, I find your approach fascinating because I usually think of error-catching in posted texts as a smooth-reading sort of task.
That's how I do it when I'm working for PG (or PGDP). But I get frustrated when gutcheck finds things I should have caught, and that makes me worry that I've missed something else of the same sort.

My original plan (which is probably what put me onto this idea in the first place) was inspired by a he/be-finding program mentioned in the PGDP forum. That led me to wonder whether I could let a work proof itself (so to speak) by building a list of all adjacent word pairs in the book and then looking at the ones that appear only once or twice (see the sketch at the end of this note). Unfortunately, applying this to a text I just post-processed (_The King's Achievement_, not yet posted to PG) produced 61,920 distinct pairs, 46,354 of which appeared only once. So much for that theory. I could cut the totals down a bit by being smarter about sentence boundaries, but I don't think it would make enough of a difference to render the idea workable.

The key to the original program, and to what I was doing while playing with the archives, is coming up with phrases that will pinpoint the problem. For example, just searching for "clone" (the usual scanno for "done") turns up 107 hits, many of which are legit. Searching for "be clone" turns up 19, including repeats, none of which are legit. So I think the basic idea has merit, but darned if I know how to move it into a more practical stage.
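In case it helps, here is a minimal Python sketch of what I mean by building the pair list. The file name and the word-splitting pattern are just illustrative assumptions, not the actual code I ran:

    import re
    from collections import Counter

    def word_pairs(path):
        """Count adjacent word pairs in a plain-text book."""
        with open(path) as f:
            # Lowercase and keep letters/apostrophes; a smarter version
            # would refuse to form pairs across sentence boundaries.
            words = re.findall(r"[a-z']+", f.read().lower())
        return Counter(zip(words, words[1:]))

    pairs = word_pairs("kings_achievement.txt")   # hypothetical file name
    rare = [pair for pair, count in pairs.items() if count <= 2]
    print(len(pairs), "distinct pairs;", len(rare), "appear once or twice")

With numbers like mine (46,354 singletons), that rare list is far too long to review by hand, which is why targeted phrases like "be clone" look like the more promising angle.

Geoff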