
I've been mulling over ideas for applying Natural Language Processing to catch hard-to-find errors in e-texts. I have made little practical progress, but for some reason it occurred to me to try a few carefully chosen Google searches, all restricted to site:www.gutenberg.org:

- "around the comer" (for "corner"): 17 hits.
- "turn the comer": no hits.
- "to he": 10,700 hits, a fair number of them not representing typos.
- "have clone": 13 hits.
- "will bo": 1 hit.
- "to bo": 38 hits, some legitimate (often using "Bo" as a proper noun).
- "went borne" (for "went home"): 5 hits, one of which is legitimate; the other four are different editions of the same work, all with the same error.
- "fax away": 1 hit.
- "coining to": 23 hits, some legitimate.
- "he docs": 7 hits, with some repeat editions.
- "it docs": 9 hits, with repeats, but offset by two hits in one work.
- "she docs": no hits.

I don't know what all that proves, but I found it interesting nonetheless.
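The same idea can be run locally over a downloaded e-text rather than through Google. Here is a minimal sketch in Python: the pattern list, suggestions, and function name are illustrative choices of mine, drawn from the confusion pairs above, not a standard tool.

```python
import re

# A few common OCR confusion pairs ("scannos") from the searches above.
# Maps a word-boundary regex to the word the OCR likely mangled.
# This list is illustrative, not exhaustive.
SCANNO_PATTERNS = {
    r"\bcomer\b": "corner",
    r"\bclone\b": "done",
    r"\bbo\b": "be",
    r"\bborne\b": "home",
    r"\bdocs\b": "does",
    r"\bcoining\b": "coming",
}

def find_scannos(text, context=20):
    """Return (matched word, suggestion, surrounding context) triples."""
    hits = []
    for pattern, suggestion in SCANNO_PATTERNS.items():
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            start = max(0, m.start() - context)
            end = min(len(text), m.end() + context)
            hits.append((m.group(0), suggestion, text[start:end]))
    return hits

sample = "They walked around the comer, and he docs not know it."
for word, fix, ctx in find_scannos(sample):
    print(f"{word!r} (maybe {fix!r}): ...{ctx}...")
```

Of course, as the hit counts above show, many of these patterns have legitimate uses ("borne", "Bo"), so anything like this is a candidate list for human review, not an automatic fixer.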