
I have developed programs to help me proof faster and better. I work mainly in French, but they seem to work well in other Latin-alphabet languages (I have tried them a little in English and Spanish). See http://www.pgdp.net/phpBB2/viewtopic.php?p=158673#158673 (get in touch with me if you want to give them a try; the version committed to CVS is not the very latest one). I use them for R1/R2 and P1/P2 and, more recently, for P0, that is to say the quick preparation of OCR'd texts before publication on PGDP Int'l.

I define language-related things (constants, suffixes, prefixes). Since I am apparently the only user and developer of these programs right now, there are many special cases for French, but it should be easy to add rules for other languages. For example, one French rule is: a word is accepted if it starts with "j'", continues with a vowel, and the rest is an accepted word. So "j'aime" (I love) is accepted because "aime" (love) is, and "j'arbre" (I tree) is accepted because "arbre" (tree) is. The latter means nothing, of course, but a proofer is bound to spot it: it is not a scanno (and is unlikely to come out of OCR anyway). In English we could, for example, decide that "<word>'s" is accepted if "<word>" is (and does not end with an "s").

Adding some grammatical checks would be the next step; right now the programs work on a purely syntactical basis. I have a list of French words with all their possible grammatical categories (noun / adjective / verb conjugated in a given tense and person...), but unfortunately it was published by ABU under a restrictive license, which makes it difficult for me to repackage and reuse. The free word list I found in Debian packages is very incomplete (it is missing many basic passé simple verb forms and most, if not all, imparfait du subjonctif forms...).

Later on, I plan to think about and develop (or reuse) tools to do PM, probably focusing more or less on producing TEI XML.
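To make the prefix/suffix idea more concrete, here is a minimal sketch of the two rules described above: the French "j'" + vowel rule and the proposed English "<word>'s" rule. It is not the code of my programs; the function names, the plain `set` used as a dictionary, the vowel list and the normalisation of the typographic apostrophe (’) are all hypothetical choices made for illustration. Note that the French rule as stated only mentions vowels, so forms like "j'habite" (mute h) are not covered here.

```python
"""Hypothetical sketch of prefix/suffix acceptance rules for a word checker.

Names, the vowel set and the apostrophe handling are illustrative
assumptions, not the actual implementation discussed in the post.
"""
from __future__ import annotations

# Assumed set of French vowels, including accented forms.
FR_VOWELS = set("aeiouyàâäéèêëîïôöùûüÿœæ")


def _normalise(word: str) -> str:
    # Assumption: treat the typographic apostrophe like the ASCII one.
    return word.lower().replace("\u2019", "'")


def is_accepted_fr(word: str, dictionary: set[str]) -> bool:
    """Accept a dictionary word, or "j'" + vowel-initial accepted word."""
    w = _normalise(word)
    if w in dictionary:
        return True
    if w.startswith("j'"):
        rest = w[2:]
        # The remainder must start with a vowel and itself be accepted.
        return bool(rest) and rest[0] in FR_VOWELS and is_accepted_fr(rest, dictionary)
    return False


def is_accepted_en(word: str, dictionary: set[str]) -> bool:
    """Accept a dictionary word, or "<word>'s" where <word> is accepted
    and does not end with "s" (the rule as proposed above)."""
    w = _normalise(word)
    if w in dictionary:
        return True
    if w.endswith("'s"):
        base = w[:-2]
        return bool(base) and not base.endswith("s") and is_accepted_en(base, dictionary)
    return False


if __name__ == "__main__":
    fr_words = {"aime", "arbre"}
    print(is_accepted_fr("j'aime", fr_words))   # True
    print(is_accepted_fr("j'arbre", fr_words))  # True: accepted, though meaningless
    print(is_accepted_fr("j'xyz", fr_words))    # False
    en_words = {"cat", "boss"}
    print(is_accepted_en("cat's", en_words))    # True
    print(is_accepted_en("boss's", en_words))   # False under this rule as stated
```

Keeping each rule as a small function of (word, dictionary) is one way to let other languages plug in their own prefixes and suffixes without touching the French special cases.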