Re: [gutvol-d] Language free version of guiguts?

19 Jan 2006

I have developed programs to help me proof faster and better. I work mainly
in French, but they seem to work well in other Latin-alphabet languages
(I have tried them a little in English and Spanish).

http://www.pgdp.net/phpBB2/viewtopic.php?p=158673#158673
(get in touch with me if you want to give them a try; the
CVS-committed version is not the very latest one)

I use them to do R1/R2, P1/P2 and, as of recently, P0, that is to say,
quick preparation of OCR'd texts before publication on PGDP Int'l.

I define language-related things (constants, suffixes, prefixes). Right
now, since I am apparently the only user and developer of these programs,
there are many special cases for French. But it would be easy to add
things for other languages.

As an example, a French rule is: a word is accepted if it starts with
"j'", continues with a vowel, and the rest is an accepted word.

For example, "j'aime" (I love) is accepted because "aime" (love) is.
"j'arbre" (I tree) is also accepted because "arbre" (tree) is. This means
nothing, of course, but a proofer is bound to spot it: it is not a
scanno (and is not likely to come out of OCR anyway).
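
To make the rule above concrete, here is a minimal sketch in Python. It is
not the code of my programs: the names ACCEPTED_WORDS, VOWELS and
is_accepted_fr are made up for illustration, the word list is a toy one,
and treating accented letters as vowels is an assumption.

```python
# Toy word list (hypothetical); a real checker would load a full dictionary.
ACCEPTED_WORDS = {"aime", "arbre", "viens"}

# Assumption: accented vowels count as vowels for the purpose of this rule.
VOWELS = set("aeiouyàâäéèêëîïôöùûü")


def is_accepted_fr(word: str) -> bool:
    """Accept a word if it is known, or if it is "j'" + vowel-initial known word."""
    w = word.lower()
    if w in ACCEPTED_WORDS:
        return True
    # Rule: starts with "j'", continues with a vowel, and the rest is accepted.
    if w.startswith("j'") and len(w) > 2 and w[2] in VOWELS:
        return w[2:] in ACCEPTED_WORDS
    return False


if __name__ == "__main__":
    for candidate in ["j'aime", "j'arbre", "j'viens", "j'xyz"]:
        print(candidate, is_accepted_fr(candidate))
```

With this toy list, "j'aime" and "j'arbre" pass, while "j'viens" is
rejected because "v" is not a vowel, even though "viens" is a known word.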

Adding some grammatical checks would be the next step. Right now the
programs just work on a syntactical basis. I have a list of
French words with all their possible grammatical natures (noun /
adjective / verb conjugated in a given tense and person...) but
unfortunately it was published by ABU under a restrictive license, which
makes it difficult for me to repackage and reuse. The free list of words
I found in Debian packages is very incomplete (it is missing many simple
verbs conjugated in the passé simple, and most if not all imparfait du
subjonctif forms...).

In English we could, for example, decide that "<word>'s" is accepted if
"<word>" is (and does not end with an "s").

Later on, I plan to think about developing or reusing tools to do PM,
probably focusing more or less on producing TEI XML.

Sebastien Blondeel