
Jeroen Hellingman (Mailing List Account) wrote:
I wanted to do this for Dutch in the pre-1947 orthography, but it is a tremendous amount of work, so I have not yet completed it. I've collected about 100 megabytes of text in this orthography and made an initial word list from it, discarding anything that appears fewer than five times. I then have to match this against a modern word list to fill in the gaps, go through the entire list again to add all regular (grammatical) variants of each word, and filter out unwanted words, such as common misspellings and rare words that coincide with common scannos or typos.
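As a rough illustration of the first step described above (counting words across a corpus and dropping anything seen fewer than five times), here is a minimal Python sketch. The function name `build_word_list`, the tokenizing regex, and the decision to lowercase every word are all assumptions made for the example, not details from the post; the later steps (matching against a modern list, adding grammatical variants, weeding out scannos) are not covered.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of letters, optionally joined by an apostrophe or hyphen.
WORD_RE = re.compile(r"[^\W\d_]+(?:['\-][^\W\d_]+)*")

def build_word_list(texts, min_count=5):
    """Count words over all texts and keep those seen at least min_count times."""
    counts = Counter()
    for text in texts:
        # Lowercasing is an assumption; it merges "Mensch" and "mensch".
        counts.update(word.lower() for word in WORD_RE.findall(text))
    return {word: n for word, n in counts.items() if n >= min_count}

if __name__ == "__main__":
    sample = ["mensch mensch mensch", "Mensch mensch visch"]
    print(build_word_list(sample))
```

Lowercasing keeps the list small but loses the distinction between proper nouns and ordinary words, so a real list would probably need to treat capitalized forms separately.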
It would be very nice if lexical data could be included. There are plenty of dictionary files available, but none of them include lexical data.
Regards, Walter