
james said:
One thing I spotted was the use of
i don't need details. just show me the utf8. :+)
However, I thought you should be aware that your word extraction program is doing this and it is wrong.
i know it's wrong. that's the whole point. don't explain. show me the right version.
I have a thought on looking up the words. PDFs and DjVus from archive.org have text contained in them. I should be able to put in a questionable word from the right column and see what it should be on the left, then fix it.
you're a programmer, right? start thinking like one. i don't know exactly what you mean by "put in a questionable word", but it sounds uncomfortably _manual_. ditto with doing "find" in a .pdf or .djvu.
you have a list of the bad words in a file. and you have the actual e-book in a file. with pagenumbers pointing to the scans. so... think like a programmer, and write code that _automates_ the process for you, so you just have to click a button or two and maybe -- in the extreme case -- edit text in a text-field by using your (ick) keyboard.
think like a programmer. i _will_ repeat this. if i _have_to_ repeat it. but james, i don't want to have to repeat it...
-bowerbird
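
a minimal sketch of the kind of automation described above, in python. everything specific here is an assumption, not something taken from the thread: the `[page N]` marker format, the names `index_pages` and `locate_words`, and the idea that the word list and the book are already loaded as lists of strings. it only does the lookup step (which page, which line), so that each questionable word can be shown next to the right scan instead of being hunted down by hand.

```python
import re

# hypothetical page-marker format; the real e-book may mark pages differently
PAGE_MARKER = re.compile(r"^\[page (\d+)\]$")


def index_pages(book_lines):
    """Yield (page_number, line_text) for every non-marker line in the book."""
    page = None  # stays None until the first page marker is seen
    for line in book_lines:
        match = PAGE_MARKER.match(line.strip())
        if match:
            page = int(match.group(1))
        else:
            yield page, line.rstrip("\n")


def locate_words(bad_words, book_lines):
    """Map each questionable word to the (page, line) pairs where it occurs."""
    hits = {word: [] for word in bad_words}
    # match whole words only, so "tbe" does not fire inside a longer word
    patterns = {
        word: re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")
        for word in bad_words
    }
    for page, text in index_pages(book_lines):
        for word, pattern in patterns.items():
            if pattern.search(text):
                hits[word].append((page, text))
    return hits


if __name__ == "__main__":
    # made-up sample data, just to show the shape of the result
    book = [
        "[page 1]",
        "the quick brovvn fox",
        "[page 2]",
        "jumped over tbe lazy dog",
    ]
    for word, places in locate_words(["brovvn", "tbe"], book).items():
        for page, text in places:
            print(f"{word!r} on page {page}: {text}")
```

the page number in each hit is what a button-click interface would use to pull up the matching scan; the editing step (the text-field) is left out of this sketch on purpose.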