Re: [gutvol-d] book of james -- 016

james said:
One thing I spotted was the use of
i don't need details. just show me the utf8. :+)
However, I thought you should be aware that your word extraction program is doing this and it is wrong.
i know it's wrong. that's the whole point. don't explain. show me the right version.
I have a thought on looking up the words. PDFs and DjVus from archive.org have text contained in them. I should be able to put in a questionable word from the right column and see what it should be on the left, then fix it.
you're a programmer, right?
start thinking like one.
i don't know exactly what you mean by "put in a questionable word", but it sounds uncomfortably _manual_.
ditto with doing "find" in a .pdf or .djvu.
you have a list of the bad words in a file. and you have the actual e-book in a file. with pagenumbers pointing to the scans.
so...
think like a programmer, and write code that _automates_ the process for you, so you just have to click a button or two and maybe -- in the extreme case -- edit text in a text-field by using your (ick) keyboard.
think like a programmer.
i _will_ repeat this. if i _have_to_ repeat it.
but james, i don't want to have to repeat it...
-bowerbird
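
A rough sketch of the kind of automation described above, in Python since that is the language the thread mentions. Everything about the file layouts here is an assumption, not something stated in the thread: the bad-word list is taken to be one tab-separated "page, word" pair per line, the e-book text is taken to carry a "[page N]" marker line at the start of each page, and the scan location pattern "scans/{page}.png" is purely hypothetical. The names parse_word_list, split_pages and review_queue are made up for illustration.

```python
import re


def parse_word_list(text):
    """Parse lines of the ASSUMED form '<page>\\t<bad word>' into (page, word) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        page, word = line.split("\t", 1)
        pairs.append((page.strip(), word.strip()))
    return pairs


def split_pages(book_text, marker=r"^\[page (\S+)\]$"):
    """Split the book into {page_id: text}, ASSUMING one marker line per page."""
    pages = {}
    current = None
    for line in book_text.splitlines():
        m = re.match(marker, line)
        if m:
            current = m.group(1)
            pages[current] = []
        elif current is not None:
            pages[current].append(line)
    return {page: "\n".join(lines) for page, lines in pages.items()}


def review_queue(pairs, pages, scan_url="scans/{page}.png", width=30):
    """Build one review item per occurrence of each questionable word.

    Each item carries the page, the word, a little surrounding context,
    and the (hypothetical) location of the page scan to check against.
    """
    queue = []
    for page, word in pairs:
        text = pages.get(page, "")
        # Match the word only when it is not embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
        for m in re.finditer(pattern, text):
            start = max(0, m.start() - width)
            end = min(len(text), m.end() + width)
            queue.append({
                "page": page,
                "word": word,
                "context": text[start:end].replace("\n", " "),
                "scan": scan_url.format(page=page),
            })
    return queue
```

A small front end could then walk the queue, show each context next to its scan, and ask for a correction only when one is needed, so the keyboard comes out just for the hard cases.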

Bowerbird,

The URLs ending with .py are not Python programs. The "dummy" file does seem to be doing the "compose+'+S" correctly, and I do have to change the browser to UTF-8 to see it.

If you want me to "think like a programmer" then at a minimum I'd want to have:

1). The file containing diacritics that you used to extract the words.
2). The Python program(s) you used to extract them and make this list.
3). The Python program you wrote to make the word substitutions and put the diacritics back in.

I do have your file with misspelled words by page. I will attempt to use it. Many of the words in the diacritical file are familiar enough to me that I won't need to look them up.

My only concern is the "Compose+'+s" characters. It will be really easy for me to correct all those with JEdit. I just wonder how they got that way to begin with and if what you are working on could deal with the corrected text.

James Simmons

On Thu, Jan 19, 2012 at 4:42 PM, <Bowerbird@aol.com> wrote:
james said:
One thing I spotted was the use of
i don't need details. just show me the utf8. :+)
However, I thought you should be aware that your word extraction program is doing this and it is wrong.
i know it's wrong. that's the whole point. don't explain. show me the right version.
I have a thought on looking up the words. PDFs and DjVus from archive.org have text contained in them. I should be able to put in a questionable word from the right column and see what it should be on the left, then fix it.
you're a programmer, right?
start thinking like one.
i don't know exactly what you mean by "put in a questionable word", but it sounds uncomfortably _manual_.
ditto with doing "find" in a .pdf or .djvu.
you have a list of the bad words in a file. and you have the actual e-book in a file. with pagenumbers pointing to the scans.
so...
think like a programmer, and write code that _automates_ the process for you, so you just have to click a button or two and maybe -- in the extreme case -- edit text in a text-field by using your (ick) keyboard.
think like a programmer.
i _will_ repeat this. if i _have_to_ repeat it.
but james, i don't want to have to repeat it...
-bowerbird
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d