james said:
> The URLs ending with .py are not Python programs.
well, technically, a python program lives at that address,
but what _you_ will see, in your browser-window, will be
the output which that python program is sending to you.
> The "dummy" file does seem to be
> doing the "compose+'+S" correctly
ok, that's interesting, on some level...
but you'll still need to fix the 777 file.
> and I do have to change the browser to UTF-8 to see it.
i suspect that, since it's labeled as a .txt file,
it's being sent out as ascii, rather than utf8...
probably some switch i can toggle at my i.s.p.
> If you want me to "think like a programmer"
> then at a minimum I'd want to have:
i want you to think like a d.i.y. programmer
who cannot require things "at a minimum"... :+)
> 1). The file containing diacritics
> that you used to extract the words.
there is no such file. i read in your text-file,
and extracted any word containing a diacritic.
for the record, there's under a dozen diacritics.
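
for anyone who'd rather see that step in python than in english, here is a minimal sketch. it is not the basic program mentioned here, just one way the same idea could look. the function names are made up for illustration, and it assumes "diacritic" means a letter that unicode can decompose into a base letter plus a combining mark, so letters like "ø" or "ß" would slip past it.

```python
import re
import unicodedata

def has_diacritic(word):
    # decompose each character; a combining mark means an accent was attached
    decomposed = unicodedata.normalize("NFD", word)
    return any(unicodedata.combining(c) for c in decomposed)

def diacritic_words(text):
    # compose first, so an accented letter is one character and stays inside its word
    text = unicodedata.normalize("NFC", text)
    # \w+ is unicode-aware in python 3, so accented letters count as word characters
    words = re.findall(r"\w+", text)
    # a set drops repeats; sorted() gives a stable list to eyeball
    return sorted({w for w in words if has_diacritic(w)})

sample = "The café served a naïve résumé writer; the cafe did not."
print(diacritic_words(sample))  # → ['café', 'naïve', 'résumé']
```

in real use you would read the book in with open(name, encoding="utf-8") and pass the contents to diacritic_words().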
> 2). The Python program(s) you used
> to extract them and make this list.
i wrote that program in basic, not python, so
you'll need to settle for english pseudo-code,
which is what i just told you, back at point 1).
> 3). The Python program you wrote
> to make the word substitutions
> and put the diacritics back in.
that wasn't a python program, not originally,
just a macro of global changes in a text-editor.
easy enough to do it in basic, though, or python,
which is what i did to make "waxon" and "waxoff".
off the top, so untested and maybe even wrong:
> s=thebookfile
> swi=switchfile.split("\n")
> for x in range(0,len(swi)):
>     dan=swi[x].split(" -- ")
>     dia=dan[0].strip()
>     non=dan[1].strip()
>     s=re.sub(non,dia,s)
c'mon, james, that's a 7-line routine. you can d.i.y.
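
a slightly more defensive take on the same routine, offered as a sketch rather than the definitive version. it assumes the switch list has one "diacritic form -- plain form" pair per line, as in the routine above. the function name and sample data are invented for the example. the additions are the import, a skip for blank lines, and word boundaries so a short word does not get replaced inside a longer one.

```python
import re

def restore_diacritics(book, switch_text):
    # each line of the switch list looks like: "diacritic form -- plain form"
    for line in switch_text.split("\n"):
        if " -- " not in line:
            continue  # skip blank or malformed lines instead of crashing
        dia, non = (part.strip() for part in line.split(" -- ", 1))
        # \b keeps "cafe" from matching inside "cafeteria";
        # re.escape makes any punctuation in the word match literally
        pattern = r"\b" + re.escape(non) + r"\b"
        # a function replacement inserts dia exactly as written
        book = re.sub(pattern, lambda m: dia, book)
    return book

switches = "café -- cafe\nnaïve -- naive\n"
print(restore_diacritics("a naive cafe near the cafeteria", switches))
# → a naïve café near the cafeteria
```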
> I do have your file with misspelled words by page.
that list is now outdated.
> Many of the words in the diacritical file are familiar enough
> to me that I won't need to look them up.
but some are not, i take it.
won't hurt to check _all_ of 'em against a scan,
especially if you just check a few occurrences...
that routine might run about 25 lines, james.
do you want me to write that one for you too?
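
as a rough idea of what such a checking routine could look like (a sketch only, not the 25-line routine offered here): for each word on the list, collect the first few lines where it turns up, so those spots can be compared against the scan. the function name, the limit of 3, and the sample text are all assumptions, and it reports line numbers because the page-marker format of the book file isn't given here.

```python
def first_occurrences(text, words, limit=3):
    # map each word to up to `limit` (line number, line text) pairs
    found = {w: [] for w in words}
    for number, line in enumerate(text.split("\n"), start=1):
        for w in words:
            # plain substring test; good enough for a quick manual check
            if w in line and len(found[w]) < limit:
                found[w].append((number, line.strip()))
    return found

book = "the café opened\nnothing here\na naïve café owner\ncafé again\ncafé once more"
hits = first_occurrences(book, ["café", "naïve"], limit=3)
for word, places in hits.items():
    for number, line in places:
        print(word, number, line)
```

with the sample text, "café" is reported on lines 1, 3 and 4 (the fourth hit is dropped by the limit), and "naïve" on line 3.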
> My only concern is the "Compose+'+s" characters.
all you need to do is enter it correctly in the file you edit.
> I just wonder how they got that way to begin with
well, we know it happened in my hands. so who knows?
it could have happened any number of ways, all of which
are now immaterial. don't get hung up on the irrelevant.
just take the "777" .html file, edit it appropriately, and
then re-mount it somewhere where only i can get to it...
-bowerbird