
james said:
I started doing the page-at-a-time thing and gave up. Your pages are already better than mine because I used Tesseract and archive.org uses ABBYY FineReader.
except the o.c.r. from archive.org, in this case, is screwed up. it's missing its em-dashes. i've dealt with this problem before, and it's less work to re-do the o.c.r. than to fix the em-dashes. will someone with a good version of abbyy please re-do this o.c.r.?
2). This book really requires a way to enter UTF-8 characters.
if someone does the o.c.r. for you, they can specify utf8 output...
If I could just stick a circumflex above a's, u's, and i's (both lower and upper case) that would be 99% of what I need.
if you can pull out a list of the words that require circumflexes, we can create a script that does a global change in one swoop.
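a rough sketch of what such a script might look like, in python. the word list here is a made-up placeholder (not words pulled from this book), and the function name is just for illustration. it only touches whole words, and it leaves line breaks alone, so nothing gets rewrapped.

```python
import re

# hypothetical word list: plain spelling -> spelling with circumflexes.
# these entries are placeholders, not words taken from the book.
# list upper-case and lower-case forms separately, since matching is case-sensitive.
CIRCUMFLEX_WORDS = {
    "role": "rôle",
    "chateau": "château",
    "Chateau": "Château",
}

def restore_circumflexes(text, words=CIRCUMFLEX_WORDS):
    """replace every whole-word occurrence of each key with its value."""
    if not words:
        return text
    # longest keys first, so a longer word is tried before a shorter one
    keys = sorted(words, key=len, reverse=True)
    # \b keeps "role" from matching inside "roles" or "parole"
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: words[m.group(1)], text)

if __name__ == "__main__":
    sample = "the role of the Chateau\nwas not one of the roles"
    print(restore_circumflexes(sample))
```

the real list would come from whoever pulls the words out of the text; the script just applies it to the whole file in one pass.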
(after de-hyphenating
do not dehyphenate! the program will do that for you.
re-wrapping
do not rewrap! if you need to rewrap, the program can do it. rewrapping is evil. it just makes it harder for the next guy... -bowerbird