james said:
> I started doing the page-at-a-time thing and gave up.
> Your pages are already better than mine because
> I used Tesseract and archive.org uses ABBYY FineReader.
except the o.c.r. from archive.org, in this case, is screwed up.
it's missing its em-dashes. i've dealt with this problem before,
and it's less work to re-do the o.c.r. than to fix the em-dashes.
will someone with a good version of abbyy please re-do this o.c.r.?
> 2). This book really requires a way to enter UTF-8 characters.
if someone does the o.c.r. for you, they can specify utf8 output...
> If I could just stick a circumflex above a's, u's, and i's
> (both lower and upper case) that would be 99% of what I need.
if you can pull out a list of the words that require circumflexes,
we can create a script that does a global change in one swoop.
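
a minimal sketch of what such a script could look like, in python.
everything named here is an assumption for illustration: the function
name add_circumflexes is hypothetical, and the entries in the word list
are made up, since the real list has to come from the book itself.
it matches whole words only, is case-sensitive (so list each upper-
and lower-case form you need), and leaves every line break untouched.

```python
import re

# hypothetical word list: plain spelling -> spelling with circumflexes.
# these entries are invented for illustration only; the real list
# would be pulled from the book.
CIRCUMFLEX_WORDS = {
    "Rama": "Râma",
    "Sita": "Sîtâ",
    "Hindu": "Hindû",
}


def add_circumflexes(text, words):
    """Replace every whole-word occurrence of each key with its value.

    Matching is case-sensitive and whole-word only, so "Rama" is changed
    but "Drama" and "rama" are not. Line breaks are not modified.
    """
    if not words:
        return text
    # Build one alternation so the text is scanned in a single pass.
    alternatives = "|".join(re.escape(w) for w in words)
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda m: words[m.group(1)], text)


if __name__ == "__main__":
    print(add_circumflexes("Rama met Sita in a drama.", CIRCUMFLEX_WORDS))
    # prints: Râma met Sîtâ in a drama.
```

one caveat with this sketch: a word that is split by a hyphen across
a line end will not match, so those few cases would still need a look.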
> (after de-hyphenating
do not dehyphenate! the program will do that for you.
> re-wrapping
do not rewrap! if you need to rewrap, the program can do it.
rewrapping is evil. it just makes it harder for the next guy...
-bowerbird