Bowerbird,

I have noticed the lack of em-dashes.  They are painful to put back in.

If I had UTF-8 output that would be even better.  I understand the DP guys swear by ABBYY FineReader, and I can see why they would, but I'm an open source kind of guy and Tesseract worked really well on most of my earlier books.  I would have submitted page images to DP for this book if not for the fact that I want the book to be finished while I'm still alive.  I submitted stuff to both DP and DP Canada over a year ago and it's still working its way through the queues.  I've got three Raymond Chandler novels and two Robert C. Benchley humor collections at DP Canada now.

I want to make it clear that I'm going to finish this book as I started it, by fixing up the archive.org text file.  I only offered it up as an example of a really hard book to do.  If you can make your approach work on this it will work on anything.  I will help with this in any way I can short of starting the whole process over again.  If I can do half the new way and half the old I'll do it.

James Simmons


On Wed, Dec 21, 2011 at 1:37 PM, <Bowerbird@aol.com> wrote:
james said:
>   I started doing the page-at-a-time thing and gave up.
>   Your pages are already better than mine because
>   I used Tesseract and archive.org uses ABBYY FineReader.

except the o.c.r. from archive.org, in this case, is screwed up.
it's missing its em-dashes.  i've dealt with this problem before,
and it's less work to re-do the o.c.r. than to fix the em-dashes.

will someone with a good version of abbyy please re-do this o.c.r.?


>   2).  This book really requires a way to enter UTF-8 characters.

if someone does the o.c.r. for you, they can specify utf8 output...


>   If I could just stick a circumflex above a's, u's, and i's
>   (both lower and upper case) that would be 99% of what I need.

if you can pull out a list of the words that require circumflexes,
we can create a script that does a global change in one swoop.
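
a minimal sketch of what such a script could look like, in python. the word list here is made up for illustration (the real one would have to come from the book), and the function name restore_circumflexes is hypothetical too. it assumes each listed word always takes the circumflex, matches whole words only, and keeps the capitalization it finds:

```python
import re

# Hypothetical word list: plain OCR spelling -> corrected spelling.
# These entries are invented for illustration; keys must be lowercase.
CIRCUMFLEX_WORDS = {
    "chateau": "château",
    "maitre": "maître",
    "aout": "août",
}


def restore_circumflexes(text, words=CIRCUMFLEX_WORDS):
    """Replace every whole-word occurrence of a listed word in one pass."""
    if not words:
        return text

    # One alternation over all listed words, longest first, anchored on
    # word boundaries so that longer unlisted words are left untouched.
    alternatives = "|".join(
        re.escape(w) for w in sorted(words, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)

    def fix(match):
        found = match.group(0)
        fixed = words[found.lower()]
        if found.isupper():            # ALL CAPS stays all caps
            return fixed.upper()
        if found[0].isupper():         # Capitalized stays capitalized
            return fixed[0].upper() + fixed[1:]
        return fixed

    return pattern.sub(fix, text)
```

run it over the whole text file in one go, then diff the result against the original to check that nothing unexpected changed. a word that is only sometimes spelled with a circumflex would need a manual pass instead.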


>   (after de-hyphenating

do not dehyphenate!  the program will do that for you.


>   re-wrapping

do not rewrap!  if you need to rewrap, the program can do it.

rewrapping is evil.  it just makes it harder for the next guy...

-bowerbird

_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/mailman/listinfo/gutvol-d