
james said:
I do have the book as 1 text page per file.
ok. (i think.)
I got it this way by downloading the page images, making TIFFs out of them, then running tesseract
oh dear. that was a waste of time.
I can give you the separate text files in a Zip archive if you wish.
are these the files after you made your corrections? if so, then yes, those are exactly the files that i need. zip 'em up, and put the archive in your dropbox.
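for what it's worth, here's a minimal sketch of bundling one-text-file-per-page into a single zip, in python. the function name, the folder name, and the archive name are all made up for illustration, and it assumes the page files end in .txt and sort correctly by filename.

```python
import zipfile
from pathlib import Path


def zip_pages(page_dir, archive_path):
    """bundle every .txt file in page_dir into one zip; return the names stored."""
    page_dir = Path(page_dir)
    # sort by filename so the pages go into the archive in page order
    # (assumes zero-padded names like 001.txt, 002.txt; a hypothetical scheme)
    pages = sorted(page_dir.glob("*.txt"))
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for page in pages:
            # store just the filename, not the full path on your disk
            archive.write(page, arcname=page.name)
    return [page.name for page in pages]


if __name__ == "__main__":
    # hypothetical folder and archive names
    print(zip_pages("corrected_pages", "book_pages.zip"))
```

the returned list is just a quick way to check that every page made it into the archive.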
My work method is to use guiguts to remove page numbers and reformat paragraphs first.
oh dear. more wasted time. oh well. (also, removing page numbers is the _last_ thing to do. they help you keep track of where you are in the book.)
The link you gave gives me a 404 error.
yes, here's the correct one:
sorry about that...
I'm not sure what you mean by online.
i mean you do your corrections on the web... which means that other people can help you. (at least if you give them the web-address.) but if you prefer to work offline, you can do that.
I thought you would provide a command line utility
i'm a mac person, james. we believe in a friendly interface. only a sadist seeks to saddle you with command-line crap...
that would convert ZML to the various formats.
but first you have to get your text _into_ .zml format.
Were you thinking of something like DP uses?
"something like" that is a fairly accurate description. my system isn't nearly as convoluted or bureaucratic.
-bowerbird