Bowerbird,

The page breaks are long gone from my file.  When I say I'm working on it a page at a time, I mean that I have a small laptop next to my desktop computer.  I look at the PDF on the laptop while I edit on the desktop.  For charts, I've been printing out the pages.

There are page numbers in the original text file from archive.org.  They usually start with a left square bracket, but not always, since the OCR garbles some of them.

I do have the book as one text file per page.  I got it this way by downloading the page images, converting them to TIFFs, and running tesseract on them.  The results of this process are generally good, but in this case the text file provided by archive.org was a lot better, so that is what I chose as my starting point.  I can give you the separate text files in a Zip archive if you wish.
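
If those per-page files turn out to be useful, they could be joined into one file with a line of equal signs at each page break.  What follows is only a rough sketch in Python, not something I have run on the book.  It assumes the files are named so that they sort in page order; the names `join_pages` and `join_page_files`, the `*.txt` pattern, and the 25-character separator are all made up for illustration.

```python
from pathlib import Path

# Assumption: a plain line of equal signs marks each page break.
SEPARATOR = "=" * 25


def join_pages(pages):
    """Join page texts, putting a separator line between consecutive pages."""
    cleaned = [page.strip("\n") for page in pages]
    return ("\n\n" + SEPARATOR + "\n\n").join(cleaned) + "\n"


def join_page_files(directory, pattern="*.txt"):
    """Read per-page files in sorted filename order and join them.

    Assumption: sorted filenames correspond to page order
    (for example, zero-padded page numbers in the names).
    """
    paths = sorted(Path(directory).glob(pattern))
    return join_pages(path.read_text(encoding="utf-8") for path in paths)
```

Calling `join_page_files("pages")` on a directory of page files would return the whole book as one string, which could then be written out to a single text file.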

My work method is to use guiguts first to remove page numbers and reformat paragraphs.  Then I go in with jEdit to do corrections, then make another pass with guiguts to run gutcheck, etc.  Then I do the HTML conversion.

The link you gave returns a 404 error.

I'm not sure what you mean by online.  I thought you would provide a command-line utility that would convert ZML to the various formats.  Were you thinking of something like what DP uses?

James Simmons


On Tue, Dec 20, 2011 at 4:28 PM, <Bowerbird@aol.com> wrote:
james said:
>   Bowerbird,
>   If you check the link I gave you
>   you'll see what I'm up against.

i see it...  it's not an easy book, but
it's not a hard one, either, not if you
accept that some parts are impossible.       :+)

i've already set up its online skeleton:

>   http://z-m-l.com/go/bhaga/bhagap123p.html

i can make it so you can work on it online.

or offline, if you prefer, makes no difference.

online would let other people work with you,
if you'd be interested in having that happen...

(if you want to work on it online, by yourself,
just don't tell anyone where you've located it.)

what i will need from you, as soon as possible,
is a version of your text-file with the pagebreaks
marked, so i can splice the text into the skeleton.
a simple line of equal-signs will do the job fine...


this is the last line of page 97...

=========================

and this is the first line of page 98...
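
to show how a file marked that way might be cut back into pages, here is a minimal sketch in python.  it is an illustration only, not the actual splicing tool: the name `split_pages` is hypothetical, and it assumes any line made up of five or more equal-signs (and nothing else) is a pagebreak.

```python
import re

# assumption: a pagebreak is a line holding only equal-signs (five or more)
SEPARATOR_LINE = re.compile(r"^={5,}\s*$")


def split_pages(text):
    """split text into a list of pages at each line of equal-signs."""
    pages, current = [], []
    for line in text.splitlines():
        if SEPARATOR_LINE.match(line):
            # close the page collected so far, dropping blank edge lines
            pages.append("\n".join(current).strip("\n"))
            current = []
        else:
            current.append(line)
    # whatever follows the last separator is the final page
    pages.append("\n".join(current).strip("\n"))
    return pages
```

each item in the returned list is then the text of one page, in order, ready to be matched against the page images.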

again, as soon as possible, so i can do it right away,
so things will be ready for you shortly after christmas.

-bowerbird

_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/mailman/listinfo/gutvol-d