Re: [gutvol-d] book of james -- 001

james said:
Bowerbird, If you check the link I gave you you'll see what I'm up against.
i see it... it's not an easy book, but it's not a hard one, either, not if you accept that some parts are impossible. :+)

i've already set up its online skeleton:

i can make it so you can work on it online. or offline, if you prefer, makes no difference.

online would let other people work with you, if you'd be interested in having that happen... (if you want to work on it online, by yourself, just don't tell anyone where you've located it.)

what i will need from you, as soon as possible, is a version of your text-file with the pagebreaks marked, so i can splice the text into the skeleton. a simple line of equal-signs will do the job fine...

this is the last line of page 97...
=========================
and this is the first line of page 98...

again, as soon as possible, so i can do it right away, so things will be ready for you shortly after christmas.

-bowerbird
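For illustration, a minimal Python sketch of one way to produce the pagebreak-marked file bowerbird asks for, assuming the text exists as one plain-text file per page; the filenames, output name, and separator width are only examples:

    import glob

    SEPARATOR = "=" * 25  # the line of equal-signs used as a pagebreak marker

    # Assumed layout: one text file per page, named so that an alphabetical
    # sort gives page order (e.g. page_0001.txt, page_0002.txt, ...).
    pages = sorted(glob.glob("page_*.txt"))

    with open("book_with_pagebreaks.txt", "w", encoding="utf-8") as out:
        for i, path in enumerate(pages):
            if i > 0:
                out.write(SEPARATOR + "\n")  # marker between consecutive pages
            with open(path, encoding="utf-8") as page:
                out.write(page.read().rstrip("\n") + "\n")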

Bowerbird,

The page breaks are long gone from my file. When I say I'm working on it a page at a time, I mean that I have a small laptop that I put next to my desktop computer. I look at the PDF on that while I edit on the desktop. For charts I've been printing out the pages.

There are page numbers in the original text file from archive.org. They usually start with a left square bracket, but not always, as the OCR messes some of them up.

I do have the book as one text page per file. I got it this way by downloading the page images, making TIFFs out of them, then running tesseract on them. The results of this process are generally good, but in this case the text file provided by archive.org was a lot better, so that is what I chose to use as my starting point. I can give you the separate text files in a Zip archive if you wish.

My work method is to use guiguts to remove page numbers and reformat paragraphs first. Then I go in with Jedit to do corrections, then another pass with guiguts to run gutcheck, etc. Then I do the HTML conversion.

The link you gave gives me a 404 error. I'm not sure what you mean by online. I thought you would provide a command-line utility that would convert ZML to the various formats. Were you thinking of something like DP uses?

James Simmons

On Tue, Dec 20, 2011 at 4:28 PM, <Bowerbird@aol.com> wrote:
james said:
Bowerbird, If you check the link I gave you you'll see what I'm up against.
i see it... it's not an easy book, but it's not a hard one, either, not if you accept that some parts are impossible. :+)
i've already set up its online skeleton:
i can make it so you can work on it online.
or offline, if you prefer, makes no difference.
online would let other people work with you, if you'd be interested in having that happen...
(if you want to work on it online, by yourself, just don't tell anyone where you've located it.)
what i will need from you, as soon as possible, is a version of your text-file with the pagebreaks marked, so i can splice the text into the skeleton. a simple line of equal-signs will do the job fine...
this is the last line of page 97...
=========================
and this is the first line of page 98...
again, as soon as possible, so i can do it right away, so things will be ready for you shortly after christmas.
-bowerbird
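A hedged sketch of the per-page OCR step James describes above (download the page images, convert them to TIFF, run tesseract), assuming ImageMagick's convert and the tesseract command-line tool are installed; the scan filenames are hypothetical:

    import glob
    import subprocess

    # Assumed input: downloaded page images named scan_0001.jp2, scan_0002.jp2, ...
    for image in sorted(glob.glob("scan_*.jp2")):
        base = image.rsplit(".", 1)[0]
        tiff = base + ".tif"
        # Convert the page image to TIFF with ImageMagick.
        subprocess.run(["convert", image, tiff], check=True)
        # Run tesseract; it writes its OCR output to base.txt next to the TIFF.
        subprocess.run(["tesseract", tiff, base], check=True)

This yields one text file per page, which could then be joined with pagebreak markers as in the earlier sketch.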

"James" == James Simmons <nicestep@gmail.com> writes:
James> There are page numbers in the original text file from
James> archive.org. They usually start with a left square
James> bracket, but not always as the OCR messes some of them up.
James> I do have the book as 1 text page per file. I got it this
James> way by downloading the page images, making TIFFs out of
James> them, then running tesseract on them. The results of this
James> process are generally good, but in this case the text file
James> provided by archive.org was a lot better so that is what I
James> chose to use as my starting point. I can give you the
James> separate text files in a Zip archive if you wish.

You seem to be unaware that you can get the text for a single page from the Internet Archive djvu files, through the djvutxt command and the -page option. Or get any range of pages, separated by FormFeed characters.

The *_djvu.txt files provided at TIA are just these same files with some non-printing characters transformed into something else (in pure z.m.l. style, e.g. FormFeed is replaced by three blank lines....), thus making recovering the page breaks more difficult.

Carlo Traverso
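For illustration, a minimal Python sketch of what Carlo describes, assuming DjVuLibre's djvutxt is installed and the scanned book has been downloaded as book.djvu (both names are examples): dump the text layer, split it on the FormFeed characters that separate pages, and rejoin the pages with the equal-sign pagebreak marker bowerbird asked for.

    import subprocess

    SEPARATOR = "=" * 25  # pagebreak marker, one line of equal-signs

    # djvutxt (DjVuLibre) writes the text layer to stdout; pages come back
    # separated by FormFeed (\f) characters, as Carlo describes.
    result = subprocess.run(["djvutxt", "book.djvu"],
                            capture_output=True, text=True, check=True)

    pages = result.stdout.split("\f")
    with open("book_with_pagebreaks.txt", "w", encoding="utf-8") as out:
        out.write(("\n" + SEPARATOR + "\n").join(p.rstrip("\n") for p in pages))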
participants (3):
- Bowerbird@aol.com
- James Simmons
- traverso@posso.dm.unipi.it