

Bowerbird,
OK, I want to give this a chance. How about this: when you have something ready to go, I'll take my work and copy/paste the portions of it that are more or less the same as the pages in your interface. The first 100 pages or so won't be a perfect match to the page images, but they will be close, say to the nearest paragraph. The rest I'll do in your tool. The end result should be pretty close to what I'd have if I had done it with your tool to begin with.
It would be very difficult to use your tool as you intend all the way through because of all the ASCII art family trees I did, which in many cases no longer really match the format in the book.
I have done page-at-a-time proofing in the past and would have done it this time, except my own OCR was bad and I didn't know the trick of getting individual pages from DjVu files. I agree that if your scans are better than what I have now, and it certainly sounds that way, it will save me time to do it your way as much as possible, and of course I am very interested in the ZML conversion, etc.
I hope we're on the same page now.
Thanks,
James Simmons
On Fri, Dec 23, 2011 at 2:49 PM, <Bowerbird@aol.com> wrote:
james said:
I would guess this is not the work flow you had in mind.
um, no, it's not... :+)
and so i just spent a big whole bunch of hours for no good reason.
but hey, it's my own fault. :+)
i broke my very own hard-and-fast rule about being explicit and demanding (and demandingly explicit) on chain-of-custody with a collaborative document.
(it's actually the #1 cause of headaches in that arena. it's even worse than an asshole boss, because it can make you hate friends. and i "supposedly" know it. and yet i proceeded without being explicit enough. my bad.)
***
anyway...
my first reaction is just to post the work i've done, and say "if it helps you, fine, go ahead and use it, and if it doesn't, you're no worse off than before," and leave it at that.
and i will post the work, no doubt about it.
but i feel compelled to give you a little bit of advice, as well, whether you asked for it or not.
first...
recently, i said this:
rewrapping is evil. it just makes it harder for the next guy...
unfortunately, in your case, james, "the next guy" is _you_. as far as i can tell, you have rewrapped the entire file; however, you still must _proof_ it. rewrapping makes that proofing 27 times harder.
second...
unless you decide that em-dashes are "immaterial", restoring the missing em-dashes alone will _double_ the time it takes you to clean this text. i kid you not. you really need to re-do the o.c.r. to fix that problem.
you _could_ grab the em-dashes from the abbyy.gz file available on the archive.org website, but you will _still_ be missing the utf8 stuff, and that's a huge deal. using the "compose" tip makes that go faster, yes, but you'll save gobs of time if you don't have to do that job.
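One rough way to gauge how much there is to recover is to count the em-dashes in the gzipped o.c.r. file before deciding. The sketch below assumes the file is gzipped UTF-8 text and that em-dashes appear in it as the literal character U+2014; both are assumptions, not something confirmed about the abbyy.gz format. The function name and path are hypothetical.

```python
import gzip

EM_DASH = "\u2014"  # assumption: em-dashes are stored as this literal character

def count_em_dashes(path):
    """Count em-dash characters in a gzipped text file (hypothetical helper)."""
    total = 0
    # Read as text; undecodable bytes are replaced rather than raising an error.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            total += line.count(EM_DASH)
    return total
```

Calling `count_em_dashes("book_abbyy.gz")` on a downloaded copy (the filename here is made up) gives a number to compare against the em-dash count in the working text.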
em-dashes and utf8 -- alone -- mean that it is simply a bad investment of your time to continue on your path.
that's my experience. and it was hard-won.
***
like i said, i'm probably gonna just walk away from this.
even so, i advise you to start anew. if you feel like you _must_ retain the 80 pages you've already "fixed", fine.
but you still have 350 pages more to go, and if you do 'em inefficiently, you're gonna be wasting a _lot_ of your time.
and even the 80 pages that you've already "done" are not -- as far as i can tell -- dependably solid. i did a check for space-doublequote-space, to see if i would find any, and i did. if you missed stuff that is _that_ easy to spot, you probably missed a lot of other more subtle stuff too. even good proofers miss 10%-20% the first time through.
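The space-doublequote-space check described above can be sketched in a few lines. This is only an illustration of that kind of check, not the actual tool used; the function name and sample text are made up.

```python
import re

# A double quote with a space on each side has lost its attachment to a word,
# so it cannot be told apart as an opening or closing quote.
FLOATING_QUOTE = re.compile(r' " ')

def find_floating_quotes(text):
    """Return (line_number, line) pairs that contain space-doublequote-space."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if FLOATING_QUOTE.search(line):
            hits.append((number, line))
    return hits

sample = 'He said, " hello there."\nShe replied, "fine."\nthen " what " now'
for number, line in find_floating_quotes(sample):
    print(number, line)
```

Reporting line numbers only helps if the lines still match the page scans, which is one more reason rewrapped text is harder to proof.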
so you really need someone to do a second proofing, but with rewrapped lines, it will be _very_ hard to find anyone.
(and realistically, with a text this difficult, you might well need three proofings, or even four, before it's really solid.)
so anyway, that's my feedback, solicited or not... :+)
-bowerbird
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/mailman/listinfo/gutvol-d