
jon said:
Anyway, I think you mentioned that "consistency" error before
no, this is another one. it's a place where you've deviated from the text on the scan. you _might_ say you've done it for reasons of "consistency"; however, there is no annotation to that effect in the file, as i believe you _would_ put, if that were really the case...
I guess what you really meant to say is that "I will gladly submit my errata immediately after I am paid." <laugh/>
yep, that's it. :+) although, if you haven't found the errors i know about by august -- that's 6 months after you put the file up -- i'll submit my error-free version to project gutenberg. at no cost. :+) (but, since there _is_ a serious point here, let me address it: what you are advocating is a system that is quite well done, but which is _extremely_ expensive. indeed, it's _so_ costly that it simply _can_not_be_justified_ on a cost-benefit basis, especially since many promised benefits will not materialize. i'm just driving that point home to you. from the standpoint of "every text must be perfect", it might be worth it to _you_ to pay $50 to attain that perfection. but to end-users? no way. they'd rather keep that $50, and live with that one little error -- since no one has even managed to notice it so far anyway. as for source metadata, users wouldn't pay _5_ bucks for that.)
And of course, I won't tell you here exactly what errors the proofers found
do you think i can't pinpoint them myself? (and here's a hint back for you: those weren't the ones.)
What you are implying by this statement is that *you know* what the error-free text is supposed to be.
i do. because i have the scans to refer to. :+)
Interesting that you apparently missed at least a couple errors. <laugh/>
what makes you think i missed those errors? :+) all i said was that i was sitting on 2 errors. i did not say i was sitting on 2 and only 2. i never show all my cards if i don't need to. i only expect to get paid the $50 for the _last_ error. (although if you wait too long, i'll soon be considering a price tag of $25 for the _next-to-last_ one as well...)
You did bring up a good point about the line break issue. This leads to my answer to Pauline's question on why I didn't use DP, and why I'm hesitant now to submit it to DP:
just run the scans through o.c.r. again, and keep the linebreaks. that's what i did. that gives you an independent work-product, which is _really_ what you need to have to find all the errors.
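To illustrate the idea of checking one transcription against an independently produced one, here is a minimal sketch in Python. It assumes both versions are plain text that keep the same linebreaks as the printed page, so that disagreements line up row by row. The function name `find_disagreements` and the sample strings are hypothetical, and this is not the tool anyone in this thread actually used; it only shows the comparison step with the standard `difflib` module.

```python
import difflib


def find_disagreements(text_a, text_b):
    """Return (line_number_in_a, lines_from_a, lines_from_b) for each
    stretch where two transcriptions of the same pages differ."""
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    # autojunk=False so that frequently repeated lines are not ignored.
    matcher = difflib.SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    disagreements = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Report 1-based line numbers so they are easy to find in the file.
        disagreements.append((i1 + 1, lines_a[i1:i2], lines_b[j1:j2]))
    return disagreements


if __name__ == "__main__":
    # Hypothetical sample: version_b carries an o.c.r.-style error on line 2.
    version_a = "the quick brown fox\njumps over the lazy dog\nand runs away"
    version_b = "the quick brown fox\njumps ovcr the lazy dog\nand runs away"
    for line_no, a_lines, b_lines in find_disagreements(version_a, version_b):
        print(line_no, a_lines, b_lines)
```

Each reported line still has to be checked against the page scan, since the comparison only says that the two versions disagree, not which one is right. An error that both versions share will not show up at all, which is why the second version needs to be produced independently.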
it would be a waste to have DP do it all over again when it is now in very good shape.
except it might save you 75 bucks... :+)
If anyone reading this wants to help and proof a few pages using the "primitive" system I now have (e.g., just print out the pages from the page scans you would like to proof, and compare them with the online XHTML version which shows the page numbering and breaks. There are other ways to do this proofing by comparison, such as opening up two windows in your browser, one showing the page scan and the other the online XHTML text.)
it is an abuse of volunteers to make them endure processes like that...
As noted above, I would use a DP process for "mass production", but for this particular project did not for the reasons cited above.
well, if you run it through abbyy finereader v7.x, the output is clean. this book would fly through d.p. -- i'd be surprised if it lasted over 3 hours in the queue, because the scans are clear, the text clean, and the book easy and interesting -- and you'd have a separate product, which is worth gold. so you're just not thinking very clearly here...
"My Antonia" is, and has always been, a demonstration project
i guess i'm still not sure what it is you're "demonstrating" with it. you say it is a "proof of concept", but what exactly is the concept? this book is dirt-simple -- among the simplest possible 10%, i'd say. and the markup you did pales against that routinely coming from d.p. i don't see anything but ordinary-and-mundane in what you've done. (i sincerely don't mean anything derogatory in saying that, because the job you did is competent, and that is all that is really required. i just don't see anything over and above simple competence here. but if i'm missing something, please do feel free to enlighten me.)
to experiment with new ideas
i don't see any.
to get my hands "dirty" with the production process (although I've transcribed a dozen texts before),
but you've just told us that you used a one-time-only process here. (good thing, too, because, as we agree, your process was primitive.) and basically, you borrowed an e-text that someone else produced! that's a good trick, when you can do it. but it won't scale very far. and it can hardly be considered to be "proof of concept".
and to use it for showing to some people interested in this.
ok, well i hope it "worked"... there's one born every minute...
This will be a tough test for your "we-don't-need-DP" approach:
i never said anything close to that. i think d.p. is swell. many people digitize books individually and independently, so it certainly can be done. given the right tools, it can even be easy. nonetheless, i think cooperation is dandy... but as for your test, bring it on. if you have an old book, with hard-to-o.c.r. text, i highly recommend that you buy the abbyy version specialized for old, hard-to-o.c.r. books. without that, you won't know how successful o.c.r. can be.
Maybe we should do a competition (you seem to love competition!):
actually, i believe competition is so 20th-century... but i love to be challenged. and i love to expose the hype people try to spin. so bring on your test. but the way i'll work it is this: you put it through d.p., and then i'll see if i can find any errors they didn't find. if i do, i'll figure out what i'll charge you to reveal them. :+) (if _i_ were you, i'd use the money to buy finereader instead.)
Of course, you will have to promise not to do late-hours hand proofing of the text -- that'd be cheating -- you have to do it all auto-magically as you've been advocating.
right. like i'm gonna spend a lot of my time volunteering for _you_. i take on your challenges because i can smash them so effortlessly... -bowerbird