
roger said:
The DHYP button dehyphenates the page so spellchecking has whole words to check.
minor question, for roger...
why not have the spellchecker join the hyphenates, solely to do its checking, and leave the text intact?
***
major questions, for everyone...
can someone give me a 2012 justification for the reductionistic attitude toward the proofing task? why are we still doling out pages one-at-a-time?
in 2002, i understand why d.p. was conceptualized in such a fashion... o.c.r. wasn't nearly as reliable, so there were corrections necessary on every page, and each one might demand a chunk of your time.
but today, you can clean an entire book in an hour -- in one pass, or in six 10-minute sessions, say -- and send it to the smoothreaders for a final check...
so why don't you guys want people to do it that way? smoothreaders are lots easier to recruit from scratch.
i've been beating this drum for a long time now, and maybe my own noise has deafened me to hear you, so i'm seriously asking you for an answer and i will promise to do my best to hear what you have to say, because i honestly believe there _is_ no justification. so if you believe you have one, can i hear it please?
***
for my side, i scraped roger's text and images, and have already cleaned the book and placed it online, as if for smoothreaders. i didn't even bother to do a final spellcheck, or an exhaustive _italics_ search, so those are two tasks you could do if you want to...
i'm quite confident this meets my standard criterion, which is a rate of 1 error or less in every 10 pages...
by the way, if you wanted to clean roger's text as well, i've appended a list of questionable things to examine. some of 'em are undoubtedly right, but i'd look at 'em anyway, if i were to seriously take the job of this book. (which is to say i didn't actually examine all of this list.) and there were other errors i'd already fixed before i generated this list, so it won't be an exhaustive list... anyway, here you go...
http://z-m-l.com/go/betle/betlep123.html http://z-m-l.com/go/betle/betle.zml
-bowerbird
p.s. list of easily-found possible-errors in "betty lee"...
%ut *Y Adeste Ah's Ain And'me Avalking afteh agis ar.e arivals artiele atternoon audtorium Bam Barnett Baruett Bradshaw Buxton bing blackeyed bootleggeh boydom butleh C Carlin Caroline Chanucey Chauneey Chetr Chistmas Christinas Cie Crede cheks cher childf couse cousip crede D Dat Dey Dodie Dorrance Dorranee Dorry de dem dere's dictu dinneh doin doublequick drew'her drummajor E Einma Eosy est evidentally Fascisti Ferris Fidelis Finn fatted flyin foh G Galilee Geewhilikers Glaus Gr Grood Gwynne git goin goodhumoredly goodnight guestspolitely gwine H'm Hallowe'en Hm Howland Huxley's Huxleys he'worked hehse'f high'spirits hise'f horribile hurrried hurryf hurryin ing io ioas ivere Josie jus Kose LeEoy Lena LeRoy Lovel Lyon"Y leftiit longue loquor M Mary Maytime Mm Monday mawnin mayn't me.v mihi miserabile mon muddly mustn't mysteriouser Nup neded nevah nosense nuffin O O.K. Onee Ophelia Orme o ol on's operam orchestra"'next oughtn't Papa's pail."I'm palazso palazzo peccato playin prosecomp proudj puritanic pusson quaveringly Rardon r re Savilia Savilla's Savillas Sehna Selma Sevilla's Sevillas sauve schooltime searchingiy selfconscious semed serius sewin shound siicceeded slangily slep somthing spect spuzzy suah Titania Tommy t t'ook taMng tempora than,the that.would the'best thore tion tiredlooking tnat togeteher tol trubble ubi under,his unecessary vero victrola vocabularly Well,/remember Willie wanta was'glad welldressed weren wlio woodchopper's wouldn wouldn'd xnay Y Yes'm year's yo'se'f yonr
-Laura? as-of Bim-bang car-fare? co-operate direc-i distinguished-looking E-r-r-boom easy-going entered,-her faculty-versusstudent G-od G-ood G-wynne How-de-do Howdo-you-do have-something how-do-you-do hymn-book ice-cream inter-plays it-go junior-boys keen-looking Lu-chee-a leave-takings Mid-years mid-years non-sorority One-two-three-go pos-i-tive-ly read-Cicero room-and school-girl semi-order senior-girls Tooral-looral-loo-oo-oo-oo Toot-toot Trade-Last tear-stained termed-social think-a thought-Father tooral-looral tooral-looral-loo-oo-oo-oo tr-ragic try-out twelve-thirty What's-his-name Ye-ah

On Dec 23, 2011, at 2:32 PM, Bowerbird@aol.com wrote:
roger said:
The DHYP button dehyphenates the page so spellchecking has whole words to check.
minor question, for roger...
why not have the spellchecker join the hyphenates, solely to do its checking, and leave the text intact?
There are several reasons of varying quality.
1. Habit. It's the way I've done it for years and it just looks right.
2. It doesn't require me to use the -~ markup and not using that, I can take the output of the editing tool and pass it directly to gutcheck (or equivalent).
3. I can't have hyphens resolved over a page break in this page-at-a-time implementation, at least not without code that I don't want to write.
4. I see no pure solution in leaving hyphens intact, since I move other things around (such as images) anyway.
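For anyone following along, the "join only for checking" idea in the question can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code behind the DHYP button: the function name `unknown_words`, the tiny `dictionary` set, and the sample `page` are all made up for illustration. It works within a single page only, so it does nothing for hyphens that straddle a page break (reason 3 above).

```python
import re

# A word fragment, a hyphen, a line break, then the rest of the word.
LINE_END_HYPHEN = re.compile(r"([A-Za-z']+)-\n([A-Za-z']+)")
WORD = re.compile(r"[A-Za-z']+")

def unknown_words(page_text, dictionary):
    """Return words not found in `dictionary`.

    Line-end hyphenates are joined only inside this function;
    page_text itself is never modified.
    """
    unknown = []
    # Pass 1: words split across a line end.
    for m in LINE_END_HYPHEN.finditer(page_text):
        first, second = m.group(1), m.group(2)
        if (first + second).lower() in dictionary:
            continue  # an ordinary word broken for line length
        if first.lower() in dictionary and second.lower() in dictionary:
            continue  # plausibly a real hyphenated compound
        unknown.append(first + "-" + second)
    # Pass 2: all other words, checked on a copy with the splits blanked out.
    rest = LINE_END_HYPHEN.sub(" ", page_text)
    for w in WORD.findall(rest):
        if w.lower() not in dictionary:
            unknown.append(w)
    return unknown

# Hypothetical sample data, for illustration only.
dictionary = {"the", "auditorium", "was", "full", "of", "well",
              "dressed", "girls", "and", "a"}
page = "The audi-\ntorium was full of well-\ndressed girls and a wlio.\n"
flagged = unknown_words(page, dictionary)
```

Accepting a split word when either the joined form or both halves are known sidesteps the question of whether "well-\ndressed" should become "welldressed" or "well-dressed"; the checker only reports, and the decision stays with the proofer.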
major questions, for everyone...
can someone give me a 2012 justification for the reductionistic attitude toward the proofing task? why are we still doling out pages one-at-a-time?
There's a lot to be said for working at the book level instead of the page level. If this were a program being run locally on a user's own machine, I would do many things differently. One reason I do it this way in the online version is that presenting the text to the user one page at a time fits on the computer's screen. I also feel it fits comfortably with users that want to just get one page right. There's a sense of accomplishment resolving the warnings on one page and knowing it is "done." I do it page-at-a-time because this allows someone who isn't prepared to take on a whole book to still contribute. From my experience in working with everyday people who want to be a part of this, encouraging successful little steps is a good thing.
but today, you can clean an entire book in an hour -- in one pass, or in six 10-minute sessions, say -- and send it to the smoothreaders for a final check...
so why don't you guys want people to do it that way? smoothreaders are lots easier to recruit from scratch.
I don't know about doing this book in an hour. Though this was a low-density text, it had quite a few errors. I've just finished going through it using essentially the same tool announced earlier at http://etext.ws/ppe.php. It took me more time, perhaps two hours. I suspect the difference is that there are some corrections that can be made globally and those are the ones that take more time when done on a page-by-page basis.
for my side, i scraped roger's text and images, and have already cleaned the book and placed it online, as if for smoothreaders.
Great! Thanks, BB. I'm excited to compare what you've done to what I did with the same source text. I trust you scraped and used the OCR version of each page's text. I'll report back what I find.
--Roger

why are we still doling out pages one-at-a-time?
in 2002, i understand why dp was conceptualized
in such a fashion... ocr wasn't nearly as reliable,
so there were corrections necessary on every page,
and each one might demand a chunk of your time.
but today, you can clean an entire book in an hour
-- in one pass, or in six 10-minute sessions, say --
and send it to the smoothreaders for a final check...
I've scanned and reviewed the proofing on a lot of books, all ABBYY-OCR'ed, and one error per 10 pages exhibits good book selection and ABBYY management skills I wish I had, or maybe books with 12 lines per page. No question it shouldn't take five or more passes, plus post-processing measured in days rather than hours. At least all the auto-detectable stuff should be done before a proofer ever sees it; and any remaining clues should be visible to the proofer (like Roger is doing). I'm pretty sure we could do better work, quicker, if we proofed in parallel, then compared (and merged) the results. As long as we have serial proofers, we can't measure either techniques or (worse) people. The best way to get past the tedium is to come up with an objective means for comparing results derived from different processes, and from differences within a process. Until then we're stuck with our prejudices.
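To make the "proof in parallel, then compare and merge" suggestion concrete, here is a rough Python sketch. It assumes every proofer returns the same page with the same line breaks, takes the majority reading for each line, and counts how often each proofer departed from the majority. The function name `merge_parallel` and the sample texts are hypothetical, and ties are not handled specially; this is not an existing DP tool.

```python
from collections import Counter

def merge_parallel(versions):
    """Merge independently proofed copies of one page by per-line majority.

    versions: list of strings, one per proofer, same line breaks in each.
    Returns (merged_text, disputed, misses), where `disputed` lists
    (line_number, readings) for non-unanimous lines and misses[i] counts
    the lines on which proofer i differed from the majority.
    """
    split = [v.splitlines() for v in versions]
    if len({len(s) for s in split}) != 1:
        raise ValueError("versions must have the same number of lines")
    merged, disputed = [], []
    misses = [0] * len(versions)
    for lineno, candidates in enumerate(zip(*split), start=1):
        counts = Counter(candidates)
        winner, votes = counts.most_common(1)[0]
        merged.append(winner)
        if votes < len(versions):
            disputed.append((lineno, dict(counts)))
            for i, reading in enumerate(candidates):
                if reading != winner:
                    misses[i] += 1
    return "\n".join(merged), disputed, misses

# Three hypothetical proofers working on the same two-line page.
a = "Betty Lee was glad.\nShe hurried home."
b = "Betty Lee was'glad.\nShe hurried home."
c = "Betty Lee was glad.\nShe hurrried home."
merged, disputed, misses = merge_parallel([a, b, c])
```

The `misses` tally is one crude example of the objective measure asked for above: it says something about each proofer (or each technique, if the copies came from different tools) without needing a trusted reference text.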

On Dec 23, 2011, at 2:32 PM, Bowerbird@aol.com wrote:
for my side, i scraped roger's text and images, and have already cleaned the book and placed it online, as if for smoothreaders. i didn't even bother to do a final spellcheck, or an exhaustive _italics_ search, so those are two tasks you could do if you want to...
i'm quite confident this meets my standard criterion, which is a rate of 1 error or less in every 10 pages...
Well, I've finished my comparison. I am impressed at BB's hour's work as posted in his version of Betty. I found only 22 errors that were right in mine and incorrect in his, excluding missed italics from the count. That result is better than his standard criterion. Very good work. I don't know yet what things might be wrong in both until I run gutcheck and some other tests. None of the "errors" I found were by gutcheck--they were only by running a straight compare of BB's and my files. I learned a few things. First, there are several more regex checks that would have helped and I'll bake those in. I also learned that even as it stands, half-baked, the ppe editor can produce a usable text. I'll put this one into HTML for the smoothie waiting for it and start another one.
--Roger
Note to BB: I'll send you those 22 errors in a separate email in case seeing them may help improve your process.
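A "straight compare" of two versions of the same book can be approximated with Python's standard difflib module. The sketch below is an illustration only and not the comparison Roger actually ran: it compares word by word, so differences in line wrapping between the two files are ignored. The function name `word_differences` and the sample sentences are invented for the example.

```python
import difflib

def word_differences(mine, theirs):
    """List (mine, theirs) pairs of word runs where two texts disagree."""
    a, b = mine.split(), theirs.split()
    # autojunk=False keeps frequent words from being treated as junk
    # on long inputs.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs

# Hypothetical sample text, wrapped differently in the two versions.
mine = "Chauncey said he would hurry home\nfor Christmas dinner"
theirs = "Chauneey said he would hurry\nhome for Christinas dinner"
diffs = word_differences(mine, theirs)
```

A list like this only shows where the two files disagree; deciding which side is right still means looking at the page image, and errors present in both files will not show up at all.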
participants (3)
- Bowerbird@aol.com
- dakretz@gmail.com
- Roger Frank