Error rate statistics

As I previously discussed, these are the figures from a project I've been working on for several years: a massive, three-volume job for print publication. After each round, described below, I would reprint the text, verify that the previous round's errors had been corrected, and then do another round. I also did a batch-verify of ALL previous rounds' corrections, which turned up one or two that I had missed along the way.

Round        Type   Errors
1/1a         p/a       944
2            p         415
3            a         454
4&5          p/a       154
(after 4&5)  sc      35-40
6            rb        170
7            sr          0
8            rb         64
"9"          rb          0

Key: p = proofreading, a = attestation, sc = spell check, rb = readback using voice synthesis, sr = skim read.

I keyed most of the text myself, apart from a small sample (about 15-20 pp.) which I OCR'd near the end of the text-entry phase. This made it imperative that I not only proofread the typescript in the conventional sense, but also attest it: compare it back to the original, para by para, line by line, word by word. Many of the errors I introduced would pass a spellcheck or a proofread; they weren't "errors" as such, but they weren't faithful to the text, either. They had to be exterminated.

Some (many, actually) of the errors were native to the original. I retained them as I keyed the text and corrected them afterwards. The error stats are therefore somewhat inflated, in that a good number of them, probably 10 or 12 percent, weren't my fault.

I did rounds 1 and 1a, my first attestation and proof, on the same copy, so I couldn't keep separate stats for them. After rounds 4 and 5, I ran a spellcheck, which turned up about 35-40 spelling errors that my eyes hadn't caught. This was a bit of a blow to my opinion of my own proofing skills, so I went out and got some speech synthesis software to do readbacks. I'd clip a few hundred words at a time, have the software read them aloud, and follow along in the original, highlighting discrepancies as I went. Round 7 was a skim read of the whole thing; round 8 was a second full readback.
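The readback routine described above, clipping a few hundred words at a time for the speech synthesizer, could be scripted roughly as follows. This is a hypothetical Python helper, not part of any particular speech software: `readback_clips` and `max_words` are illustrative names, and it assumes paragraphs in the typescript are separated by blank lines. It groups whole paragraphs into clips so that each clip starts and ends at a point that is easy to find in the original.

```python
def readback_clips(text: str, max_words: int = 300) -> list[str]:
    """Group whole paragraphs into clips of roughly max_words words each.

    Assumes paragraphs are separated by blank lines. A clip is closed before
    it would exceed max_words, so no paragraph is ever split; a single
    paragraph longer than max_words becomes a clip on its own.
    """
    clips: list[str] = []
    current: list[str] = []
    count = 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue  # skip empty chunks left by extra blank lines
        n = len(para.split())
        if current and count + n > max_words:
            clips.append("\n\n".join(current))  # close the clip before it overflows
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        clips.append("\n\n".join(current))
    return clips


if __name__ == "__main__":
    sample = "First paragraph here.\n\nSecond one.\n\nAnd a third paragraph."
    for i, clip in enumerate(readback_clips(sample, max_words=4), start=1):
        print(f"--- clip {i} ---")
        print(clip)
```

Splitting on paragraph boundaries rather than at an exact word count keeps each clip aligned with the "para by para" comparison against the original; the trade-off is that clip lengths vary.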
I know I did a third full readback, but I don't seem to have kept stats on it. Round "9" was a partial readback. The 64 errors in round 8 work out to about one discrepancy every 15 pages, or 4,500 words. I then read back a number of 15-page (4,500-word) batches, and also did a complete readback of several of the most error-prone sections of the book. Even with the long breaks I took between rounds, round "9", with no moments of sheer "d'uh" to break up the monotony, was where the law of diminishing returns kicked in: I re-did perhaps 15% of the entire text without finding any further errors. At that point, I estimated the number of remaining typos or text discrepancies in the entire book at somewhere between 6 and 20, and I'll be damned if I'm going to spend another three months of evenings hunting the buggers down. (That said, in my second readback pass I would sometimes go 100 pages without finding ANY errors, then hit three or four on the same page.)

The total number of native typos, my typos, and my transcription errors worked out to about 2 per 300-word page. Not great, but not bad. The true figure was probably higher: in my early eyeball rounds, if I came across an error that I thought I had made repeatedly, I would do a global search, attest, and replace on it when I made the corrections at the end of that round, so those repeats don't show up individually in the counts. Even so, I caught under 50% of the recorded errors by eye on my first round, and fewer than 90% by eye overall, including the subsequent rounds. About 12% I would not have caught at all but for speech synthesis and spellcheck.
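As a check on the percentages above, here is a minimal Python sketch that totals the error table and computes the shares. Assumptions: the spellcheck's "35-40" is carried as a (low, high) range; p, a, p/a and sr count as reading by eye, while sc and rb count as software-assisted; and the totals cover only errors recorded in the table, so they miss repeats fixed by global search-and-replace. The names `ROUNDS`, `BY_EYE`, `totals` and `shares` are illustrative.

```python
# Error counts per round, transcribed from the table in the post.
# Spellcheck reported "35-40", so every count is stored as a (low, high) pair.
ROUNDS = [
    ("1/1a", "p/a", (944, 944)),
    ("2", "p", (415, 415)),
    ("3", "a", (454, 454)),
    ("4&5", "p/a", (154, 154)),
    ("after 4&5", "sc", (35, 40)),
    ("6", "rb", (170, 170)),
    ("7", "sr", (0, 0)),
    ("8", "rb", (64, 64)),
    ('"9"', "rb", (0, 0)),
]

# Assumption: these reading types were done by eye; "sc" and "rb" used software.
BY_EYE = {"p", "a", "p/a", "sr"}


def totals(rounds):
    """Return (low total, high total, total caught by eye)."""
    low = sum(lo for _, _, (lo, _) in rounds)
    high = sum(hi for _, _, (_, hi) in rounds)
    # Eye-only rounds have exact counts, so their low and high values agree.
    eye = sum(lo for _, kind, (lo, _) in rounds if kind in BY_EYE)
    return low, high, eye


def shares(rounds):
    """Return each share as a (min, max) range over the spellcheck uncertainty."""
    low, high, eye = totals(rounds)
    first = rounds[0][2][0]  # round 1/1a
    return {
        "first_round": (first / high, first / low),
        "by_eye": (eye / high, eye / low),
        "software": ((low - eye) / low, (high - eye) / high),
    }


if __name__ == "__main__":
    low, high, eye = totals(ROUNDS)
    print(f"errors recorded: {low}-{high}, {eye} of them by eye")
    for name, (lo, hi) in shares(ROUNDS).items():
        print(f"{name}: {lo:.1%} to {hi:.1%}")
```

On these counts, round 1/1a accounts for about 42% of recorded errors, eye reading for about 88%, and spellcheck plus readback for about 12%, which matches the under-50%, under-90% and about-12% figures given in the post.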
Wallace J. McLean