Error rate statistics

5 Feb 2005

      As I previously discussed, these are the figures from a project I've 
been working on for several years. It's a massive, three-volume job, 
for print publication. After each round, described below, I would 
reprint the text, verify the correction of the previous round's errors, 
and then do another round. I also did a batch-verify of ALL previous 
rounds' corrections, finding one or two that I had missed along the way.

Round	Type	Errors
1/1a	p/a	944
2	p	415
3	a	454
4&5	p/a	154
	sc	35-40
6	rb	170
7	sr	0
8	rb	64
"9"	rb	0

Explanation: Round = round, Type = type of reading: p/roofreading, 
a/ttestation, s/pell c/heck, r/ead b/ack using voice synthesis, 
s/kim r/ead. I keyed most of the text, apart from a small sample (about 
15-20pp) which I OCR'd near the end of the text-entry phase. This made 
it imperative that I not only proofread the typescript in the 
conventional sense, but also attest it: compare it back to the original, 
para by para, line by line, word by word.

There were many errors that I introduced to the text that would pass 
spellcheck or proofread; they weren't "errors", but they weren't 
faithful to the text, either. They had to be exterminated.

Some - many, actually - of the errors were native to the original. As I 
keyed the text, I retained them, but corrected them afterwards. Thus, 
the error stats are somewhat inflated, in that a good number of them, 
probably 10 or 12 percent, weren't my fault.

Rounds 1 and 1a, my first attestation and proof, I did on the same 
copy, so I couldn't do separate stats.

After rounds 4-5, I did a spellcheck, which returned about 35-40 
spelling errors which my eyes hadn't caught. This was a bit of a shock 
to my own esteem of my proofing skills, so I went out and got some 
speech synthesis software to do readbacks. I'd clip a few hundred words 
at a time, and follow in the original, highlighting discrepancies as I 
went along.
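
For anyone wanting to reproduce the clipping step, here is a minimal Python sketch of cutting a text into batches of a few hundred words to feed to a speech synthesizer. The function name clip_batches and the 300-word batch size are assumptions for illustration, not the tool or settings actually used; splitting on whitespace also flattens paragraph breaks, which is fine for listening but not for anything else.

```python
def clip_batches(text, words_per_batch=300):
    """Split text into consecutive batches of at most words_per_batch words.

    words_per_batch=300 is an assumed stand-in for "a few hundred words".
    """
    words = text.split()  # whitespace split; paragraph breaks are lost
    return [
        " ".join(words[i:i + words_per_batch])
        for i in range(0, len(words), words_per_batch)
    ]


if __name__ == "__main__":
    sample = "word " * 1000
    batches = clip_batches(sample)
    # 1000 words at 300 per batch gives three full batches and one of 100.
    print(len(batches), [len(b.split()) for b in batches])
```

Each batch can then be pasted or piped into whatever speech software is at hand, and followed in the original by eye.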

7 was a skimread of the whole thing.

8 was a second full readback. I know I did a third full readback, but 
don't seem to have kept stats on it.

"9" was a partial readback. At 64 errors in round 8, that works out to 
about one discrepancy every 15 pages or 4500 words. I did a bunch of 
batches of 15 pages and 4500 words, and also did a complete readback of 
several of the most error-prone sections of the book. Even with the 
long breaks I took in between rounds, round "9", with no moments of 
sheer "d'uh" to break up the monotony, was where the law of diminishing 
returns kicked in. I re-did perhaps 15% of the entire text without 
finding any further errors.
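
The arithmetic behind those figures can be sketched in a few lines of Python. The 300-word page is the figure quoted elsewhere in this post; the page count is only what the numbers imply, not a stated length, and "perhaps 15%" is taken at face value. The last line asks how many errors a 15% reread would have turned up if the text were still as error-dense as round 8 found it, which it should not be, since those 64 had been corrected.

```python
ERRORS_ROUND_8 = 64    # from the table
PAGES_PER_ERROR = 15   # "one discrepancy every 15 pages"
WORDS_PER_PAGE = 300   # the 300-word page quoted in the post
REREAD_PERCENT = 15    # "perhaps 15% of the entire text"

# Length of the text implied by the round 8 rate (an inference, not a stated figure).
implied_pages = ERRORS_ROUND_8 * PAGES_PER_ERROR

# 15 pages of 300 words each is the 4500-word batch size.
words_per_error = PAGES_PER_ERROR * WORDS_PER_PAGE

# Pages covered by the partial readback, and the finds it would have
# produced at the round 8 error density.
reread_pages = implied_pages * REREAD_PERCENT / 100
expected_finds_at_round_8_rate = reread_pages / PAGES_PER_ERROR

if __name__ == "__main__":
    print(implied_pages, words_per_error, reread_pages,
          expected_finds_at_round_8_rate)
```

Under these assumptions the reread covers about 144 pages and would have been expected to catch nine or ten errors at the old rate, so finding none suggests the density had dropped well below one per 15 pages.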

At that point, I estimated the number of remaining typos or text 
discrepancies in the entire book to be somewhere between 6 and 20, and 
I'll be damned if I'm going to spend another three months of evenings 
hunting the buggers down.

(At the same time, in my second readback pass, I at times would go 100 
pages without finding ANY errors, then hit three or four on the same 
page.)
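
As a rough sanity check on the 6-to-20 estimate, here is a sketch of how likely a clean 15% sample would be for a given number of leftover errors. It assumes each remaining error sits in the reread portion independently with probability 0.15 and that a readback catches everything it passes over. Neither assumption is strictly true: part of the reread was aimed at the worst sections, and the errors come in clumps, as noted above. The function name is hypothetical.

```python
def chance_of_clean_sample(remaining_errors, sample_fraction=0.15):
    """Probability that none of the remaining errors fall in the reread sample.

    Assumes errors are placed independently and uniformly through the text,
    and that every error inside the sample would have been caught.
    """
    return (1 - sample_fraction) ** remaining_errors


if __name__ == "__main__":
    for n in (6, 20):
        print(n, round(chance_of_clean_sample(n), 3))
```

On those assumptions a clean sample is quite plausible with 6 errors left (a bit better than one chance in three) and unlikely but not impossible with 20 (about one chance in twenty-five), so the range is at least not contradicted by the empty round.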

The total number of native typos, my typos, and my transcription 
errors, worked out to about 2 per 300-word page. Not great, but not 
bad. It was, probably, actually higher, but in my early eyeball rounds, 
if I came across an error that I thought I had repeatedly made, I would 
do a global search, attest, and replace on it when I did the 
corrections at the end of that round.

However, I only caught under 50% by eye on my first round, and fewer 
than 90% by eye, overall, on subsequent rounds. About 12%, I would not 
have caught at all, but for speech synthesis and spellcheck.
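
Those percentages can be checked against the table with a short Python tally. The grouping is an assumption: rounds 1 through 5 and the skimread are counted as "by eye", while spellcheck and the readbacks are counted as machine-assisted. The tally is run for both ends of the 35-40 spellcheck range. It covers only the counted errors, so anything fixed by global search and replace is left out.

```python
# Error counts from the table, keyed by round.
ERRORS_BY_ROUND = {
    "1/1a": 944, "2": 415, "3": 454, "4&5": 154,
    "6": 170, "7": 0, "8": 64, '"9"': 0,
}
EYE_ROUNDS = ("1/1a", "2", "3", "4&5", "7")  # proofread, attest, skimread
READBACK_ROUNDS = ("6", "8", '"9"')           # speech synthesis


def shares(spellcheck_errors):
    """Return (total, first-round share, by-eye share, machine-assisted share)."""
    total = sum(ERRORS_BY_ROUND.values()) + spellcheck_errors
    by_eye = sum(ERRORS_BY_ROUND[r] for r in EYE_ROUNDS)
    machine = spellcheck_errors + sum(ERRORS_BY_ROUND[r] for r in READBACK_ROUNDS)
    return (total,
            ERRORS_BY_ROUND["1/1a"] / total,
            by_eye / total,
            machine / total)


if __name__ == "__main__":
    for sc in (35, 40):  # the two ends of the spellcheck estimate
        total, first, eye, machine = shares(sc)
        print(f"{total} errors: first round {first:.1%}, "
              f"by eye {eye:.1%}, spellcheck + readback {machine:.1%}")
```

With either spellcheck figure, the first round accounts for about 42% of the total, eyeball reading for about 88%, and spellcheck plus readback for about 12%, which agrees with the figures quoted above.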

Wallace J.McLean
