well, it's kinda silly for us to talk about this in generalities, because
i'm capable of demonstrating what i mean specifically and exactly...

and when i'm done, i'm quite sure that you will agree completely.
(and you might say that's what you meant all along, for all i know.)

so let's get specific.

i've posted a file of text taken from rfrank's roundless experiment:
>   http://z-m-l.com/go/campf/campf-001.zml

as you can see, this book is called "pemrose lorry, camp fire girl"...

the book is now in-progress, so it's half-done and half-undone...
(it looks like it's been proofed up through about page 80 so far --
since the pages go out multiple times, it's difficult to be certain --
but after page 80, it's fairly clear that the pages haven't been edited.)

that first version had the page-separators fixed, and other basic stuff.

the next version has more stuff cleaned up, but it's still rather basic:
>   http://z-m-l.com/go/campf/campf-002.zml

as you can easily see, if you look through either one of the versions,
starting after page 80, there are a bunch of paragraphing problems.

specifically, there are often blank lines inserted -- incorrectly --
between the lines of a single paragraph.  you can find instances of
this problem on pages __, __, __, and __.  it's a not uncommon glitch.

paragraphing is one of the first things i try to correct, because
the paragraphs need to be correct before you can fix any spacey quotes.

fortunately, it's rather easy to locate these bad paragraphs via search.
find a blank line followed by a line that starts with a lowercase letter.
(in other words, you search for two newlines followed by lowercase.)

now, one way of doing the fix would be to automatically replace
_all_ of these occurrences by simply deleting one of the two newlines.
you could click "replace all" on that find-and-replace, and change all
of the occurrences without even looking at them.  you _could_ do that.
the proper changes would almost always outnumber improper ones.
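
(if you'd rather "play along" with a script than with your editor's
find-and-replace, here's a rough python sketch of that first check --
the filename is just an example, and the pattern is only my
approximation of the search described above, not anybody's actual tool.)

    import re

    # read the whole text-file; the filename here is just an example.
    text = open("campf-002.zml", encoding="utf-8").read()

    # first check: a blank line followed by a line that starts with a
    # lowercase letter, i.e. two newlines and then a lowercase letter.
    check1 = re.compile(r"\n\n(?=[a-z])")

    # report each suspect spot by line-number, so a human can inspect it.
    for m in check1.finditer(text):
        line_no = text.count("\n", 0, m.end()) + 1
        print("suspect lowercase paragraph-start at line", line_no)

    # the blind "replace all" approach -- the one i do _not_ recommend --
    # would simply collapse every double newline it found down to one:
    #     text = check1.sub("\n", text)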

whenever anyone talks about "taking the human out of the equation",
blind global changes are what immediately spring into my little brain.

and i am definitely _not_ a fan of blind global changes.

there are some changes you can make blindly, but they are few and
far between, and they certainly don't characterize the usual process.

and this change -- deleting those excess lines -- i don't do blind...

i will step through the changes one-by-one, and look at all of them
_against_the_scan_.  (even though, in most cases, i wouldn't need to
look at the scan, because it's pretty obvious how to fix the problem.)

sometimes i even delete the excess line manually, rather than merely
approve the find-and-replace, just to keep my grubby fingers busy...
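
(and here is the shape of that one-at-a-time loop, again sketched
in python purely for illustration.  a real setup would also show you
the scan for the page, but the point is that the approve-or-skip
decision stays with the human at every single spot.)

    import re

    text = open("campf-002.zml", encoding="utf-8").read()
    check1 = re.compile(r"\n\n(?=[a-z])")

    pieces = []   # the corrected text, built up piece by piece
    last = 0      # end of the previous match

    for m in check1.finditer(text):
        line_no = text.count("\n", 0, m.end()) + 1
        # show a little context around the suspect blank line...
        print(text[max(0, m.start() - 60):m.end() + 60])
        answer = input("delete the blank line before line %d? [y/n] " % line_no)
        pieces.append(text[last:m.start()])
        pieces.append("\n" if answer.lower() == "y" else "\n\n")
        last = m.end()

    pieces.append(text[last:])
    open("campf-002-fixed.zml", "w", encoding="utf-8").write("".join(pieces))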

but -- here's the very important part -- i do a lot _more_ than that.

at every one of these spots to which i'm automatically transported,
i look around the immediate problem area, to see if it spilled over.

and, if it did, then i will clean up that neighboring area right away.

so i'm walking a fine line here between doing a specialized search
and a general correction routine.  i start with the specialized check,
so i have a laser focus on the nature of what the check will show me,
so i don't have to waste mental energy figuring out what the glitch is.
but then i also use my peripheral vision to see what else needs fixing.

and this particular check in this particular text is a very good example.

because it ends up that a _lot_ of the surrounding areas had problems.

(that's not uncommon with these excess blank lines, because it's often
the other glitches that _caused_ the o.c.r.'s paragraphing difficulties.)

there were a good number of problem-lines caught by this routine --
about 85 if i remember correctly -- but i probably corrected the text
on 85 surrounding lines as well, because i saw bugs, so i fixed them.

i encourage people to "play along at home" and _do_ this search on
this text, so you can really see how you can spot neighboring glitches.

what you will _not_ know, however, unless you are also looking at the
_scan_ for each page, is that there were often _entire_words_missing_
from the o.c.r.  they were cut off entirely from the left-hand margin.
this isn't the kind of thing for which you can do a search to find them.
they're just missing from the o.c.r., and you don't know they're gone.

however, since you are paying attention during this _other_ search,
you can become aware of these missing words, via peripheral vision.

if you _do_ "play along at home", and actually step through that check,
you will also see that there's yet another check that needs to be done
to catch _all_ of these paragraphing problems from excess blank lines.

this second check is the "flip side" of the first one.  the first one was
looking for supposedly-first-lines of paragraphs that were lowercase.
the second one checks the _termination_ of supposedly-last-lines...
so you're checking for a letter or a comma at the _end_ of a line which
is followed by a blank line.  if it were _really_ the end of a paragraph,
as is implied by that blank line, it should be punctuation-terminated.

so you're looking now for a letter/comma followed by two newlines...
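
(in the same python-regex terms as before, this second check might
look something like the sketch below -- again, the filename is just
an example, and the pattern is only my approximation of the check.)

    import re

    text = open("campf-002.zml", encoding="utf-8").read()

    # second check: a line ending in a letter or a comma, followed by a
    # blank line -- a real paragraph-end should end with punctuation.
    check2 = re.compile(r"[a-zA-Z,]\n\n")

    for m in check2.finditer(text):
        line_no = text.count("\n", 0, m.start()) + 1
        print("suspect paragraph-end at line", line_no)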

this search turns up roughly 50 instances in this particular text-file.

and, once again, when you make these corrections, you look around.
roughly half of the 50 instances were cases where the blank line that
was deleted was itself followed by a line beginning with a single dash,
which was a misrecognition of an _em-dash_ at the start of that line.
(the em-dash is represented by a double-dash.)  so those corrections
were made at the same time that the excess blank line was eliminated.
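
(that neighboring glitch is also easy to express as a pattern of its
own: a line that starts with a single dash which isn't already a
double-dash.  once more, this is an illustrative sketch, not the
actual routine anyone uses.)

    import re

    text = open("campf-002.zml", encoding="utf-8").read()

    # a line starting with a single dash (but not a double-dash) is very
    # likely an em-dash that the o.c.r. chopped down to a lone hyphen.
    single_dash = re.compile(r"(?m)^-(?!-)")

    for m in single_dash.finditer(text):
        line_no = text.count("\n", 0, m.start()) + 1
        print("possible chopped em-dash at the start of line", line_no)

    # the fix, once a human confirms it against the scan, is simply to
    # turn that lone leading "-" into "--".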

if you were a party-pooper, and didn't play along at home, i have
uploaded a copy of the file where the lines that were flagged by the
first check are indicated by an "@" sign, which you can search for:
>   http://z-m-l.com/go/campf/campf-003.zml

and here's one with the lines from the second check tagged with "%":
>   http://z-m-l.com/go/campf/campf-004.zml

you can also view the book page-by-page, as usual.  for instance:
>   http://z-m-l.com/go/campf/campfp123.html

(you will see, right there on page 123, an example of a page which
has some of the words cut off from the right-margin.  when i find
this type of problem in a scanset, it just makes me want to scream!
when you scan a book, you need to make sure you do it _carefully_.
sloppy work such as this is tremendously uncharacteristic of rfrank.)

at any rate...

it's this "looking around and fixing other errors in the neighborhood"
that makes this whole preprocessing thing such a _vibrant_ procedure.

...which brings me squarely back to the major point of this post...

***

so yeah, i'm doing a computerized check, and it's darn _efficient_, but
i'm surely not "taking humans out of the process".  not by a long shot.
i am a mentally alert human, who is _actively_ engaged with the text...
my cursor is _flying_ all around that document.  i move at warp speed.

and i am _efficient_ too.  i am _tremendously_ efficient.  i am rocking!
all the other carpenters are using hammers, and i have a _nail-gun_.

part of the efficiency is due to that _focus_ that i just talked about...
but most is simply because i am being directly transported to errors.
i don't waste any of my time _looking_ for them; they're _presented_.
i don't waste any time positioning my cursor -- it is _prepositioned_.
and the scan is right there, ready for me to look at it, immediately...

and it's _fun_.  downright _exhilarating_.  like driving a sports car.
(whereas word-by-word proofing is akin to pushing a baby stroller.)

and the results are _better_.  not just a little bit better, a lot better.
i've shown time after time after time that i can find (lots of) errors
proofers _miss_, not just in one round, but two and three and four.

and, just to repeat the message yet again, it's _simple_ to do this.
a couple dozen _simple_ checks find a high percentage of errors.

so yeah, i know i am explaining all of this with accurate language.

but people always seem to misunderstand.

on the one hand, they think the tool is making all the changes...

on the other hand, they think you need a human to find errors...

both of those positions are wrong, and when i argue against one,
people jump to the other, and then i have to argue against _that_.

i know that i am capable of walking the tight-rope between them,
but everyone else seems to get tangled up in the dialog semantics.

the tool can find almost _all_ the errors, so you don't need to look.
(you're welcome to look, and i'm sure that you'll find a few errors,
but at some point, you have to ask whether it was worth your time.)

on the other hand, _you_ have to be the active agent that mentally
makes the decision whether or not each change should be made...

like i said, there are only a few global changes that i make blindly.

(to give you a for-instance, i will globally delete any space that
follows any doublequote located at the very beginning of a line,
or any space preceding a doublequote at the very end of a line.
i'll also blindly change spacey-quotes if the open/close pairing
within the rest of the paragraph makes the change unambiguous.
i'll also close up contractions blindly.  maybe a few more things.
but for the most part, i really want to look at the changes i make.
maybe if i had the luxury of knowing i'd have human volunteers
following up with a word-by-word proofing of everything i did,
i would be more willing to make blind changes.  or maybe not,
because that just seems wasteful of donated time and energy.)
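
(for the record, those few blind changes are themselves simple to
express.  here is roughly what they look like as python substitutions --
the list of contractions is just my own guess at the common cases, and
the spacey-quote pairing logic is left out, because deciding when the
open/close pairing is unambiguous is not a one-liner.)

    import re

    text = open("campf-002.zml", encoding="utf-8").read()

    # delete any space that follows a doublequote at the very start of a line...
    text = re.sub(r'(?m)^" +', '"', text)

    # ...and any space that precedes a doublequote at the very end of a line.
    text = re.sub(r'(?m) +"$', '"', text)

    # close up the common spaced-out contractions, e.g. "did n't" --> "didn't".
    text = re.sub(r"(\w) +(n't|'s|'ll|'re|'ve|'d|'m)\b", r"\1\2", text)

    open("campf-002-blind.zml", "w", encoding="utf-8").write(text)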

now, i'm sure that dkretz will confirm what i've been saying here,
as his experience with his tool has led him to the same thoughts.

and maybe carel will say that this is exactly what she has meant,
and we've been doing a semantic dance around the same thought.

-bowerbird