bill said:
> Ok, so these are largely rhetorical questions I suppose,
> but I'd love to hear any opinions and feedback
> (especially on issues I seem to have overlooked
> and likely haven't even imagined).
bill, i love you. enthusiasm is so cool.
many people out there in the world might say that
typing-in isn't cool. "too much work," they'll say...
bashalam. it's not "work" at all, it's "fun",
and don't you forget it. what a joy it is to
interact with a text as closely as one does
when one retypes every single word of it!
it becomes a part of your d.n.a. seriously!
you become one of the bookpeople from f451.
so if you really _love_ a particular book,
there's nothing better than typing it in...
and there's no problem with the accuracy,
no problem at all, if you're a perfect typist.
oh. wait. you're _not_ a "perfect" typist?
you're just "extremely accurate"? or at least
"highly competent"? or "pretty good" anyway?
well. gee. that's good. but it's not, um, you know...
"perfect".
and we really do want the e-texts to be perfect.
yep. at least as perfect as we can make them.
so what you need is a way to check your accuracy.
to improve it from whatever it is _up_ to perfection.
one route is to give it to distributed proofreaders. really!
they'll proof what you have typed, like it was an o.c.r. job,
using the scans as their guide. that means you have to
do the scans, but -- compared to typing a whole book in --
that will go pretty fast, believe me. it'll just take a few hours.
and you have to have the book to re-type it all in, right?
so while you've got the book, you might as well scan it.
don't have your own scanner? someone you know does.
3-in-1 machines -- printer/fax/scanner -- are now common.
but of course, if you're doing the scans, you might as well
do the o.c.r. on them too; that only takes a few _minutes_.
ok, maybe 57 minutes, which is really an hour. but still,
it probably took you a few days to type the thing in, so...
plus, if you don't have an o.c.r. app, d.p. will do it for you.
ok, so, you've typed the book in, scanned it, and done o.c.r.
now you're swimming in text. the magic happens, though,
because your typed-in text and your o.c.r. text were derived
_independently_. that's very important, because it means
their errors are likely to be _independent_, or _orthogonal_.
(that means totally unrelated.)
and that independence means those two text files can be
productively used to cross-check each other for accuracy!
the kinds of errors you make when re-typing a book
will likely be quite different from the errors a scanner makes.
humans usually slip up on spelling, often by mixing up letters.
(you might mistype "their" as "thier", or "thye" for "they".)
on the other hand, o.c.r. confuses letters that _look_ similar.
(so it will mistake an "h" and a "b", or an "rn" for an "m".)
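here's a rough python sketch of that difference, just to make it concrete. it's my own illustration, not part of any existing tool: `guess_error_kind` (a hypothetical name) looks at two disagreeing words and guesses whether they differ by a swap of two neighboring letters (the typist's "thier") or by one of the o.c.r. look-alikes. the look-alike table is an assumption, seeded only with the two confusions mentioned above; a real one would be much longer.

```python
# hypothetical look-alike table, seeded only with the two o.c.r.
# confusions mentioned in the post; a real one would be much longer.
OCR_LOOKALIKES = [("rn", "m"), ("h", "b")]


def is_transposition(a: str, b: str) -> bool:
    """true if b is a with exactly one pair of adjacent letters swapped."""
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and a[diffs[0]] == b[diffs[1]]
        and a[diffs[1]] == b[diffs[0]]
    )


def is_ocr_lookalike(a: str, b: str) -> bool:
    """true if replacing one occurrence of one look-alike with its
    partner (in either direction) turns a into b."""
    for x, y in OCR_LOOKALIKES:
        for src, dst in ((x, y), (y, x)):
            start = a.find(src)
            while start != -1:
                if a[:start] + dst + a[start + len(src):] == b:
                    return True
                start = a.find(src, start + 1)
    return False


def guess_error_kind(a: str, b: str) -> str:
    """rough guess at which kind of slip separates two disagreeing words."""
    if is_transposition(a, b):
        return "typist"
    if is_ocr_lookalike(a, b):
        return "ocr"
    return "unknown"
```

so `guess_error_kind("their", "thier")` gives "typist", and `guess_error_kind("modern", "rnodern")` gives "ocr". it's only a hint, of course; the page itself is still the judge.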
but any error on either side means a _discrepancy_
between one text and the other. so a simple _compare_
process will pinpoint the location of that difference, meaning
you will be able to look at the p-book -- or the scanned page --
and immediately determine from the source which of the two
text versions is the right one, and which is the wrong one, and
you will be able to correct the error in the one which is wrong...
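for the curious, here's a minimal sketch of that _compare_ step in python, using the standard `difflib` module. it assumes both files are plain text and compares them word by word (so different line-breaks or spacing don't raise false alarms), reporting the line in the typed-in file where each discrepancy starts. `find_discrepancies` and `words_with_lines` are hypothetical names of my own, not an existing tool.

```python
import difflib


def words_with_lines(text):
    """split text into words, remembering the (1-based) line each came from."""
    words, lines = [], []
    for n, line in enumerate(text.splitlines(), start=1):
        for w in line.split():
            words.append(w)
            lines.append(n)
    return words, lines


def find_discrepancies(typed, ocr):
    """return (line, typed_words, ocr_words) for every spot where the two
    independently made texts disagree; line numbers refer to the typed file."""
    t_words, t_lines = words_with_lines(typed)
    o_words, _ = words_with_lines(ocr)
    matcher = difflib.SequenceMatcher(a=t_words, b=o_words, autojunk=False)
    found = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # for a pure insertion (i1 == i2) the typed side has no words here,
        # so point at the typed word just after it (or the last one)
        line = t_lines[min(i1, len(t_lines) - 1)] if t_lines else 0
        found.append((line, " ".join(t_words[i1:i2]), " ".join(o_words[j1:j2])))
    return found
```

with typed text "thier cat sat\non the mat" and o.c.r. text "their cat sat\non the rnat", it reports `(1, "thier", "their")` and `(2, "mat", "rnat")`. notice that a word both sides got wrong in the _same_ way produces no entry at all, which is exactly the rare blind spot described next.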
there might be the very rare case where your re-typing makes
the _exact_ same error as the scanner made, in which case the
comparison process will slide right by, and you won't be alerted
to that spot to make corrections, and the errors will stay there.
but -- as i said above -- that will be "the very rare case" indeed...
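to put a rough number on "very rare": suppose, purely for illustration (these rates are made up, not measured), that a typist gets 1 word in 100 wrong and the o.c.r. gets 1 word in 200 wrong, independently. an error can only hide if _both_ sides are wrong at the same word, which happens with probability 0.01 times 0.005 per word, and the two errors being _identical_ is rarer still. so this little python function gives an upper bound, not an estimate:

```python
def undetected_error_bound(n_words: int, typist_rate: float, ocr_rate: float) -> float:
    """upper bound on the expected number of words where both texts are
    wrong in the *same* way (and so slip past the compare).

    assumes each word is mistyped with probability typist_rate and misread
    with probability ocr_rate, independently. a word can only hide if both
    sides are wrong there (probability typist_rate * ocr_rate); identical
    errors are a subset of that, so this overstates the true figure.
    """
    return n_words * typist_rate * ocr_rate
```

for a 100,000-word book, `undetected_error_bound(100_000, 0.01, 0.005)` comes to about 5 words at most, compared to roughly 1,000 errors the typist alone would have left behind.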
while you might find it somewhat difficult to do the comparison,
since some of those tools are still very primitive, i _assure_ you
that you will get some _very_ accurate text out of the process.
so accurate that i would say you don't even need to submit it
to distributed proofreaders to have it proofed by their wise eyes.
that wouldn't _hurt_, of course, and if the book is interesting,
it would give some of the people there a special book to proof,
which is always a treat for them, so you can do that if you want.
but a combination of scanning and re-typing is enough by itself.
and as i said, it's really a _remarkable_ way to engage with a book.
re-typing will give you an _intimate_ knowledge of the book that
you can't get any other way. highly recommended!
-bowerbird