i've analyzed the 159 diffs on gardner's books,
and placed them into the following categories:
60 comma/period diffs
33 quotemark diffs
29 stealth-scanno diffs
12 various-punctuation diffs
12 formatting diffs
09 real-spelling-question diffs
04 should-have-been-preprocessed diffs
----------
159 total
note that i haven't resolved the diffs yet; i've just
classified them, so we can understand their nature.
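for anyone curious how word-level diffs like these _might_ be sorted
automatically, here's a minimal sketch in python. it is not necessarily
how these 159 were classified: the function names, the returned labels,
and the `dictionary` word set are all hypothetical. it only looks at one
o.c.r. token against one reference token, so formatting diffs and
should-have-been-preprocessed diffs simply fall through to "other".

```python
import string

# hypothetical character sets used to test "differs only in X" categories
QUOTES = "\"'\u201c\u201d\u2018\u2019"
PUNCT = string.punctuation + "\u201c\u201d\u2018\u2019"


def _strip(s: str, chars: str) -> str:
    """remove every character in `chars` from `s`."""
    return "".join(c for c in s if c not in chars)


def classify(ocr: str, ref: str, dictionary: set[str]) -> str:
    """guess a category for one diff (o.c.r. token vs. reference token).

    `dictionary` is a hypothetical set of known lowercase words.
    """
    if ocr == ref:
        return "no-diff"
    # tokens agree once commas and periods are ignored
    if _strip(ocr, ",.") == _strip(ref, ",."):
        return "comma/period"
    # tokens agree once quotemarks/apostrophes are ignored
    if _strip(ocr, QUOTES) == _strip(ref, QUOTES):
        return "quotemark"
    # tokens agree once all punctuation is ignored
    if _strip(ocr, PUNCT) == _strip(ref, PUNCT):
        return "various-punctuation"
    # both sides are real words, so a spellcheck would never flag it
    a, b = _strip(ocr, PUNCT).lower(), _strip(ref, PUNCT).lower()
    if a in dictionary and b in dictionary:
        return "stealth-scanno"
    return "other"
```

for example, `classify("arid", "and", {"arid", "and"})` comes back as
"stealth-scanno", while `classify("said,", "said.", set())` comes back
as "comma/period". the order of the checks matters: the narrower
categories are tried first, so a comma/period swap isn't lumped in
with the general punctuation bucket.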
***
more than anything else, these diffs reflect the fact
that when archive.org does o.c.r. on a google scanset,
the results are noticeably worse than they could be...
the o.c.r. that archive.org attains on its own scansets
is _much_ better.
although comma/period confusion is a common problem,
the high number of such confusions in this text is strange...
ditto with the quotemark diffs. (for the record, i could've
detected many of those with my quote-check, but i didn't
bother to run it, since i had gardner's text to compare against.)
i don't know if the google scans need to be despeckled,
or if their contrast is bad, or what the problem is.
and since google probably wouldn't improve them for us
anyway, even if we knew the reason and asked 'em to fix it,
it's probably not worth figuring out. or maybe it would be,
if there were something we could do to fix it ourselves.
luckily for me, i don't have much investment in any book,
and if i did care about one, i would be willing to scan it myself,
so i'm not desperate enough to deal with a google scanset;
i can afford to wait until archive.org scans the book instead.
one final note is that the number of stealth scannos is high,
at least relative to the other archive.org books i've checked.
again, this might be due to the low quality of google's scans,
but it might also be that archive.org has activated a kind of
"intelligence" that tries to _guess_ a word it cannot recognize,
and the guessing routine is bad, at least from our standpoint.
we should certainly hope and pray that's _not_ the case, since
stealth scannos (o.c.r. errors that yield real words, like "arid"
for "and") are hidden icebergs to a comparison method.
you can find my breakdown of the 159 diffs into categories at:
> http://z-m-l.com/go/gardn/gardn-diffs-classes.html
-bowerbird