for 32 days, i am showing samples of the problems
with the text in e-books from the internet archive...
***
oops! i'm sorry, i was very busy yesterday, so i forgot this!
today's example is again from our friend from baltimore,
edgar allan poe, this time volume 3 of his collected works.
here's the scan for page 118:
> http://www.archive.org/stream/worksofedgaralle03poee#page/118
here's the o.c.r. for the whole book:
> http://ia341337.us.archive.org/2/items/worksofedgaralle03poee/worksofedgaralle03poee_djvu.txt
and here's the o.c.r. for page 118:
>
> 118 WORKS OF EDGAR ALLAN POE
>
> a meckamcal ingenuity so much, superior to our
> own. One finds it difficult, too, to conceive the
> vast masses which these people handle so easily,
> to be as light as our own reason tell us thej^
> actually are.
>
> Api'il 8th. — Eureka! Pundit is in his glory.
> A balloon from Kanadaw spoke us to-day and
> threw on board several late papers ; they contain
> some exceedingly curious information relative
> to Kanawdian or rather Amriccan antiquities.
> You know, I presume, that laborers have for
> some months been employed in preparing the
> ground for a new fountain at Paradise, the Em-
> peror's principal pleasure garden. Paradise, it
> appears, has been, literally speaking, an island
> time out of mind — that is to say, its northern
> boundary was always (as far back as any record
> extends) a rivulet, or rather a very narrow arm
> of the sea. This arm was gradually widened
> until it attained its present breadth — a mile.
> The whole length of the island is nine miles;
> the breadth varies materially. The entire area
> (so Pundit says) was, about eight hundred years
> ago, densely packed with houses, some of them
> twenty stories high: land (for some most unac-
> countable reason) being considered as especially
> precious just in this vicinity. The disastrous
> earthquake, howev' r, of the year 2050, so totally
> uprooted and overwhelmed the town (for it was
> almost too large to be called a village) that the
> most indefatigable of our antiquarians have
> never yet been able to obtain from the site any
> sufficient data (in the shape of coins, medals or
this appears to the human eye to be a very clean page,
but we have errors where there was a mild imperfection
on the page, including "mechanical" ("meckamcal") and "they"
("thej^") at the top, "april" ("api'il") in the next paragraph,
and "however" ("howev' r") further down.
(the "kanawdian" and "amriccan" spellings were intentional.)
and the o.c.r. was set to ignore the italics, so they're missing.
so even with a clean page, we can get some o.c.r. errors,
enough that -- when hundreds accumulate over a book --
it can take roughly one hour per book to fix all of them.
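as a rough sketch of how a first pass might catch some of these mechanically, here is a small python script. the patterns and the name flag_suspects are hypothetical, made up for illustration, and are not anything the internet archive is known to run. it only catches glitches that leave a visible trace, like "thej^", "api'il" and "howev' r". a misreading like "meckamcal" is made of ordinary letters and would need a dictionary check, and the missing italics can't be recovered from the text at all.

```python
import re

# hypothetical heuristics for illustration only; real cleanup tools use
# dictionaries and many more rules than these three
SUSPECT_PATTERNS = [
    ("stray symbol attached to a word",
     re.compile(r"[A-Za-z]+[\^_~|]+")),
    ("apostrophe plus space before a lone letter",
     re.compile(r"[A-Za-z]+' [a-z]\b")),
    # skips the common contractions 're, 've, 'll; one-letter endings
    # such as 's and 't never match because two letters are required
    ("apostrophe in mid-word",
     re.compile(r"\b[A-Za-z]+'(?!(?:re|ve|ll)\b)[a-z]{2,}\b")),
]


def flag_suspects(text):
    """return (line number, matched text, reason) for each suspicious spot."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for reason, pattern in SUSPECT_PATTERNS:
            for match in pattern.finditer(line):
                hits.append((line_no, match.group(0), reason))
    return hits


# a few lines (some shortened) from the o.c.r. of page 118
sample = """to be as light as our own reason tell us thej^
Api'il 8th.
peror's principal pleasure garden. Paradise, it
earthquake, howev' r, of the year 2050, so totally"""

for hit in flag_suspects(sample):
    print(hit)
```

on the sample it flags "thej^", "Api'il" and "howev' r", and leaves the legitimate "peror's" alone. a list like this only tells a human where to look first; each hit still has to be checked against the scan.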
-bowerbird