for 32 days, i am showing samples of the problems
with the text in e-books from the internet archive...
***
today's example is from "the massacre at chicago",
thanks to gardner buchanan, who pointed it out...
here's the scan for page 49:
> http://www.archive.org/stream/waunangeeormassa00rich#page/49
here's the o.c.r. for the whole book:
> http://ia331420.us.archive.org/2/items/waunangeeormassa00rich/waunangeeormassa00rich_djvu.txt
and here's the o.c.r. for page 49:
>
> THE MASSACI^S; AT-CHJCAGO. IftV
>
> 'engaged in hostilities ‚ ¨ lias Winnebeg not reyealedthi^,?". iM ‚ ,-, ‚ ¨¢tu I. y ‚ !.;
> . ! :". Npt; a ‚ ¨¢v(fpj;d," jreplipd ; J^i^itenautliliflj^ley, i^stonislied, in liii*! tu^iij at -the
> information. ' . ‚ ![.. -^r.^::! ‚ ¨¢ ‚ ¨¢. ‚ ‚ --;; ‚ -'.'. .>' . ‚ ¨¢i : ‚ : ‚ ‚ ], ‚ .'.:;
>
> *.' At aqotbi^r .inoraent, .and on an indifferent occasion, this mutual misun-
> der^^tauJiiig ^^liglit, afford room for pleasaiitry," continued Mr. McKenzje
> Avith a grave smile ; " but if Is not so., ^'inncbeg, I ?ee, iias been trucitp
> his tru|5t; |anf],.x'yltI)Oiagh cognizant of tli 9/;n;atvfre of tb^;despatphes, reve-uled
> the info^'ipation, to no one hut myself, whom, hp regarded, as having not only
> ii, right to.^pp.'^sess it at the , Cfirliegt moment, but as, being the most pioper
> person to advise withtlie commanding officer, at the earliest moment, on the
> measures to- be -adopted. 1 am here for that purpose ; think you 1 slijdl find,
> liini alone ‚ ¨~ for I wouldn't enter upon the subject before Mrs. IJeiidley,": ‚ ¨¢,-,.
>
> "I have just said that ]\Irs,. IJeadlej and Margaret are in attendancei on
> the unfortunate Ronayne,,"' replied Elmsley.,;; ‚ 'i ;Y,pu,,^i)l,, t^herelore, be .sui-^
> ,to find him alone, aud.no doubt busied,,iu t^i^ for^g^ionjof-iplanS) of opera-
> .tipn consequent on this inteJligence.'' . , . j ,..; ,[;;., ; ; ,, ., .;. , ; , : , ,
>
> ' ‚!"¶Recollect, not a word of this untiljt is p|ficially revealed. , I shall npt
> even let Captain Headley know that I am aware of the facts, but, siimply
> state that, having heard he was iii the .r^jqeiptcsf-despatches, I had come to
> know if there was any ne\ys of irnportance» ;But, pf one ;thing I would wai'n
> jou, Elmsley ; there will be a council of war to-morrow» and I could wish
> that your view of the subject may, lead you to prefer defending, the fort to
> the last extremity ill preference tq.c), long, and uncertain -retreat tq Por[t
> Wayne, which I know is suggested in the' despatch." ' ‚ .,,;)
>
> . '* I shall have no, difficul^ty in ^rrivi^g, at that; depiijion/V^^tufni^ the
> ..officer of the guar^/f',jtor qpmhppn sjensq only, is necessary ,tp. show the
> advantages of one course over tlic other. In the meantime,, I shall evince
> .no knowledge pf what you have conveyed to me, until the hour of councih
> Did no other consideration weigh with me, I would oppose a movement
> 'which cuts us off from all hope. of restoring the de.-^r lost wife of,lvQnayae
> to Iier distracted husband." , , , : .. ‚ ¨¢ , ‚ ] ‚ ,: .1 ' ‚ :
>
> \ "Good bye, God bless you," answered the trader, as he mpyedto-yrai'd^
> the quarters of Captain Headley. ' .,,..;,.,,. ...,.', ‚ ‚ ‚ ,,' ‚ ¨¢ ‚ , I'tui
>
> "Then," mused Elmsley, when alone, "arelhe faret)oding^.pf ithat ^sty-
> old number of the I)^ational Intelligencer whicH I have thumbed for hours
> over and over again for the last three months, at length finally realized^ ‚ ¨
> and Avar is come at last ; well, be it so ! .My chief anxiety is for Margaret,
> Would that she and all the rest of the weak women in this forti-ess were safe.
> H within the fortifications of Detroit ; but all evil seems to be coming upon us
> at once." , '' ' ' " ‚ ‚ ¨¢ ,, ,;;
>
> " Ah ! Mr, McKenzie^ I laipa.^Y.ery gl^d tp seei ypu," said Captain ileadlej,
> rising as the trader entei-eii tbe room set apart for .his library; and the tr^iflSr-.
> action of military official business. ^ " 'take a se,at. ; 1 Ypu .0pul,d not have Jf^l^
> me a more opportune visit." , . ,, , ,, ,,.:, . . ' ,.. . , »,-'[|
>
> " Ihad understood that \Vinnebeg had just returned with despfatchig^
> from Detroit," remarked the trader, "and am come to learn the news.." ,,,( ,^
>
> " Bad enough," answered Capt^ Headley, gravely, as he handej.tc) (hjypa
> the despatchyfrom General Hull. "Read tliat !" ^ ‚ ¨¢.' : ',' , - ..',.'1;^ ' ‚ / ‚ ‚ ¨¢' i')
the reason for this dreadful o.c.r. is that the page has very bad bleedthrough,
where the text from the other side of the page shows through on the scan.
it also has foxing -- brownish spots of various shapes and sizes on the paper.
and indeed, most of the pages in this book show these same problems.
gardner reports that, by doing some manipulations on this scan, he was able
to get better o.c.r. out of it. but i wonder if that's really worth the effort.
in my opinion, we need to just throw out this scan-set and re-scan that book.
(and make sure that we scan another copy of the book to avoid a recurrence.)
i'll note, however, that if gardner _can_ get good o.c.r. out of a scan like this,
and if the manipulations of the scan can be formulated to the point where they
can be applied in a batch process to the full scan-set, resulting in decent o.c.r.,
that would be fabulous. of course, that will also mean we have yet _another_
version of the scan-set for this book. archive.org already has two versions --
the high-res version and the low-res "flippy" version for their online reading.
(can you imagine they have a "high-res" version of these bad-looking pages?
what in the world is _that_ good for, except wasting a whole lotta disk space?)
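here's a minimal sketch of what such a batch process might look like. it is _not_
gardner's method, which isn't described here; it swaps in otsu's global thresholding,
and it assumes each page has already been decoded into a flat list of 8-bit grayscale
values (0 = black ink, 255 = white paper). the idea is that bleedthrough and foxing
are usually lighter than the real ink, so one cutoff chosen per page can push them
to white. the function names are hypothetical.

```python
# sketch only: otsu thresholding as a stand-in for whatever clean-up
# gardner actually did. assumes each page is a flat list of 0-255 gray values.

def otsu_threshold(pixels):
    """return the gray level that best splits dark ink (<= t) from lighter
    paper, bleedthrough, and pale foxing (> t), by maximizing the
    between-class variance over a 256-bin histogram."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    weight_dark = 0  # number of pixels at or below the candidate threshold
    sum_dark = 0     # sum of their gray levels
    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


def binarize(pixels, threshold):
    """map ink to pure black and everything lighter to pure white."""
    return [0 if p <= threshold else 255 for p in pixels]


def clean_scan_set(pages):
    """apply the same per-page threshold step to every page in a scan-set."""
    return [binarize(page, otsu_threshold(page)) for page in pages]
```

on pages where the bleedthrough is nearly as dark as the ink, a single global cutoff
won't separate the two, which is one more argument for a re-scan.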
i originally suggested, as a "best practices workflow", that a scanning operation
should do o.c.r. immediately on each page, then spellcheck the results, and
-- when there is a high percentage of misrecognized words -- re-do the scan.
needless to say, such a workflow would keep us from ending up with scans like this one...
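the spellcheck gate in that workflow might be sketched like this. the 10% cutoff is
a hypothetical number (the workflow only says "a high percentage"), the function
names are made up for illustration, and a real operation would load a full word list
rather than a tiny hand-made set.

```python
import string

# hypothetical cutoff: flag a page when more than 10% of its tokens fail the check
RESCAN_CUTOFF = 0.10


def misrecognized_fraction(ocr_text, dictionary):
    """fraction of whitespace-separated tokens that are not dictionary words."""
    tokens = ocr_text.split()
    if not tokens:
        return 0.0
    bad = 0
    for token in tokens:
        word = token.strip(string.punctuation).lower()
        # stray punctuation runs (common in bad o.c.r.) strip down to nothing,
        # so they count as misrecognized along with non-dictionary words
        if not word or word not in dictionary:
            bad += 1
    return bad / len(tokens)


def pages_to_rescan(ocr_pages, dictionary, cutoff=RESCAN_CUTOFF):
    """return the page numbers whose o.c.r. looks bad enough to re-scan.
    ocr_pages maps page number -> o.c.r. text for that page."""
    return [num for num, text in ocr_pages.items()
            if misrecognized_fraction(text, dictionary) > cutoff]
```

fed the o.c.r. for page 49 above, a large share of the tokens would fail the lookup,
so the page would be flagged right at the scanner, not years later by a reader.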
for the most part, i will avoid presenting scans like this one in this series, since
the takeaway should be that we _can_ correct the o.c.r. in a cost-effective way,
and therefore we should pursue that path. obviously, that's not true of this book.
thankfully, books like this are very rare in the archive.org library.
-bowerbird