rewrapping p.g. to an existing scan-set

carlo, perhaps the answer was hidden in your last post, but let me present the situation, and ask the question directly...

the project gutenberg e-text for "pride and prejudice" is relatively accurate, but completely lacking provenance...

the internet archive scan-set for "pride and prejudice" is self-documenting, but the o.c.r. text from it is abysmal...

even though, as you point out, they are different editions, it seems we could use these problems to offset each other.

so... how would you suggest that we go about doing that?

we can, of course, just send the internet archive stuff through d.p., but that wouldn't take advantage of our already-proofed-and-relatively-accurate p.g. e-text.

so, carlo, what would you recommend, specifically?

-bowerbird

"Bowerbird" == Bowerbird <Bowerbird@aol.com> writes:
Bowerbird> carlo, perhaps the answer was hidden in your last post,
Bowerbird> but let me present the situation, and ask the question
Bowerbird> directly...

Bowerbird> the project gutenberg e-text for "pride and prejudice"
Bowerbird> is relatively accurate, but completely lacking
Bowerbird> provenance...

Bowerbird> the internet archive scan-set for "pride and prejudice"
Bowerbird> is self-documenting, but the o.c.r. text from it is
Bowerbird> abysmal...

The OCR of the 1833 edition from UofT is quite good; the OCR of the 1813 first edition is bad.

Bowerbird> even though, as you point out, they are different
Bowerbird> editions, it seems we could use these problems to
Bowerbird> offset each other.

We can use the 1813 scans to establish that the PG edition conforms to the 1813 edition (possibly proving the conformity for the second book; if it holds there, there is no reason it should not hold for the rest). Note that I am not saying I have proved this. I am saying that a sample has convinced me of it, and that I have an idea of how it could be proved or disproved automatically.

Bowerbird> so... how would you suggest that we go about doing
Bowerbird> that?

Bowerbird> we can, of course, just send the internet archive stuff
Bowerbird> through d.p., but that wouldn't take advantage of our
Bowerbird> already-proofed-and-relatively-accurate p.g. e-text.

This is not a good idea, unless we can prove that the changes were approved by the author. But this is unlikely; see http://en.wikipedia.org/wiki/Pride_and_Prejudice#Publication_history

Bowerbird> so, carlo, what would you recommend, specifically?

Do nothing until Google or TIA scans the other volumes of the 1813 edition. Then even a crappy OCR is enough to reintroduce the page numbers.

If you care, you can buy the Dover 1995 edition for $3, after clearing it on the basis of the copyright statement: "This Dover edition ... is an unabridged slightly corrected republication of the text of the first edition of 1813 ... A new introductory note has been specially prepared for this edition." You will get an OK, with an instruction to remove the modern material.

If you really care, you can buy a copy of the first edition for US$ 41203.58 (plus $ 10.20 shipping) and scan it. It might turn out to be a bargain, since one half-title is missing. A complete copy costs US$ 60000.00, or 75000.00 with the spines preserved.

For my strategic recommendations see my next post.

Carlo
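
The remark that "even a crappy OCR is enough to reintroduce the page numbers" can be illustrated with a small sketch. This is not the method Carlo has in mind (he does not describe it in this post); it is one possible approach, assuming the proofed text and the OCR pages come from the same edition. It aligns the two word streams with Python's standard difflib and places a marker in the proofed text wherever an OCR page begins. The function names and the "[page N]" marker format are hypothetical, chosen only for this illustration.

```python
import difflib
import re

# Word tokens only: punctuation and case differences are ignored when aligning.
WORD = re.compile(r"[A-Za-z0-9']+")


def locate_page_starts(clean_words, ocr_pages):
    """For each OCR page, return the index of the clean word where it begins."""
    ocr_words, starts = [], []
    for page in ocr_pages:
        starts.append(len(ocr_words))  # offset of this page in the OCR stream
        ocr_words.extend(w.lower() for w in WORD.findall(page))

    matcher = difflib.SequenceMatcher(None, clean_words, ocr_words, autojunk=False)
    blocks = matcher.get_matching_blocks()  # sorted; last entry is a zero-size sentinel

    result, i = [], 0
    for p in starts:
        # Advance to the first exact-match block that ends after the page start.
        while i < len(blocks) - 1 and blocks[i].b + blocks[i].size <= p:
            i += 1
        blk = blocks[i]
        if blk.b <= p:
            pos = blk.a + (p - blk.b)   # page start lies inside an exact match
        else:
            pos = blk.a - (blk.b - p)   # garbled words first: back off by the same count
        floor = result[-1] if result else 0  # keep page starts in order
        result.append(max(floor, min(pos, len(clean_words))))
    return result


def insert_page_markers(clean_text, ocr_pages, first_page=1):
    """Return clean_text with a '[page N] ' marker before each page's first word."""
    matches = list(WORD.finditer(clean_text))
    clean_words = [m.group().lower() for m in matches]
    starts = locate_page_starts(clean_words, ocr_pages)

    out = clean_text
    # Insert from the end so that earlier character offsets stay valid.
    for n, pos in reversed(list(enumerate(starts, first_page))):
        at = matches[pos].start() if pos < len(matches) else len(clean_text)
        out = out[:at] + f"[page {n}] " + out[at:]
    return out
```

Calling insert_page_markers(proofed_text, list_of_ocr_page_strings) returns the proofed text with the markers added. Only the words that the OCR got right anchor the alignment, so misread words inside a page do no harm; a page whose opening words are all garbled gets an estimated position rather than an exact one. SequenceMatcher is slow on very long inputs, so for a whole novel it would probably be better to align one chapter at a time.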
participants (2)
- Bowerbird@aol.com
- traverso@posso.dm.unipi.it