
Bowerbird wrote:
since you don't seem to want to have the o.c.r. done, for understandable reasons, i will do it myself.
Great! Hopefully others here will run it through their favorite OCR program and share the results with you and with gutvol-d. Others, please OCR the scans, which are available at: http://www.openreader.org/myantonia/
by the way, i have done some extensive comparisons of the project gutenberg version of "my antonia" and yours. the more deeply i go into it, the more i become convinced most differences are due to intentional edits, and _not_ due to sloppiness in the original preparation of the work.
How do we know? We don't know what source edition was used for PG's version of "My Antonia", but I now believe (though I cannot prove it until someone does the actual comparison) that the source was the "mangled" British edition, as noted below. So the way to know for sure is to secure a copy of that "mangled" British edition and do the comparison. (I won't do this myself: it would be futile, since the British edition is itself unacceptable as a source.)
so this appears to be exactly like the "frankenstein" case -- a simple use of a different edition as the source-text.
Yes, and this is why I called the PG version of "My Antonia" "mangled": it is based on a mangled British edition, whose sloppy editing and printing made Willa Cather herself very unhappy. She was very "painstaking" with regard to her books, more so than the average author (and she had the status to dictate the editing and typography of her books to her publisher; most lesser authors didn't have this luxury).
Again, my concern about the PG collection goes beyond error rates relative to some source. It goes to the general matter of trust: using proper (acceptable) editions as sources, properly identifying the source, and providing a means to verify easily that the etext faithfully conforms to the source (primarily by making the scans available, which is now possible -- I agree with you that things were tougher a few years ago vis-a-vis providing page scans online).
For example, if NetWorker's analysis is correct (posted to The eBook Community), it now appears that PG's version of "Frankenstein" is based on a 1981 Bantam Classics Edition, which did significant editing of the text (in essence, creating a convenient "fingerprint"). NetWorker (who was an attorney at one time, I believe) surmises that this may border on copyright infringement (and not just a "sweat of the brow" sort of thing). Hopefully Bantam will not catch wind of this -- but if they do, they probably won't do anything anyway. Nevertheless, one wonders how many other earlier PG texts, where no source information is given, were derived from post-1923 emended editions. Could those ebook publishers who today use PG texts be potentially liable because of the lack of source information and of a means to verify provenance?
Even if the title page of a Work was photocopied and sent to PG for copyright clearance, how do we know that the person did not then use an easy-to-obtain modern edition for the actual scanning, and simply photocopy the title page from a non-circulating, non-scannable copy of the rarer original edition? I believe most of the individuals who submitted etexts to PG's collection did so faithfully and followed common-sense rules and expectations with regard to sources. But *how do we know*, and *how can we know*? We can't -- there's no mechanism to verify these things. This is where having the full source information, and making all the page scans of the source available, builds trust in the particular etext and the collection it belongs to, and protects both from copyright infringement claims. It is also the morally right thing to do.
in view of the insinuations you cast against the "accuracy" of the project gutenberg e-text, perhaps you should apologize?
Why? The differences in the PG edition of "My Antonia" likely came from a mangled British edition which Willa Cather apparently was upset about. These changes are, in essence, errors. In addition, we have no idea what emendations may have been made to the first and subsequent PG etext editions, since (until possibly now) we didn't know what edition was used as the original source! You certainly don't have access to the edition used to generate the PG edition of "My Antonia", do you? If not, then *how do you know* it is accurate to some original source edition? We can't talk about what is an error and what is not when we don't have the source information, or better yet, page scans for immediate verification.
That's why Michael Hart's interest in "correcting" the errors in the non-DP portion of the PG corpus is beyond futile and will not build trust in the collection -- how can one reliably correct an etext when the original source is not known or available to consult? It's ludicrous, and a complete waste of time. It's better to redo the etexts via DP, where the source info is recorded and page scans are (hopefully) available, and where the proofing is done by a number of independent proofers rather than just one person. Multiple, independent proofers add trust to the process, in addition to having the source info and scans available.
After all, intentional misspellings are common in many books (e.g., "My Antonia", Mark Twain's books, etc. -- and many pre-19th-century books use variant spellings, since rigorous spelling was not then an established norm), so how does one know whether an "error" is really an error? And there are errors which cannot be caught by simple reading or even by programs, such as missing (or added) accented characters, wrong punctuation (such as an em-dash replaced with a colon), and wrong paragraph breaks. (We see most of these in "My Antonia".) Many of these "not discernible" errors can subtly alter the meaning of the etext.
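To illustrate why a known source matters: once a trusted transcription of the source edition exists, a simple word-level comparison will surface exactly the kinds of differences listed above (dropped accents, changed punctuation). What follows is only a minimal sketch in Python using the standard difflib module. The function name word_diffs and the two sample sentences are made up for illustration; they are not taken from either edition of "My Antonia". Note that splitting on whitespace discards paragraph breaks, so this sketch would not catch those.

```python
import difflib

def word_diffs(source_text, etext):
    """Return (tag, source_words, etext_words) for each span where the texts differ.

    Hypothetical helper: compares the two texts word by word, so paragraph
    breaks and other whitespace differences are ignored.
    """
    a = source_text.split()
    b = etext.split()
    # autojunk=False keeps difflib from treating frequent words as junk on long texts.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# Invented sample lines: the "etext" drops an accent and swaps an em-dash for a colon.
source = "The name was \u00c1ntonia \u2014 they called her Tony."
etext = "The name was Antonia : they called her Tony."

for tag, src, txt in word_diffs(source, etext):
    print(f"{tag}: source={src!r} etext={txt!r}")
```

Without the source text as the first argument there is nothing to compare against, which is the point: the program can only flag a difference, and only the source edition (or its page scans) can say which side is right.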
We owe readers, even casual readers, an excellent product with full disclosure. For example, the poll I'm conducting on this topic at The eBook Community indicates (but does not prove -- consider this a preliminary assessment) that a significant percentage of those who read public domain digital texts *prefer* (note this word carefully) that the texts they use come from acceptable, known editions and be faithful renditions of those editions. This is only common sense. To dismiss it is essentially to say that the vast majority of people don't give a damn whether the public domain texts they spend hours and hours of their valuable time reading are reasonably faithful to the original. Does anyone want to claim that the vast majority of people (99%, as PG's online info seems to say) don't care one whit? And pointing to the large number of people using PG texts does not prove that claim, since I believe most people have innocent blind faith that PG did things correctly.
Furthermore, anyone making a major effort to deliver the public domain to the public has a moral responsibility to do it correctly and to state in sufficient detail the provenance of the texts and any edits made to them. If a text is heavily emended, that should be stated in sufficient detail *in that etext, not elsewhere*, so the reader *knows* the text they are reading has been emended (one doesn't have to list the edits item by item, but it should be made clear that the text has been substantially edited, with a general overview of the types of edits done). I've explained this on TeBC in more detail. This is a *responsibility*, which places restrictions on how PG and similar groups should conduct themselves. This is a serious endeavor: digitally transferring and preserving the public domain. This is not child's play.
It is true that the Public Domain exists for anyone to do anything with it as they see fit, but like any freedom, it carries associated responsibilities. Full disclosure is one of them, and is a common-sense responsibility. Trying to be faithful in transcribing texts is another when no disclaimers are given in the texts themselves, since people assume the texts they are reading are reasonably faithful to the original.
Jon