The Principal Problem of Today's Public-Domain E-Books

Hello all,

The (to my mind) best software for reading e-books on any platform, Marvin (www.marvinapp.com), is nearing its iPhone release, while the iPad version has also seen dramatic improvements in recent months. I gave away all my Kindle devices to friends and family over the last few months -- Marvin's features simply blow Kindle (et al.) out of the water, and there's no going back now for me. :-)

Below is an essay I sent to the Marvin Beta email list earlier today, and because I discuss Project Gutenberg there, I'm forwarding a copy of that post to this list.

Alex. Avenarius
www.aboq.org

*****

THE PRINCIPAL PROBLEM OF TODAY'S PUBLIC-DOMAIN E-BOOKS

To me, the principal problem with all the "free"/public-domain e-books one finds floating on the Web nowadays is not -- with a single crucial exception* -- their insufficient, inelegant, or amateurish formatting (such as that of headings). The principal problem is the lack of proper bibliographical sourcing of the electronic editions -- or the choice of a poor paper edition as the source for producing the electronic edition.

* The crucial exception is when formatting carries *meaning* -- then it must be retained faithfully. For example, if a word within a sentence is printed in italics, this can alter the meaning/tone of the entire sentence. To fail to reproduce those italics in the electronic edition is inexcusable. Yet this is frequently a problem, especially with the earliest Project Gutenberg files, which were plain text, where advanced formatting was not possible. Some producers of such early editions completely skipped the reproduction of italics (inexcusable!), while others sought to reproduce italics using crude means (such as ALL CAPS) -- which is lame but at least tolerable.
The problem with Project Gutenberg is that to this day (!), whenever users search for the e-book of a famous literary work, the Project Gutenberg search engine (and others) will seemingly suggest for primary download precisely one of those earliest, and therefore most fault-ridden, editions of the famous work. Why? Well, because over the years (and even decades, by now), the early, faulty edition has (sadly but logically...) accumulated the biggest number of downloads. Here one can see that a high number of downloads alone does *not* indicate higher quality -- in fact, in the case of Project Gutenberg, it often signals *bad* quality. Yet the general public does not realize this.

And so, even though (say) _Huckleberry Finn_ may today be available in Project Gutenberg in various editions, the general reader will typically download and read the *worst* (= earliest) edition of them all, while a corrected, meticulously produced electronic edition of the same book may frequently be ignored, "because it has so few downloads". :-( A "chicken and egg" issue, in a sense.

But, apart from this exception regarding formatting (when formatting carries *meaning*), the principal issue is the lack of proper bibliographical sourcing.

The only books suitable for study from the scholarly point of view are those that retain the *exact* content of the books as they were originally published -- during the writer's lifetime, and ideally, under the author's supervision. No revisions, modernizations, etc., of the author's texts, especially those performed after the writer's death, are permissible from the scholarly point of view.

The only tolerable edition for scholars is one that in the German language is called "wort- und zeichengetreu", meaning: retaining every single word, and every single punctuation sign (!), *exactly* as they were in the authoritative edition released during the writer's lifetime.
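To make the "wort- und zeichengetreu" requirement concrete, here is a minimal Python sketch of how an electronic text could be checked, word by word and punctuation sign by punctuation sign, against a trusted transcription of the source edition. This is only an illustration under my own assumptions: the function names (tokenize, fidelity_report, is_faithful) are made up for this example, the sample sentences are invented, and I am not claiming that Project Gutenberg or anyone else works this way.

```python
import difflib
import re


def tokenize(text):
    """Split text into words and single punctuation marks, ignoring whitespace."""
    return re.findall(r"\w+|[^\w\s]", text)


def fidelity_report(source_text, etext):
    """List every deviation of the e-text from the source edition.

    Each entry is (operation, source_tokens, etext_tokens), where operation
    is 'replace', 'delete' or 'insert' as reported by difflib.
    """
    source_tokens = tokenize(source_text)
    etext_tokens = tokenize(etext)
    matcher = difflib.SequenceMatcher(None, source_tokens, etext_tokens,
                                      autojunk=False)
    return [
        (op, source_tokens[i1:i2], etext_tokens[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


def is_faithful(source_text, etext):
    """True only if every word and punctuation sign matches the source."""
    return not fidelity_report(source_text, etext)


if __name__ == "__main__":
    # Invented sample sentences, not quotations from any real edition.
    source = "You see, it was late."
    modernized = "You see it was late."
    for deviation in fidelity_report(source, modernized):
        print(deviation)
```

Even a single dropped comma shows up as a deviation here, which is the whole point: by this standard, an edition either matches its source sign for sign, or it does not. Note that this sketch ignores differences in whitespace and line breaks, and it cannot see formatting such as italics at all.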
The issue of punctuation should not be underestimated: no other feature of a text gets abused and wilfully distorted in so-called "modern" editions as much as the original punctuation, which is changed arbitrarily.

In the world of paper books, there is a wonderful US series called "Library of America", which releases books based on the "wort- und zeichengetreu" principle. So, what we need in the electronic environment are editions of the same quality as "Library of America" offers on paper.

Note: LoA is now also starting to publish some of its releases in electronic format, but they tend to be terribly expensive -- dozens of dollars for a single public-domain book; that is hardly the way to win over the general public.

Rather, the producers of electronic editions of books should take care, before producing an EPUB version of a book, to select a reliable paper edition as the source, and retain it *exactly*.

The problem is: finding a release of (say) _Huckleberry Finn_ from Mark Twain's lifetime is more difficult, especially for the typical "amateur/volunteer digitizer", than just reaching for a random (but corrupted in the supposedly "modern" way) paperback edition of the same work released in our own time.

Well, but even if you don't bother to select a reliable source paper edition when producing an electronic version of a book, you should at *least* indicate *exactly*, in your electronic edition, which paper edition it was that served as your source.

I find that this information is typically lacking in the free e-books one finds on the Internet. In fact, if the information is present, it is frequently deliberately *faked*, to escape copyright concerns. That is: a volunteer may take a present-day paper edition of _Huckleberry Finn_, produce an electronic version of it, but then, in the final electronic file, indicate that an edition from (say) 1914 had been the source -- just so that no one starts wondering about copyrights for the electronic file.
Yet the wording in the electronic file may well be from 1997, despite the year "1914" stated at the beginning of the file.

Bottom line: it's frequently a mess. :-(

I only had a brief look at Marvin's default "Library" before deleting it (by the way, is there an option to re-download it at a later time? I can't find it), but I'm afraid the books Kris included in his sample collection suffered from the deficiencies described above: not just (perhaps) the insufficient formatting of headings and such (which are largely cosmetic issues I'd be willing to tolerate), but especially a lack of proper bibliographic sourcing of each book.

Do the books included in Marvin's default "Library" clearly state on which paper editions they are based; and are these the *optimal* paper editions on which to base an electronic edition?

And, do they only *state* those editions to have been the source, or has someone actually verified that the electronic text truly conforms to the text in that particular paper edition, preserving the "wort- und zeichengetreu" principle? That is, is every word, and every punctuation sign in the electronic edition, *exactly* the same as in the *optimal* source paper edition?

If not, then those electronic editions are not suitable for scholarly reading.

Note: I have nothing, in principle, against "modernized" editions of classics. I just would never read them myself -- but reader tastes differ, and other readers might prefer a modernized electronic edition over the original one.

This can be compared to digital music: an audiophile would refuse to listen to music encoded in 128 kbps MP3 files -- such files simply do not preserve the full flavour of the original recording faithfully enough. An audiophile would probably only accept a lossless FLAC digital file, nothing less -- no MP3 recordings of any kind.

Because I'm no audiophile, I'm OK with those inferior 128 kbps MP3 files.
But I *am* what you might call a "digital bibliophile" -- in that nothing except the full flavour of the original edition of a literary work will suffice for me. Just as an audiophile will only accept a FLAC recording (or equivalent), and nothing less, I only accept "wort- und zeichengetreu" editions of literary works, of the "Library of America" type, and nothing less.

So, I have nothing against modernized editions of classics -- provided that they are *clearly labeled* as such, and provided that they are not *the only editions* offered to the reader of electronic books.

Sadly, neither of these conditions is typically met nowadays. A typical public-domain e-book will either entirely forgo stating its source -- the paper edition on which it is based; or it will only briefly (lacking sufficient detail) state it "pro forma", while actually being based on a different (contemporary) edition; or it will have chosen an unsuitable, *modernized* paper version of a classic as its source, without giving the reader a choice.

The reader of electronic literature nowadays is typically only given a single choice: "Here's this [modernized] electronic EPUB edition of the classic -- take it or leave it!" The "modernized" part is frequently only implied, and not even stated explicitly to warn the reader who might *not* wish to read a "modernized" text.

This is clearly unacceptable from the scholarly point of view. And if I remember correctly, the e-books included in Marvin's default "Library" suffer from the ills mentioned above, too.

Due to this sad state of things, I just can't switch my studies of literature to Marvin completely. For many classic literary works I'm reading, a reliable electronic edition ("wort- und zeichengetreu") simply isn't available.
I therefore need to locate a *photographed* PDF file of the original paper edition (or photograph the book myself, using the fabulous Scanner Pro app on the iPhone), and read *that* edition, clumsily, in GoodReader. I'm simply not given any other choice! :-(

For example, my no. 1 favourite writer is Leo Tolstoy. Will I be able to read his possibly most famous work, the novel _Anna Karenina_, in Marvin? No... Only in GoodReader.

I read Tolstoy in Russian, but as far as I'm aware, there is not a single reliable Russian EPUB edition of the novel available online, anywhere, despite expired copyrights. There are countless *modernized* EPUB editions of _Anna Karenina_, but apparently no EPUB edition that would satisfy scholarly needs.

And so, I will need to read _Anna Karenina_ in GoodReader, in this photographed PDF file of an original 1889 Russian edition:

https://www.sugarsync.com/pf/D6495512_1736831_99821?directDownload=true

The first Russian book edition appeared in 1877/8, so I'm getting pretty close there; not many corruptions will have been introduced into the text within the 12 intervening years, especially because Tolstoy was around until 1910, and wouldn't have knowingly permitted a "modernized" corruption of his text.

Finally, a separate issue is that of translations. Is Marvin meant to be software for an international audience, or just for English-speaking (English-reading) users? If the former, why should Marvin's default "Library" only include books in English, and not also in some other languages?

Why Dostoyevsky in English translation, but not (also, or primarily) in the original Russian? Why Kafka in English translation, but not (also, or primarily) in the original German? Why Plato in English translation, but not (also, or primarily) in the original Ancient Greek? Etc. :-)

Naturally, it's easy to criticise, and a lot more difficult to suggest some practical alternatives for Kris instead. If, indeed, those alternatives exist.
Does a reliable German EPUB edition of Kafka exist, a reliable Russian EPUB edition of Dostoyevsky, let alone a reliable Ancient Greek EPUB edition of Plato? It must be feared that the answer to all three of these questions is... no. :-(

It should be the task of literary scholars and literary institutes around the world, in the upcoming decades, to ensure that *reliable* EPUB editions of all classic works, fully meeting all scholarly criteria, become freely available, so that Kris would then only have the very easy and pleasurable task of "randomly" picking whatever classic he prefers, in whatever *original and/or translated* language, and including it in his sample Marvin "Library".

--
Alex. Avenarius
www.aboq.org

[sent via The Bat! 5.4]
participants (1)
-
Alexander Avenarius