
Hugh wrote:
On this discussion, I would note that it is not just a question of transcribed text and/or images, but also frequently of editions. For a writer like James Fenimore Cooper it is not just a question of accuracy of transcription (and imposition of publishers' editorial styles) in the numerous editions made of his works, but also that he frequently made significant changes in his novels in new editions published during his lifetime. This is presumably true of many other writers. So even the "first edition" is not always the "best edition." On page numbering, it is my custom to include page numbers from the edition I am transcribing by placing them in {curly brackets} -- a typographical form I don't otherwise use. This makes it easy for the user not only to determine what page he is "on" in the electronic version, but also to search for specific page numbers or cite them.
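As a rough illustration of why such markers are convenient for software as well as for readers, here is a minimal Python sketch. It assumes the markers contain digits only, such as {11}, and that curly brackets appear nowhere else in the text; the function names and the sample sentence are made up for the example and are not taken from any actual transcription.

```python
import re

# Assumption: a page marker is an integer in curly brackets, e.g. "{11}".
PAGE_MARKER = re.compile(r"\{(\d+)\}")

def index_pages(text):
    """Map each page number to the character offset where its marker begins."""
    return {int(m.group(1)): m.start() for m in PAGE_MARKER.finditer(text)}

def page_at(text, offset):
    """Return the page number of the last marker at or before offset, or None."""
    current = None
    for m in PAGE_MARKER.finditer(text):
        if m.start() > offset:
            break
        current = int(m.group(1))
    return current

# Invented sample text, not a quotation from any edition.
sample = "{10} It was a dark night. {11} The wind rose over the lake."
pages = index_pages(sample)
page_of_wind = page_at(sample, sample.index("wind"))
```

Because the brackets are used for nothing else, a plain text search for "{11}" does the same job by hand.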
In the demonstration "My Antonia" project, one form of presentation shows page markers that link to images of the page scans:
http://www.openreader.org/myantonia/basic-design/myantonia.html
Another form exposes the paragraph IDs so there can be direct automated linking to any particular paragraph (must use Firefox or Opera; it won't work in IE):
http://www.openreader.org/myantonia/basic-design-nopagenum-paranum/myantonia...
(To see paragraph linking in action, see the example later on regarding error correction.)
It is also possible to link to a particular original source page, for example:
http://www.openreader.org/myantonia/basic-design-nopagenum/myantonia.html#pa...
(This brings up the approximate spot in the text where the original paper Page 11 started.) Or to do the same for the document where page numbers and links are exposed:
http://www.openreader.org/myantonia/basic-design/myantonia.html#page011
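For anyone who wants to generate such page links automatically (for citations, say), a small Python sketch follows. The three-digit zero padding is only inferred from the single "#page011" example above, and page_anchor is a hypothetical helper name, so treat this as an assumption about the scheme rather than a description of it.

```python
BASE = "http://www.openreader.org/myantonia/basic-design/myantonia.html"

def page_anchor(base_url, page):
    """Build a link to an original page, assuming anchors of the form #pageNNN."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Assumption: page numbers are zero-padded to three digits, as in "#page011".
    return f"{base_url}#page{page:03d}"

link = page_anchor(BASE, 11)
```

The paragraph anchors could presumably be built the same way, but their exact form is not shown in full here, so the sketch leaves them out.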
To further complicate the edition issue, the Modern Language Association has developed a fairly rigid style form for editions bearing its seal of approval. The result is a synthetic (and hence copyrighted) version which (to make a long matter short) combines the latest text on which the author is known to have worked with the earliest form (ideally the manuscript) for his spelling, punctuation, and other stylistic matters that usually get changed by publishers to suit their own style manuals. In the case of JF Cooper these editions have been issued in the so-called "Cooper Edition" -- first by the SUNY Albany Press and more recently by AMS -- and have been licensed to other publishers such as Library of America, Oxford, and Penguin.
Interesting. With respect to Willa Cather's "My Antonia", the first edition is now Public Domain, but the second and subsequent editions (which had some corrections) are still under copyright. So what we did was to put together a faithful textual reproduction of the first edition (still undergoing some proofing -- it should have been submitted to DP in the first place, but that's a topic for another thread). Then, using known scholarly information on that Work, we marked up the corrections made in subsequent Cather-approved editions. To see this in action in "My Antonia", first go to:
http://www.openreader.org/myantonia/basic-design-nopagenum/myantonia.html#p0...
Notice in that paragraph the word "Austrians" is highlighted in gray. If you put your pointer over the word, in most browsers a little popup window will appear saying "UNL Cather Edition: Prussians". UNL is the University of Nebraska at Lincoln (with whom we have been in contact regarding "My Antonia"), which has reprinted this Work in a scholarly edition that notes what corrections were made to the first edition -- in this example the grayed word should be "Prussians" instead of "Austrians".
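To make the idea concrete, here is a minimal Python sketch of keeping the first-edition reading in the text while recording the later correction alongside it, so that either version can be produced from one file. The corr element and its to and source attributes are invented for this example and are not the markup actually used in the demonstration; the sample sentence is also made up rather than quoted from the novel.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element text is the first-edition reading,
# the "to" attribute holds the reading of the later edition.
SAMPLE = (
    '<p id="p1">The soldiers were '
    '<corr to="Prussians" source="UNL Cather Edition">Austrians</corr>'
    ', he said.</p>'
)

def render(elem, corrected):
    """Flatten an element to plain text, as printed or with corrections applied."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "corr":
            original = child.text or ""
            parts.append(child.get("to", original) if corrected else original)
        else:
            parts.append(render(child, corrected))
        parts.append(child.tail or "")  # text that follows the child element
    return "".join(parts)

para = ET.fromstring(SAMPLE)
as_printed = render(para, corrected=False)
as_corrected = render(para, corrected=True)
```

The same record could also drive a hover note like the one described above, since the source of the correction travels with the word it applies to.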
Problems of this sort are going to plague the Gutenberg editions of almost any author of the 19th century or earlier, but I have not seen them raised in this discussion.
Definitely. Here is what I think PG and similar projects should strive to do:
1) Produce accurate transcriptions of specific editions, with editing done only in limited and well-defined situations, and keep track of such changes right within the document. Essentially the source book will be textually preserved as it was printed, errors and all. (With XML it is relatively easy to produce a "corrected" edition which would be noted as such.)
2) For more popular Works (the "classics") which exist in multiple varying editions, query scholars and enthusiasts as to which Public Domain edition(s) should be transcribed. Note the plural of editions, since there may be more than one edition worthy of transcription. With DP now in existence, transcribing more than one edition is no longer a practical obstacle. For example, Mary Shelley's "Frankenstein" exists in essentially two different editions. The second edition was significantly changed from the first edition by Shelley, including some differences in the ending. It is one Work, but definitely two unique Expressions as defined in the WEMI system.
3) Certainly accept "modern" edited editions of a Work so long as the licensing is acceptable (Creative Commons) *and* it is identified as a modern edited edition (so the consumer knows what they are getting), *and* at least one edition, faithful to some acceptable Public Domain printing, is already in the archive. Having the modern editions follow MLA guidelines (if allowed) sounds like a good idea.
These are my thoughts, which I think touch upon yours.
Jon