RE: [gutvol-d] Perfection

"Her Serene Highness" <mbuch@mcsp.com> writes:
David Starner writes:
What is he supposed to do, give a page reference to one of a dozen editions that might be very hard for the teacher to find? With etexts, you know that your recipient has access to the same edition you have. And as someone else pointed out, if you quote the sentence, the context can be found in seconds.
**Why not? It's done all the time. Students and scholars have cited rare books that are impossible to find before- I remember citing a rare book that contained the concordat between the Vatican and Germany for a grad class years ago, and information on the Black Star line of Marcus Garvey while still in high school. Why did my professors accept my citations? Because they could be tracked down.
One of the methods of mathematical proof is proof by uncheckable citation. "This lemma is proved in the January 1822 volume of the Bohemian Mathematical Journal, pages 12-43." If the volume is in some library half-way across the country, nobody is going to take the time to check a cite in some student's paper. If the teacher is never going to check the cite, what's the point? And if he's going to find the one copy in the nation and order it via ILL, what's so hard about searching through an online document?
But other than as a work of literature, I'd have problems using it- like if I were comparing 19th century versions of Arabic texts, because I'm not even sure it was written in the 19th century.**
Anyone born in 1980 or later would know quite quickly, just like I do. It was translated in 1861 and reprinted in 1971 as part of the Everyman's Library, and has been frequently reprinted. It has a second edition, in 1871; assuming the Everyman's Library's text was taken from the second edition, you can quickly check to see whether the PG edition is the first edition or the second edition. Google is your friend. So are the LoC catalogs, but watch out because they frequently have authors split under two headings, one of them marked as being from the old catalog.
**How? Easy. You look at other books about Koranic translations and see if they refer to this one- and guess what? You can't do that online. Which means you have to go to a library anyway. Online isn't BETTER. It's different.
Or I could do a search online and find out that Rodwell's translation is considered inferior by some because he wasn't a Muslim, but is probably one of the better public-domain ones. I also find "All the prominent translations of the Quran have each been the product of a single individual, so there is no translation which truly reflects the collective and opposing thoughts of a range of scholars. Such a large-scale collaborative effort would most likely be required to establish any one translation as most authoritative. Since this has not yet happened, there is no translation of the Qur'an as widely accepted (for example) as the New Revised Standard Version of the Bible. "As a result, individual English-speaking Muslims tend to have their own personal favourites. Indeed, those who read more than one translation often develop a fondness for different aspects of each. For example, the renowned scholar Annemarie Schimmel, author of dozens of books on Islam and formerly professor of Islam at Harvard University, favoured the translation of Arthur John Arberry for beauty of expression, and that of Marmaduke Pickthall for literal rendering of Arabic phrases." which are from <http://en.wikipedia.org/wiki/Translation_of_the_Qur%27an>, and which conveniently have links to the authors so I can find their credentials.
By the way- in a library, I can tell if a book is a reprint.
But what you can't tell is if it was reprinted, if all you have is the original. A quick search through the LoC's online catalogs should give you a pretty reasonable guess as to whether it was reprinted or not.
I could print it out and share it with my friends- after all, most people don't read whole books online. Control F is only useful if I'm in front of a machine. If I want to read a Tom Swift book to my kids at a chapter a night, I'm not going to do it from a laptop or park little Johnny's bed next to my desk.
No, but you aren't doing scholarly work with Tom Swift. And again, your generation doesn't read whole books online, but mine does.
when I can print it out and have it paginated,
My printer _always_ paginates documents. If you're dealing with an old dot-matrix, you have to paginate it manually, but the paper is usually prescored for separation. (-:
But seriously, I can't imagine why you'd want that. The original pages were designed for the original machine, and the change in fonts and typesetting, which is unavoidable, will change where the page breaks would naturally fall even if you strive to keep everything the same as the original. It would be much better to put page numbers in the margins and let the physical breaks fall where they may. Accept that page numbers have become free of the physical form of the book.
I think that we should retain source information, but you don't understand and accept the power of the tools at your fingertips. Much of the context about a book can be resolved in a Google search or a search of the appropriate library catalog online.
Stop and think before you hit that print button; ink is expensive, you know. A lot of things don't need printing out. Try emailing things to people, and letting them print it out if they want to.
Ebooks don't need to dance to be useful. Even well-stocked libraries don't have many of the books we do, and even if you have to send for the hardcopy, having the book at hand is useful. It vastly simplifies searching and especially concordance building. Online books are better in many ways, not just different.

There is a reason to preserve page numbers in ebooks. While correct academic quotations can be made perfectly well without page numbers when quoting an electronic version, the same is not true of retrieving information quoted somewhere else (in an old paper edition). So for example, if an existing book contains a sentence like "This topic is discussed in book aaaaa in pages xxx-yyy" (the exact edition is quoted in a reference), how do you easily find the exact range of pages without page numbers?
The same happens when a book has an index: the index item is often not found literally in the text, and a page number is a handy way to find the reference. Of course, the index can be improved (in an *ML edition) with cross-links, but transforming an index into a cross-linked version is a lot of work, and has to be done by an expert, while reading a page to find a reference is much less work, and is usually done by a (relatively) expert reader.
Some just answer: then do an HTML or a TEI edition. This I don't want: I cannot, and I do not want to learn; I prefer working with text, and doing more texts. And I prefer using text instead of *ML. Moreover, if I keep page numbers, conversion to *ML with page numbers will be much easier than having to retrieve the numbers from the images.
Some say: page numbers are ugly in txt. These are the same people who want to have an *ML version, so why do they bother? Please take the txt version with numbers, do your *ML and leave the txt alone.
Of course, having page numbers in Tom Swift might be too much. But at least if a book has an index, I believe that page numbers might be useful, even in txt, and we should recommend keeping the information.
Carlo
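Carlo's case of following a printed citation such as "pages xxx-yyy" can be illustrated with a minimal sketch. It assumes a hypothetical marker convention, a line of the form `[p. N]` wherever page N of the paper source begins; neither that syntax nor the function name `pages` comes from PG practice.

```python
import re

# Hypothetical convention: a line "[p. N]" marks where page N of the paper source begins.
PAGE_MARK = re.compile(r"^\[p\. (\d+)\]$")

def pages(text, first, last):
    """Return the lines of text that fall on pages first..last (inclusive)."""
    current = None  # page we are currently inside; None before the first marker
    kept = []
    for line in text.splitlines():
        m = PAGE_MARK.match(line)
        if m:
            current = int(m.group(1))
            continue  # the marker itself is not part of the book's text
        if current is not None and first <= current <= last:
            kept.append(line)
    return "\n".join(kept)

sample = (
    "[p. 11]\nEnd of chapter one.\n"
    "[p. 12]\nChapter two begins.\n"
    "[p. 13]\nIt continues.\n"
    "[p. 14]\nIndex."
)
print(pages(sample, 12, 13))
```

With markers like these kept in the txt, a reader holding only a page range from an old paper reference can pull the passage directly, which is the lookup that a full-text search cannot do when the cited wording is unknown.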

Question: How much harder is it to make an eBook set up to answer all these scholarly and reference questions, than just to read? Michael

Michael Hart wrote:
How much harder is it to make an eBook set up to answer all these scholarly and reference questions, than just to read?
Providing source information and page numbers is easy. So it is to provide the page scans. Of course: page scans != ebook. Marking up a book to satisfy most scholarly requirements is more work than I would care for, short of being paid to do it. -- Marcello Perathoner webmaster@gutenberg.org

Marcello wrote:
Michael Hart wrote:
How much harder is it to make an eBook set up to answer all these scholarly and reference questions, than just to read?
Providing source information and page numbers is easy. So it is to provide the page scans. Of course: page scans != ebook.
Marking up a book to satisfy most scholarly requirements is more work than I would care for, short of being paid to do it.
1) There are *reasonable* basic requirements, which are not onerous at all, that can be made to make the PG corpus of texts much more useful to academia and scholars. Here are a few that come to mind:
   a) Provide full catalog info for the source of the digital text.
   b) Provide the complete set of page scans. (I'm still of the opinion this should be a requirement, with the allowance that scans need not be provided under several defined circumstances.)
   c) In markup in the Master copy, add markers (plus maybe XLinks) to page breaks found in the source.
2) Any 21st century digital repository of texts should allow the ability of users to annotate, reference, and interlink the texts. This can be done without altering the texts themselves. Thus, the digital repository will do things that no traditional academic library of atomic-based artifacts can do. Thus, scholars themselves will improve the texts to meet their needs -- we need not do everything for them if we give them the tools to do it themselves.
Jon Noring
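Jon's second point, annotating texts without altering them, is often done with "stand-off" annotations kept apart from the text. The following is only a rough sketch under assumed names: the `Annotation` fields, the `etext-demo` identifier, and both helper functions are hypothetical and do not describe any existing PG system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    text_id: str  # hypothetical identifier of the etext being annotated
    start: int    # character offset where the annotated span begins
    end: int      # character offset just past the span
    quote: str    # copy of the span, used to detect that the text has changed
    note: str     # the scholar's comment

def annotate(text_id, text, phrase, note):
    """Create an annotation for the first occurrence of phrase; text is not modified."""
    start = text.find(phrase)
    if start < 0:
        raise ValueError("phrase not found in text")
    return Annotation(text_id, start, start + len(phrase), phrase, note)

def still_valid(ann, text):
    """True if the annotated span is still where the annotation says it is."""
    return text[ann.start:ann.end] == ann.quote

text = "Call me Ishmael. Some years ago, never mind how long precisely."
ann = annotate("etext-demo", text, "Ishmael", "Narrator's chosen name.")
print(ann.start, ann.end, still_valid(ann, text))
```

Storing the quoted span alongside the offsets is one design choice among several; it lets a repository notice when a corrected text has shifted underneath an older annotation.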

This is what I want, too. I want cyber texts to be MORE useful, not less.
When libraries went to electronic catalogues, Info geeks cheered- they made libraries efficient. They should have been shot. What they did was throw out the original cards, which had been marked up by librarians and scholars, and which provided clues as to which books were worth reading. The people who cheered did not love books- they loved information. Knowledge and information are very different- knowledge takes time. When people thumb through things, they discover new things- hypertext links can help them do this.
Several of you here are academics. Academics who give and process info are not the same as researchers- you don't have the same needs. Research takes time and requires facts on a level that number and word-crunching don't.
And Michael- I think you are brilliant in many ways, but you don't even want to provide the amount of information required of a junior high school student writing a social studies paper, let alone a scholar- and I think that's a shame. I shudder to think what you believe scholars do, and why, if you love books so much, you have so high an antipathy for them.
Getting books on the web is more than a numbers game. It's about preserving something of value. What I'm seeing here among some people is a mentality akin to the early archaeologists, who completely destroyed sites in their rush to get trophies for their museums. They were bad scientists and little more than barbarians. Destroying books in order to reach the new numerical goal is not a good thing- it's very, very bad.
Michele (yeah, I have a name)
-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Jon Noring
Sent: Saturday, November 13, 2004 12:47 PM
To: gutvol-d@lists.pglaf.org
Subject: Re: [gutvol-d] Perfection
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d

On Sat, 13 Nov 2004, Her Serene Highness [Michele Dyck?] wrote:
This is what I want, too. I want cyber texts to be MORE useful, not less.
When libraries went to electronic catalogues, Info geeks cheered- they made libraries efficient. They should have been shot. What they did was throw out the original cards, which had been marked up by librarians and scholars, and which provided clues as to which books were worth reading. The people who cheered did not love books- they loved information. Knowledge and information are very different- knowledge takes time. When people thumb through things, they discover new things- hypertext links can help them do this.
I must admit that I, too, was surprised that ye olde carde catalogues were tossed out like babies with the bathwater.
Several of you here are academics. Academics who give and process info are not the same as researchers- you don't have the same needs. Research takes time and requires facts on a level that number and word-crunching don't.
More on research below.
And Michael- I think you are brilliant in many ways, but you don't even want to provide the amount of information required of a junior high school student writing a social studies paper, let alone a scholar- and I think that's a shame. I shudder to think what you believe scholars do, and why, if you love books so much, you have so high an antipathy for them.
It's not that I don't believe in this kind of information, it's that I didn't want to provide a different Project Gutenberg eBook for each and every single paper edition out there, and then have to keep canonical errors [sic] in them for all time. I wanted to create a "critical edition" that combined corrections and items from various editions, and we have always supplied the necessary information for citing our eBooks on request, which has apparently never caused any problem either for student or teacher.
Getting books on the web is more than a numbers game. It's about preserving something of value. What I'm seeing here among some people is a mentality akin to the early archaeologists, who completely destroyed sites in their rush to get trophies for their museums. They were bad scientists and little more than barbarians. Destroying books in order to reach the new numerical goal is not a good thing- it's very, very bad.
Being a pioneer is different than being a researcher, unless you are Indiana Jones, that is, but even he, if you will recall, had his most important work[s] taken away from him repeatedly by both those above and below him on the Darwinian ladder. Me, I'm a pioneer, not a researcher, and I fully warned everyone year after year that I am NOT a cataloguer, and that once we passed 10,000 books this would become a very obvious problem. However, libraries carry all sorts of materials that don't come with cataloging information, such as records, CDs, DVDs, pamphlets, paintings, etc. Doubly, however, I am doing some feasibility studies on providing MARC records, and could use some help.
Michael Hart

Michael Hart wrote:
On Sat, 13 Nov 2004, Her Serene Highness [Michele Dyck?] wrote:
"Her Serene Highness" is Michele, but given her email address, I doubt her last name is Dyck. Mine is, though. Michael Hart:
... I didn't want to provide a different Project Gutenberg eBook for each and every single paper edition out there, and then have to keep canonical errors [sic] in them for all time.
You say "didn't". Do you still feel this way?
I wanted to create a "critical edition" that combined corrections and items from various editions,
I'm curious: How many such amalgams has PG produced? What was the latest?
and we have always supplied the necessary information for citing our eBooks on request,
But that's not apparent to someone reading a PG eBook, I think. E.g., the PG boilerplate doesn't have a sentence like: To find out what printed edition(s) this eBook was created from, send a request to someone@pglaf.org. -Michael

On Sat, 13 Nov 2004, Michael Dyck wrote:
I'm curious: How many such amalgams has PG produced? What was the latest?
There is no effort to count them, so I doubt you could get a reliable number. One I remember clearly doing was "Roughing it in the Bush" by Susanna Moodie. I used as my basis a text online at another site, which curiously enough, had a very scholarly citation of exactly what it used as its source, although it still had a large number of evident transcription errors. (Also, it said that it was based on the 1852 first edition. Although, learning about the publishing history of this text, I found that there were varying forms of the first edition, as corrections were being made to the plates _during_ the printing process.)
Also, on the topic of amalgams, it may help to realize that sometimes corrections are made to a PG text using a different edition than was originally used.
Andrew

On Sat, 13 Nov 2004, Michael Dyck wrote:
Michael Hart wrote:
On Sat, 13 Nov 2004, Her Serene Highness [Michele Dyck?] wrote:
"Her Serene Highness" is Michele, but given her email address, I doubt her last name is Dyck. Mine is, though.
OK, then I'm still a little in the dark, as we have one other "Her Serene Highness" who has contributed as well. . . .
Michael Hart:
... I didn't want to provide a different Project Gutenberg eBook for each and every single paper edition out there, and then have to keep canonical errors [sic] in them for all time.
You say "didn't". Do you still feel this way?
Eventually, when OCR is about as good as xeroxing, then it shouldn't be much effort to scan multiple editions. See previous note w/ xerox in header.
I wanted to create a "critical edition" that combined corrections and items from various editions,
I'm curious: How many such amalgams has PG produced? What was the latest?
Couldn't tell you, but every time a new proofer sends in errors, it's more likely some were researched from a different edition.
and we have always supplied the necessary information for citing our eBooks on request,
But that's not apparent to someone reading a PG eBook, I think. E.g., the PG boilerplate doesn't have a sentence like: To find out what printed edition(s) this eBook was created from, send a request to someone@pglaf.org.
Usually they just send an email asking how to cite, and I send:
Bibliographic information comes from any full record displayed by either:
the Project Gutenberg Search Engine (http://promo.net/cgi-promo/pg/t9.cgi)
Project Gutenberg Catalog Browser (http://promo.net/cgi-promo/pg/cat.cgi).
For an example, if you use Canterbury Tales from our collection, you'll get the following card information:
AUTHOR: Chaucer, Geoffrey, circa 1340-1400
AKA:
ADD. AUTHOR: Purves, D. Laing, Editor --
TITLE: Canterbury Tales, and Other Poems
SUBJECT:
LOC CLASS: PR --
NOTES:
LANGUAGE: English -
DOWNLOAD: cbtls10.txt - 1.62 MB
cbtls10.zip - 641 KB
Chaucer, Geoffrey, circa 1340-1400. - 2000. - Canterbury Tales, and Other Poems - Urbana, Illinois (USA): Project Gutenberg. Etext #2383. - First Release: Nov 2000 - ID:2862
Where the last three lines should be your bibliographic information.
Hope this helps,
So nice to hear from you!!
Michael S. Hart <hart@pobox.com>
Project Gutenberg "*Ask Dr. Internet*"
Executive Coordinator "*Internet User ~#100*"
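For readers who want the citation step spelled out, here is a minimal sketch that assembles a citation string from the fields in Michael's Canterbury Tales example. The dictionary keys, the `cite` function, and the punctuation of the output are assumptions made for illustration; they are not the format of the PG catalog or of any particular style guide.

```python
def cite(record):
    """Assemble one citation line from a catalog record (hypothetical field names)."""
    return (
        f"{record['author']}. {record['title']}. "
        f"{record['place']}: {record['publisher']}, {record['year']}. "
        f"Etext #{record['etext']}."
    )

# Values taken from the card information quoted in the message above.
canterbury = {
    "author": "Chaucer, Geoffrey, circa 1340-1400",
    "title": "Canterbury Tales, and Other Poems",
    "place": "Urbana, Illinois (USA)",
    "publisher": "Project Gutenberg",
    "year": 2000,
    "etext": 2383,
}
print(cite(canterbury))
```

Note that such a citation identifies the PG etext only; it says nothing about which paper edition the etext was made from, which is the gap Michael Dyck raises.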

Michael Hart wrote:
Michele wrote:
And Michael- I think you are brilliant in many ways, but you don't even want to provide the amount of information required of a junior high school student writing a social studies paper, let alone a scholar- and I think that's a shame. I shudder to think what you believe scholars do, and why, if you love books so much, you have so high an antipathy for them.
It's not that I don't believe in this kind of information, it's that I didn't want to provide a different Project Gutenberg eBook for each and every single paper edition out there, and then have to keep canonical errors [sic] in them for all time.
I wanted to create a "critical edition" that combined corrections and items from various editions, and we have always supplied the necessary information for citing our eBooks on request, which has apparently never caused any problem either for student or teacher.
Now I think this is getting us to the core of the various issues being discussed of late.
In the early days of PG, when disk space was ultra-expensive (and removable storage was of limited capacity), when volunteers were few, and when the Internet did not yet exist (and when it came into being for the ordinary Joe in the late 1980's with very slow modem access), the idea of PG focusing on producing a "critical edition" of important public domain works for casual reading made a whole lot of sense.
However, I believe things have changed so much that this focus needs to be reevaluated. Let's look at the situation today, and tomorrow:
(o) Disk space is getting so cheap and of such high capacity that we can now consider it economical for text repositories to hold the high-density original page scan images for *one million books*. When the texts are in high-quality XML, we can hold *billions* of textual works, with no problem. In a decade, we can begin talking about *trillions* of textual works (big and small). There's no longer an issue of which published edition to pick to "represent" a particular Work -- we can have them all online.
(o) More and more people have high-speed access to the Internet, allowing fast downloading of books, as well as enabling the technologies to mobilize large numbers of avid volunteers to produce high-quality texts (eventually in XML markup) using Internet-enabled systems such as Distributed Proofreaders.
And tomorrow? Here's what I see:
(o) We will see Distributed Proofreaders greatly improve in both quality of production (high quality XML output) as well as much greater capacity. It will also be "clonable" by other groups dealing with specific types of publications. I believe we'll see over 1000 major books PER DAY being completed by DP and its various "clones" throughout the world, not to mention innumerable texts of other types. That's a thousand book-length works PER DAY worldwide.
Thus, the need for "critical" editions based on technical limitations is no longer an issue. Many works were only issued once anyway, so the etext version *is* the critical edition, but some works were issued in various editions over time -- all of them can now be scanned and placed side-by-side online. Let the end-user decide which one to access, based on their own investigation or by the recommendations of others (advanced systems can be set up to aid in selection -- PG itself can recommend which version the reader should consider first.)
It is thus important to preserve the full source information, since end-users will need to know that information, to know what they are getting. If an earlier, more faithful version of the Work is not in the PG system (how would they know unless the versions of the Work already in the system have complete source information?), they can suggest which edition to convert through DP. Ultimately, I hope that PG will cover almost all first and early editions of important works.
Another aspect of this issue is submissions of works to PG which are based on original Public Domain works, but which have been substantially modified by the submitter acting as editor, in essence creating a new edition of the Work. For example, my publishing company's version of Sir Richard F. Burton's "Kama Sutra of Vatsyayana", first published in the 1880's, has been significantly edited and modified -- but not expunged in any way -- no content has been removed, but some has been moved around to aid with logical organization, plus I've added several annotations to clarify things which Burton inexplicably did not. The publisher intro to this book makes clear what changes were made to the text.
For submissions such as this, PG should certainly accept such altered and composite works, but it is important the metadata state clearly this is an "altered" work from the source, or something to that effect, as well as stating what public domain source(s) were used to create the work. (Ideally, PG would have these source works in the PG Library, with the original page scans and the faithful etext versions alongside, so the user of the altered/composite etext will be able to determine, if they want, the alterations which were made to create it.)
In summary, I believe PG is making a big mistake going down the road of being a "gatekeeper" or "original publisher" of some sort. It should concentrate on what it does best: locate/acquire, copyright clear, and place online Public Domain (and Creative Commons) texts in high-quality form. Let others do the vetting and recommendations for what should be read. Let PG make it ALL available for free to everyone, everywhere and at all times.
Jon Noring

re: Jon Noring's reply to "Re: !@!@!RE: [gutvol-d] Perfection":
Jon brings up several points that are between the past and the future, and obviously he has some differing points of view as to when each of these events might be placed on the calendar.
The obvious point right now is whether Project Gutenberg should be doing several possible editions of each eBook, or should be comparing several different editions and creating our own edition that we hope will eventually be better than any of the previous paper editions.
Jon says we should be doing separate editions, due to advances in disk space, download speed, and the time when Distributed Proofing will be doing 1,000 eBooks per day.
* If we presume this is going at a rate of about 10 per day [we are at just about 11 per day in reality] and that this rate should be doubling at Moore's Law rates, then we would have this scenario:
Bks/Day   Years   Date
   10       0     2004
   20
   40       3     2007
   80
  160       6     2010
  320
  640       9     2013
  1K+      10     2014+
I agree that when all of these have been integrated into the world of 75% to 90% of even our own portion of the Internet for several years [enough time to do our first eBooks of most of these books] then it will certainly be time to start including variant editions, as we have already done with some of the great works such as those of Shakespeare, Dante, the Bible, etc.
In fact, my own estimate of the time we will have 1,000,000 eBooks certainly lies within the realm of Jon's suggested 1,000 per day. By that time, we will probably be finding it harder and harder to track down all the editions we have yet to do, and it will be a matter of very good timing to start in on creating all variants of the editions Jon wants us to have.
Hopefully by this time, OCR will be so accurate that the dream of simply using it as one would use a xerox machine, will be closer to reality.
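The arithmetic behind the table can be checked with a short sketch. It assumes a doubling period of 1.5 years, which is what the table implies (10 to 40 books per day in three years); the function names are made up for this illustration.

```python
import math

def years_to_reach(start_rate, target_rate, doubling_years=1.5):
    """Years needed for a rate to grow from start_rate to target_rate by repeated doubling."""
    doublings = math.log2(target_rate / start_rate)
    return doublings * doubling_years

def projection(start_rate, start_year, offsets, doubling_years=1.5):
    """Projected (year, books per day) pairs at the given year offsets."""
    return [(start_year + t, start_rate * 2 ** (t / doubling_years)) for t in offsets]

for year, rate in projection(10, 2004, [0, 3, 6, 9]):
    print(year, round(rate))

print(round(years_to_reach(10, 1000), 2))
```

Going from 10 to 1,000 per day takes log2(100), about 6.6 doublings, so just under ten years at that pace, consistent with the table's "1K+" row around 2014.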
In the interim, perhaps we can simply make available various eBook editions that do and don't include any corrections of typos, missing words, lines, paragraphs, etc. This, along with preservation of the original scans, should allow for a timely revision of any and all eBooks we produce. With the aid of various "diff" and "compare" programs, editors can even proofread the same eBook into the various composite or non-composite editions Jon suggests we should have.
Anyone who wishes to volunteer to assist Jon in his efforts should let us know, and we will work up a listserver and other support for this effort.
Michael S. Hart
P.S. The day should eventually come when such efforts are no longer required at the human level, and Jon can simply scan and OCR each separate edition with a sufficient level of accuracy that it could either stand immediately on its own, or do so with only a small amount of human intervention. . .less effort than it may take to work from a previous scan of a different paper variant.
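As one illustration of the "diff" and "compare" programs mentioned above, Python's standard `difflib` module can list the lines where two transcriptions disagree. The two sample texts and the `edition_diff` helper are invented for this sketch; real editions would also need their line breaks normalized before a comparison is useful.

```python
import difflib

def edition_diff(a_text, b_text, a_name="edition A", b_name="edition B"):
    """Return unified-diff lines showing where two transcriptions differ."""
    return list(
        difflib.unified_diff(
            a_text.splitlines(),
            b_text.splitlines(),
            fromfile=a_name,
            tofile=b_name,
            lineterm="",  # inputs have no trailing newlines, so emit none
        )
    )

uncorrected = "The quikc brown fox\njumps over the lazy dog."
corrected = "The quick brown fox\njumps over the lazy dog."

diff = edition_diff(uncorrected, corrected, "uncorrected", "corrected")
print("\n".join(diff))
```

Lines prefixed with `-` come from the first text and lines prefixed with `+` from the second, so a proofer can review only the places where the editions diverge.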

On Sat, 13 Nov 2004, Marcello Perathoner wrote:
Michael Hart wrote:
How much harder is it to make an eBook set up to answer all these scholarly and reference questions, than just to read?
Providing source information and page numbers is easy. So it is to provide the page scans. Of course: page scans != ebook.
Marking up a book to satisfy most scholarly requirements is more work than I would care for, short of being paid to do it.
A big bug/feature for me is page numbers, bold, italic, underscore, etc.; I would prefer an eBook without them. . .they are just too distracting, I just want to read the CONTENT not the FORM. I have heard people mention that creating both kinds of eBooks should be easy from one session, but I'm not sure if anyone is DOING it. BTW, bold, italic, etc., also mess up a lot of search/quotes.
Michael
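Producing both kinds of eBook from one master can be sketched as follows. The sketch assumes two hypothetical conventions in the master file: page breaks marked by a line `[p. N]`, and emphasis marked as `_word_`. Neither is claimed to be PG's actual master format, and `reading_copy` is a made-up name.

```python
import re

def reading_copy(master):
    """Derive a plain reading text from a marked-up master; the master is left unchanged."""
    text = re.sub(r"^\[p\. \d+\]\n", "", master, flags=re.M)  # drop page-marker lines
    text = re.sub(r"_([^_\n]+)_", r"\1", text)                # _italic_ -> italic
    return text

master = "[p. 12]\nShe was _very_ tired.\n[p. 13]\nThe end.\n"
print(reading_copy(master))
```

Because the plain version is generated rather than edited by hand, the marked-up master stays available for those who want page references, and searches or quotations against the reading copy are not interrupted by markup.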
participants (8)
- Andrew Sly
- Carlo Traverso
- D. Starner
- Her Serene Highness
- Jon Noring
- Marcello Perathoner
- Michael Dyck
- Michael Hart