Re: Did this get slipped in without discussion? (was Re: [gutvol-d] Cleaning up messes)

(I redirected to gutvol-d@lists.pglaf.org. Who sent this to Lyris @ listserv.unc.edu? That server is broken and the list there is defunct. I have been trying to delete the list there for months, but the software is perpetually non-responsive.)

This will probably be my last response to Jon. Clearly, some people can't take "yes, go for it!" as an answer. Jon wants to tell other people how they should do things, but is unwilling to make things happen himself. He insists there is a "right" way of doing things, and belittles the efforts of those who don't fit his notions.

My view is that Jon will not be content until all the people working on PG are ousted, in favor of his preferred organization, governance, fundraising, production rules, and collection guidelines. This is not going to happen anytime soon, and other than being critical of the status quo, Jon has contributed nothing towards making it happen anyway. Instead, Jon has repeatedly been offered the ability -- with support and encouragement -- to create the organization or content he so strongly desires. Some people can't take "yes" for an answer, or are not content with the ability to control their own domain without controlling others.

A few more comments:

On Thu, Nov 11, 2004 at 01:35:04PM -0700, Jon Noring wrote:
Greg Newby wrote:
I have only a few brief things to say about this. Jon, and other interested persons, are very much welcome to start their own projects, sub-projects or related activities to pursue this agenda, or other agendas. We (the messy ones) will provide encouragement and support.
We have pretty extensive wording on this philosophy and encouragement in the "FAQ" items Michael and I wrote, online at
I urge all the PG people reading this to read Michael Hart's statement of the principles of PG governance given in
http://www.gutenberg.org/about/faq1
(Notice its date: written in June and edited in October, after much of the discussion about the organization and governance of PG. As far as I know, it was silently put up without any announcement to the group.)
Was this statement of principles run by the actual owners of PG, the thousands and thousands of volunteers who have donated their untold hours of time to further Project Gutenberg? Did they get a chance to discuss and approve of this statement?
There were announcements with requests for feedback in about 6 *months* of weekly & monthly newsletters, with advance copies going back to around May. There was a posting on the front page of gutenberg.org for months and months. There were at least a few mentions on gutvol-d.
Or is PG a "benevolent" dictatorship, where the volunteers-at-large are not given any real say?
You know better.
So much for democracy and decentralization, where "less is more." (Orwell?)
I see PG primarily as a meritocracy. Always, the pattern is to enable, empower, support and encourage those who want to do things to further the mission - or related activities. The people who do the most are the most active in shaping policy and future direction. Your insinuation that there are central power brokers who are insulated from the many people who are contributing is inconsistent with how things -- *all* things -- get done.
Who owns Project Gutenberg, anyway? Until that is clarified, nothing can be resolved.
You know the answer to this, too. You are simply trying to stir up discontent and create an "us vs. them" atmosphere. For those who, unlike Jon, don't know: visit http://gutenberg.org/fundraising for a quick rundown. An even quicker rundown:

- Michael created Project Gutenberg, and owns the trademarked name, "Project Gutenberg"
- PGLAF was formed in 2001 as the legal entity that operates Project Gutenberg
- PGLAF has four board members, including me. I'm also the CEO.
- I am a volunteer for PGLAF, and have worked with PG since 1992.

The extent to which Michael, or I, or PGLAF, has sway over the daily activities of PG is limited. Set direction: yes. Control some of the technologies: to some extent. Get people to do stuff: only as they agree & desire. The ability of Jon or anyone else to take leadership and make things happen is just as strong as mine, or anyone's.

Flinging mud because so few people subscribe to your view of reality is certainly not going to create progress towards your goals.
Finally and most importantly, I utterly reject Jon's accusation that the lack of source matter or other metadata (or formatting, or anything else) makes the Project Gutenberg content of today or yesterday "corrupt."
Let me clarify (again) below what I wrote in a separate message. You may still reject it, but PG's past carelessness and looseness leads to legitimate questions about the accuracy and acceptance of the pre-DP-era texts. "Corrupt" may be a strong word (and inaccurate), but not placing "textual integrity" as #1 (including the perception of textual integrity) is simply wrong. Note that perceptions are just as real as reality itself.
I take it PG's official position, then, is that PG will continue with the policy of not requiring the source information to be included in the metadata associated with each PG text? If this policy is to
Yes. There is very little that is required, and as the FAQs mentioned above say quite clearly, we intend to keep it that way.
continue, why? If this policy has been changed, then that calls into question those texts where the pedigree is unknown.
Question them all you want. Or don't even read them. But if you want to fix them, get started, rather than talking about carelessness, inaccuracy, lack of textual integrity, etc. As I mentioned, I'm tired of saying "yes" to you, and then having you argue about it. You have all the freedom you could possibly want to do things your way. What you cannot have is control of the past or present of PG.
I'm pretty certain that the vast, overwhelming majority of PG volunteers who do take a position either way on this issue want the full source information to be included in the metadata.
As I said in the followup clarification:
"A clarification...
"Note that certainly any third-party can attempt to verify the authenticity of a PG text even if the source information is not known and no scans are given. However, not giving the source work (and not making the scans immediately available), the third-party has a much more difficult time in verifying the text.
You are envisioning frustrated scholars and others who care about such things. Those are not our target audience, and never have been. While it's likely that some such scholars have "turned off" to PG, I can tell you that there are close to zero requests for such pedigree information that come in on a monthly basis. In short, you are trying to portray your pet peeve as a universal truth, desired by all.

First, again (and as stated in literally *every* PG header, for decades): we do not try to keep our books in accord with any particular print edition. We are not catering to scholars who care about particular dead-tree sources.

Second, I do not accept your idea that this is a major impediment to use and acceptance by scholars, or anyone else. This is pure speculation on your part (regardless of whether it's backed up by a few personal stories), and contradicted by the uses and support requests we hear about.

Finally, and perhaps most importantly from your point of view, I'm still saying "yes," not "no." I will be perfectly happy, overjoyed even, to have better tracking of source information, richer markup, and available scans for more of our eBooks. I expect that part of our cataloging discussion outcomes will be better facilities for doing this -- as will the outcomes of the PGTEI markup. But as I keep saying, (a) the lack of pedigree, scans, etc. is not going to stop us from adding submitted eBooks; (b) people who want to retroactively work on existing eBooks are welcome to do so.
"PG, by identifying the source document, *and* providing scans, adds a lot of credibility (and greater usability) of the digitized texts it produces and distributes. This action effectively says: "We are proud of our work, and stand behind it fully. We even provide you, the user, with full information about its pedigree, and the original page scans are available for your use and easy verification."
Once again, you are belittling the efforts of everyone who created these works. Did you ever hear the story about flies, honey & vinegar?
"Of course, it also aids in copyright clearance having the original scans and full source information available. Scholars and researchers, too, will now find the collection to be sufficiently authoritative for their purposes, where now it is NOT. If PG wishes to become Big League, it has to begin playing Big League ball."
Your view of Big League ball for eBooks seems to include the following:

- stating that all the work of past & current volunteers is crap. Or was it just "corrupt," or "careless?" Or "loose" and "messy?"
- dictating that all new content from all sources must include pedigree information and scans, and may only remain true to the printed dead-trees edition
- accepting only complete markup allowing for re-creation of the original printed word

In my final words, I again encourage you to start your own effort to make such things happen. Use the PG mailing lists & newsletter to solicit like-minded participants. Work with DP to spin off your own projects there, or your own independent DP-like effort. Play in the big league. Cater to scholars. Include only the works you think pass muster. Build your own constituency.

Meanwhile, you might want to review the documents in http://gutenberg.org/about and see again why your efforts to belittle past efforts or pursue your agenda to restrict current activities are rejected.

-- Greg

On Thu, 11 Nov 2004, Greg Newby wrote:
You are envisioning frustrated scholars and others who care about such things. Those are not our target audience, and never have been. While it's likely that some such scholars have "turned off" to PG, I can tell you that there are close to zero requests for such pedigree information that come in on a monthly basis.
Well, yes, because scholars AREN'T using PG, that's why you don't get any requests. At DP, we're processing things that no one but a scholar will ever read. Ever. I'm proofreading one of Canon Sells' books about Islam. No one who is interested in current, up-to-date information is going to read this book. It's antiquated. However, some scholar working on a book re "history of Western perceptions of Islam" might be thrilled to get access to an old out-of-print work. If he/she feels the work is reliable, that is. If you don't want to cater to scholars, you're throwing away much of DP's work. -- Karen Lofstrom {Zora on DP}

One of the problems is that, until recently, the whitewashers removed the information on the origin of the book, like the date of the edition, publisher, etc.; and the change in policy has not been sufficiently advertised, so some people (even at DP) remove the information to conform to the perceived PG policy. We should at least change the official policy to recommend including the full information on the sources (as well as information on, e.g., page numbers when it is useful -- e.g., when there is an index or cross-references by page, or when the origin is a standard reference).

I believe that PG has space for everything: combined editions, abridged editions (provided they are stated to be abridged editions...), scholarly editions. What is what should, however, be stated, and be accessible through the catalogue.

Cataloguing work may be distributed. I am sure that at DP a cataloguing step done by specialized volunteers might be added, and probably extended to non-DP submissions. The same team might be willing to update the existing items, starting from past DP contributions but extending to the other PG items. But please let us start to have sound cataloguing procedures for the future. For example, PG should have a separate whitewashing step for the catalogue (which might be done by a separate team, the competences required being different).

Carlo

On Fri, 12 Nov 2004 06:37:34 +0100, you wrote:
One of the problems is that, until recently, the whitewashers removed the informations on the origin of the book, like date of the edition, publisher, etc; and the change in policy has not been sufficiently advertised, so some people (even at DP) remove the information to conform to the perceived PG policy.
Until recently? I've been regularly including publisher, place of publication, and date in DP books I've uploaded to PG. Except in a few cases earlier this year, all but the date has been deleted by the WW. This has been mildly bugging me for a while -- since I do see other new PG eBooks with publisher information included.

And as long as I'm delurking, I'll mention that my DP projects include project comments with quoted biographical info on the author (from Web sources, and usually other Web links). Would it be somehow useful if I included the URL to the DP project page in the comments section of the upload form?

-- Janet Kegg

Janet Kegg wrote:
Would it be somehow useful if I included the URL to the DP project page in the comments section of the upload form?
It would be useful to include the DP project number in some form. We have an ongoing discussion with Joshua on how to achieve this. -- Marcello Perathoner webmaster@gutenberg.org

Karen Lofstrom wrote:
At DP, we're processing things that no one but a scholar will ever read. Ever. I'm proofreading one of Canon Sells' books about Islam. No one who is interested in current, up-to-date information is going to read this book. It's antiquated.
The Koran makes the Top 20 of our downloads and is much older.
However, some scholar working on a book re "history of Western perceptions of Islam" might be thrilled to get access to an old out-of-print work. If he/she feels the work is reliable, that is.
The problem lieth not within PG. It lieth within Academia. Academia has to adapt its methods and processes to the new world, where information resources are ephemeral.

If you cite a dead-tree edition of something, you are quite confident that the cited text stays put. It won't change its wording or glide from the cited page into the next, etc. If you cite an electronic resource you have no such confidence. How do you make sure that the text at the URL you cite will not be edited or removed? You cannot. How do you make sure the medium you cite will still be readable in some years? In a hundred years, reading a CD-ROM may be harder than it was to read the Rosetta Stone.
If you don't want to cater to scholars, you're throwing away much of DP's work.
It's not our problem. Any amount of catering will not do away with Academia's perceived "limitations" of electronic media.

The best value for Academia (and the least work for us) would be just to include the page scans. Any transcription you make will fall short of the requirements of some scholar. I think we should use our time producing more books for a general audience rather than producing Academia-certified editions of them. -- Marcello Perathoner webmaster@gutenberg.org

On Fri, 12 Nov 2004, Marcello Perathoner wrote:
Karen Lofstrom wrote:
At DP, we're processing things that no one but a scholar will ever read. Ever. I'm proofreading one of Canon Sells' books about Islam. No one who is interested in current, up-to-date information is going to read this book. It's antiquated.
The Koran makes the Top 20 of our downloads and is much older.
However, some scholar working on a book re "history of Western perceptions of Islam" might be thrilled to get access to an old out-of-print work. If he/she feels the work is reliable, that is.
The problem lieth not within PG. It lieth within Academia.
Academia has to adapt its methods and processes to the new world where information resources are ephemeral.
Actually, Project Gutenberg eBooks have proven much less ephemeral than paper books published in the same period, as all of the Project Gutenberg eBooks have been available continuously from their first day of release, while most paper books from over 5 years ago are no longer in print.
If you cite a dead-tree edition of something, you are quite confident that the cited text stays put. It won't change its wording or glide from the cited page into the next, etc.
But only if you find the exact same paper edition.
If you cite an electronic resource you have no such confidence. How do you make sure that the text at the URL you cite will not be edited or removed? You cannot.
Actually, it's pretty easy to find all the original Project Gutenberg eBooks, as well as the newer versions, because so many places keep them, usually in the thousands for any of our eBooks that have been out for even a week.
How do you make sure the medium you cite will still be readable in some years? In a hundred years, reading a CD-ROM may be harder than it was to read the Rosetta Stone.
There are SO many copies of each Project Gutenberg eBook out there that the question of a particular medium becomes irrelevant. . .when you download a copy of Huck Finn, you never know at your end whether it is stored on a CD-ROM, DVD, RAID, Terabrick, or even a floppy.

Most of you don't realize that less than 20 years ago our eBooks were available from my BBS, and that the entire BBS ran on high-density floppy drives.

The fact that the eBooks are independent of the medium, and of hardware or software requirements, in "Unlimited Distribution" is what makes them last longer than anything else on the entire Internet. Where else can you find files that were originally posted 33 years ago?

Michael

At 06:23 AM 11/12/2004 -0800, you wrote:
Actually, it's pretty easy to find all the original Project Gutenberg eBooks, as well as the newer versions, because so many places keep them, usually in the thousands for any of our eBooks that have been out for even a week.
Hello. Actually, I've had a hard time finding any of the very early editions of PG files. There are some old files in the etext90 directory, but not edition 10 of the first several ebooks. I would be interested to find the very first edition of when10.txt or whatever it was called as MH posted it. Even the old GUTINDEX.* files have been removed, with the earliest being GUTINDEX.96 when it used to be GUTINDEX.90.

On Fri, Nov 12, 2004 at 07:11:46AM -0800, Tony Baechler wrote:
At 06:23 AM 11/12/2004 -0800, you wrote:
Actually, it's pretty easy to find all the original Project Gutenberg eBooks, as well as the newer versions, because so many places keep them, usually in the thousands for any of our eBooks that have been out for even a week.
Hello. Actually, I've had a hard time finding any of the very early editions of PG files. There are some old files in the etext90 directory, but not edition 10 of the first several ebooks. I would be interested to find the very first edition of when10.txt or whatever it was called as MH posted it. Even the old GUTINDEX.* files have been removed, with the earliest being GUTINDEX.96 when it used to be GUTINDEX.90.
Michael might have some of the older files. There are a few sources, like old Walnut Creek CDs, that might also be able to help.

These days, we essentially never delete anything (not strictly true, but close enough... and we run a no-delete mirror for when mistakes happen). But in the past, Michael would remove older files. This was largely due to space constraints on the hosting servers.

As for the GUTINDEX* files, we don't keep older files around, since they are essentially always updated weekly. I can see the reason for interest in looking back through older files, though -- maybe we'll start doing this in a new subdirectory. Note that the GUTINDEX files have been through many iterations. Michael used to maintain them, then I did, and now George Davis does. The filenames have changed, and so has the format. For the most part, this has been simply to accommodate the changing nature of the publications, enhanced metadata (like contents listings), and other pragmatics.

Unrelated story: I needed to print GUTINDEX.ALL the other day (as part of an affidavit for another legal case I'm helping with, where we once again show there are "significant non-infringing uses" for online content). It's about 550 pages. Whew! I hope that's the only time in this decade anyone needs to print it. -- Greg

At 10:54 AM 11/12/2004 -0800, you wrote:
On Fri, Nov 12, 2004 at 07:11:46AM -0800, Tony Baechler wrote:
At 06:23 AM 11/12/2004 -0800, you wrote:
Actually, it's pretty easy to find all the original Project Gutenberg eBooks, as well as the newer versions, because so many places keep them, usually in the thousands for any of our eBooks that have been out for even a week.
Hello. Actually, I've had a hard time finding any of the very early editions of PG files. There are some old files in the etext90 directory, but not edition 10 of the first several ebooks. I would be interested to find the very first edition of when10.txt or whatever it was called as MH posted it. Even the old GUTINDEX.* files have been removed, with the earliest being GUTINDEX.96 when it used to be GUTINDEX.90.
Michael might have some of the older files. There are a few sources, like old Walnut Creek CDs, that might also be able to help.
I do not have every old Walnut Creek CD ever published, but I do have one, and it does not have any of the older files either. I first started using PG in 1995, and even then the very early files from 1971-89 were not generally available. The oldest file, or at least the one with the oldest PG header that I am aware of, is plboss10.zip. I'm not sure if edition 10 is still available, but I have it.

On Sat, 13 Nov 2004, Tony Baechler wrote:
At 10:54 AM 11/12/2004 -0800, you wrote:
At 06:23 AM 11/12/2004 -0800, you wrote:
Actually, it's pretty easy to find all the original Project Gutenberg eBooks, as well as the newer versions, because so many places keep them, usually in the thousands for any of our eBooks that have been out for even a week.
On Fri, Nov 12, 2004 at 07:11:46AM -0800, Tony Baechler wrote:
Hello. Actually, I've had a hard time finding any of the very early editions of PG files. There are some old files in the etext90 directory, but not edition 10 of the first several ebooks. I would be interested to find the very first edition of when10.txt or whatever it was called as MH posted it. Even the old GUTINDEX.* files have been removed, with the earliest being GUTINDEX.96 when it used to be GUTINDEX.90.
Michael might have some of the older files. There are a few sources, like old Walnut Creek CDs, that might also be able to help.
I could look through my old CD and floppy eBook collections if this is truly important, but you should be advised that the originals of all the earliest eBooks were ALL IN CAPS, and with limited punctuation, since they were typed in on TeleType 33 machines.

It would be fun to see if anyone could change them back to the originals, and if the blogosphere that caught Dan Rather could possibly check all the punctuation marks to prove that such a document COULD have been typed on a TeleType 33. Of course, I still have mine here in the basement, and might be able to fake it better than anyone could disprove.

However, the whole idea of finding the original files doesn't mean a lot to me. . .but I think the first file was just named "when". . .without any number or any extension. [However, that could have been changed by the system administrators when they moved it to 9-track tape. . .which was done by file location, as I recall, rather than by file name. i.e., give me the file that starts at 1240 feet on tape number 1642. . . . That was the kind of instruction we received back in 1971 when someone wanted the Declaration of Independence.]
I do not have every old Walnut Creek CD ever published, but I do have one, and it does not have any of the older files either. I first started using PG in 1995, and even then the very early files from 1971-89 were not generally available. The oldest file, or at least the one with the oldest PG header that I am aware of, is plboss10.zip. I'm not sure if edition 10 is still available, but I have it.
I probably still have copies of the first one. . .I think it was an odd green color. . .but, again, it's only of sentimental value as a collectors' item, as far as I am concerned. I wonder if they will appear 100 years from now on "Antiques Roadshow?" ;-)

On Fri, 12 Nov 2004, Marcello Perathoner wrote:
The problem lieth not within PG. It lieth within Academia.
I must agree. Academia is perhaps the worst when it comes to the "not invented here" syndrome. . .and it pays the price by lagging behind.
If you don't want to cater to scholars, you're throwing away much of DP's work.
It's not our problem. Any amount of catering will not do away with Academia's perceived "limitations" of electronic media.
That is, until they take over the eBooks, and claim them as their own.
If you don't want to cater to scholars, you're throwing away much of DP's work.
If we cater to scholars, we are only expanding the "digital divide," so to speak. Our goal is to provide a large viable library to all, not just to the scholars, who represent less than 1% of the people, and are often very elitist. The real value of the work lies in making it available to the masses, not to the scholars. If we can increase literacy by even 10%, we make more difference than if we cater to the scholars.
The best value for Academia (and the least work for us) would be just to include the page scans. Any transcription you make will fall short of the requirements of some scholar. I think we should use our time producing more books for a general audience rather than producing Academia-certified editions of them.
Hear, hear! Michael

-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org On Behalf Of Michael Hart
Sent: Friday, November 12, 2004 9:31 AM
To: Project Gutenberg Volunteer Discussion
Subject: Re: [gutvol-d] PG audience

On Fri, 12 Nov 2004, Marcello Perathoner wrote:
The problem lieth not within PG. It lieth within Academia.
I must agree. Academia is perhaps the worst when it comes to the "not invented here" syndrome. . .and it pays the price by lagging behind.

Sometimes it's not a matter of lagging behind. Academia has different needs and goals than the casual reader. I'm an academic, and I will use PG with undergrads -- but tell them to go to paper books for citations. Why? Because provenance is important in citation. My students tend to think everything on the net is 'true' -- they don't understand that books on the net may or may not reflect scholarly knowledge or acceptance.

And often the divisions are too large for useful citation -- the page is not only a piece of paper. It's a unit of citation. Page 193 in the 3rd edition of a particular book by a particular publisher is page 193 in every copy, and contains a finite number of words. Chapter 23 may have a finite number of words, but how do I find the sentence I want to cite? Plus, the edition used on PG might not be the standard -- it might be a variant. Variant problems are crucial when trying to read poetry and literature for scholarly purposes.

Chefs aren't 'lagging behind' just because most of them still chop food by hand instead of using Cuisinarts. They can control the texture and shape of what they cook much better by using an old-fashioned blade. On the other hand, electric mixers are much more efficient for making cakes and can do a better job than a person beating eggs and butter by hand -- which is why pastry chefs use machines most of the time.
If you don't want to cater to scholars, you're throwing away much of DP's work.
It's not our problem. Any amount of catering will not do away with Academia's perceived "limitations" of electronic media.
That is, until they take over the eBooks, and claim them as their own.

We probably won't, unless we can find ways of making exact facsimile scans of books with page numbers, citations, illustrations, and so on. Are musicians silly because they choose to play instruments instead of having machines do all the work? No. Machines, no matter how good they are, don't have the same warmth that physical instruments have. Even if one day they do, I doubt all the instruments in the world will be thrown away.

Why do you care whether academics cite PG? You seem to think they should come to you -- did you ever think we have this thing called a 'page' that acts as a standard unit of knowledge, and that when we cite something, we need that page to stay reasonably stable? And it does, even with the vagaries of publishing.

PG is great, but most of the books you publish aren't the sorts of things that would be useful to a grad student anyway -- or even an undergrad, most of the time. For people who want a book on the go, who are looking for an out-of-print book for nostalgia's sake, for people who need to change print size for readability, PG is perfect. But it's not very useful for citations, any more than TV science programs are.

I do think that dedicated proofers can do a great deal, and should be applauded. They can have exactitude. But that's not the problem. The problem is provenance. If you wanted academics to accept you, you would have to provide that, and maybe have experts on particular books vet them.
If you don't want to cater to scholars, you're throwing away much of DP's work.
If we cater to scholars, we are only expanding the "digital divide," so to speak. Our goal is to provide a large viable library to all, not just to the scholars, who represent less than 1% of the people, and are often very elitist.

I agree, and I'm a scholar. Stop worrying about what we think. PG has shown me books I couldn't enjoy otherwise. Scholars don't read scholarly books all the time, and they have places to go for that.

The real value of the work lies in making it available to the masses, not to the scholars. If we can increase literacy by even 10%, we make more difference than if we cater to the scholars.
The best value for Academia (and the least work for us) would be just to include the page scans. Any transcription you make will fall short of the requirements of some scholar. I think we should use our time producing more books for a general audience rather than producing Academia-certified editions of them.
Hear, hear!

I agree -- but I would love to see page scans. I don't think that most casual readers (and by that I even include 'serious' readers who do not use written material for citation) understand why pagination is so important to scholars. That's fine. But please stop assuming that we're all Luddites just because PG is pretty much useless to us academically. Hey -- professional basketball players sometimes play one-on-one for fun; that doesn't mean they have to take such play seriously for it to have value.

Michael

Her Serene Highness wrote:
Why do you care whether academics cite PG? You seem to think they should come to you -- did you ever think we have this thing called a 'page' that acts as a standard unit of knowledge, and that when we cite something, we need that page to stay reasonably stable?
Did it ever occur to you that the "page" as "standard unit of knowledge" is a purely arbitrary thing? The standard unit of knowledge depends on the information technology of the epoch. It first was the "cave wall", then it became the "clay tablet", then the "scroll", then the "page", and today it is the "internet resource".

I can google any cited phrase on the net in a few keystrokes' time. OTOH, to verify a quotation, it may take months until I get my hands on a physical copy of some random obscure book. -- Marcello Perathoner webmaster@gutenberg.org

This is a real polarizing issue, with many academics believing that they are the anointed guardians of literature and recorded knowledge. They feel threatened by groups like PG and DP which have by-passed their institutional traditions. Many academics today feel threatened by etexts in the same way that the clergy felt threatened by the printing press.

I asked for a copy of the TEI source for Bradford's History of Plymouth Plantation last month from some academic group. They asked me to submit a formal request which would explain what I would use the text for! There are any number of academic etext repositories which block people from accessing public domain material because of `copyright issues'. Worse is how many university presses are making IP land grabs worthy of the RIAA and MPAA.

There are a number of books which are now only available in astonishingly expensive editions. The OED is an example of this. Oxford has pumped a huge amount of money into the dictionary, but the dictionary has also been built with an enormous amount of volunteer help. There are no libraries anywhere near where I live in Bangkok with a copy of the OED which I can use. Since I don't have a credit card, I can't get access to the online edition even if I had the money to pay for it.

The academic priesthood feels that their power base and institutional purpose for existence is threatened, so they are circling the wagons and giving the world good reason to threaten them. On the other hand, academics _are_ often the only people preserving a lot of man's older and mostly forgotten knowledge and placing it in context so that it can be understood today.

Academics feel horrified when they hear people say, "I don't care about all that stuff, just give me books." This is the same horror that geeks experience when they hear people say with pride that they can't program their VCR and never will. Being proud of being ignorant is something that I have never understood and never will, but I think that what Joe Sixpack is saying is that geeks and scholars should do their job and shouldn't bother him with the details. He's only concerned with the resulting text or software, not the process of how it was created. In a sense he's right. That's our job, and we shouldn't try to force the end-user to understand the larger or technical issues involved in doing our job. The great unwashed masses have no idea how much work is involved in doing our jobs and sometimes believe that we're making things far more difficult and complex than they really are.

As Neal Stephenson said, most people want a mediated experience like you get from Disney. They don't want to see or deal with the enormous complexity behind it all. I believe that we should think more like special effects artists, who believe the best effects, and the ones that they are most proud of, are the ones that no one realises are effects in the first place.

Many academic editions are so burdened with analysis and annotation that they get in the way of the text itself. Electronic editions can hide the glorified and sanctified academic Cliff Notes but make them easily accessible if you need or want them. Personally I like it both ways. Sometimes I want to work at a text and really study it, and all the scholarly apparatus is a godsend. But other times I just want to read a story, and leave the stuff I don't understand for another time.
The great promise of the computer age has been to provide tools which allow the average person with no experience or skills to do the work that required highly skilled workers using specialized professional equipment. Desktop publishing in the 80's is a great example. As soon as laser printers and colour monitors became cheap enough, everyone thought that a secretary who could barely use WordStar could do the work of a team of professional graphic artists and typesetters. Visual Basic was touted as being a language that could be mastered by the average person and produce applications of the same quality as apps written in C by experienced programmers. Right. Apple is now pushing the dream that anyone with an Apple and a good video camera can be the next Stanley Kubrick with less than US$20K in hardware and software. The barrier of entry and access to the tools for the next Stanley Kubrick is now much lower, but that doesn't mean your Aunt Cindy is going to be making the next Full Metal Jacket in the corner of her family room on her iMac.

People like Bowerbird (who I suspect is still here, despite giving his formal swan song) want to reduce the complexity behind the scenes to something as simple as what the end-user sees. The thing is, at first glance it really doesn't look like it's too difficult. And the plethora of cheap, professional-quality tools available through chain stores makes it seem, at first, not to be too difficult. This has had the negative side-effect of giving Joe Sixpack the illusion that all of this stuff is a lot easier than it is, and of giving the impression that professionals who have spent decades studying and honing their craft are just full of crap and making things more difficult than they have to be.

I suspect that over the next decade, institutions will be re-cast and professionals will re-establish themselves so that their education, experience and skills will be respected. But for those of us in the trenches during the transition it won't be easy and it won't be pretty.

b/ -- Brad Collins <brad@chenla.org>, Bangkok, Thailand

Brad wrote:
I asked for a copy of the TEI source for Bradford's History of Plymouth Plantation last month from some academic group. They asked me to submit a formal request which would explain what I would use the text for!
Interesting. I happen to have a copy of the 1898 printing of `Bradford's History "Of Plimoth Plantation."' My wife's maternal ancestry goes back to colonial Massachusetts, and I think one of her ancestors is mentioned in the book (Degory Priest).

If this book has not yet been scanned by anyone affiliated with PG/DP, I'll gladly offer our copy for scanning, so long as whatever is used to scan it will not damage the binding (probably can't use a flatbed scanner), and the scans *will* be made available online for free, even before the work is converted to XML. The book, including index, has over 550 pages, so it is pretty massive.

A fascinating work, btw, and one I hope will be scanned and converted to TEI by PG/DP.

Jon Noring

I scanned that book and put it through DP around the 4th of July. It is still waiting for someone to decide to post-process it.

-- JulietS

Juliet wrote:
Jon wrote:
I happen to have a copy of the 1898 printing of `Bradford's History "Of Plimoth Plantation."' My wife's maternal ancestry goes back to colonial Massachusetts, and I think one of her ancestors is mentioned in the book (Degory Priest).
I scanned that book and put it through DP around the 4th of July. It is still waiting for someone to decide to post-process it.
Great to hear! Thanks for replying. Jon

On Sat, 13 Nov 2004, Brad Collins wrote:
This is a real polarizing issue, with many academics believing that they are the anointed guardians of literature and recorded knowledge. They feel threatened by groups like PG and DP which have by-passed their institutional traditions. Many academics today feel threatened by etexts in the same way that the clergy felt threatened by the printing press.
And this is one of the reasons they won't accept eBooks, even when I bring them to them personally, free of charge. [Oh, they take them, but they won't allow them in libraries.] They still want to be "A Big Fish In A Small Pond." They don't realize that the walls of academia have been penetrated by the virtual world. . .for them to try to stop eBooks is like James Watson's efforts to stop Craig Venter from mapping DNA, or even his efforts to stop the model-building Crick 50 years ago. mh

Michael Hart wrote:
They don't realize that the walls of academia have been penetrated by the virtual world. . .for them to try to stop eBooks is like James Watson's efforts to stop Craig Venter from mapping DNA, or even his efforts to stop the model-building Crick 50 years ago.
Well, well, capitalism *has* to be good for something. So let's praise capitalism for kicking the clerics in the *** and freeing information from its imprisonment in monasteries ... before we start kicking capitalism in the *** for making information a proprietary article. -- Marcello Perathoner webmaster@gutenberg.org

On Sat, 13 Nov 2004, Marcello Perathoner wrote:
Michael Hart wrote:
They don't realize that the walls of academia have been penetrated by the virtual world. . .for them to try to stop eBooks is like James Watson's efforts to stop Craig Venter from mapping DNA, or even his efforts to stop the model-building Crick 50 years ago.
Well, well, capitalism *has* to be good for something.
So let's praise capitalism for kicking the clerics in the *** and freeing information from its imprisonment in monasteries ... before we start kicking capitalism in the *** for making information a proprietary article.
I'm not sure ANY of the above was done via capitalism. . . . Certainly not Watson, Crick, Venter. . .or PG eBooks. . . . ;-)

On Sat, 13 Nov 2004, Brad Collins wrote:
I asked for a copy of the TEI source for Bradford's History of Plymouth Plantation last month from some academic group. They asked me to submit a formal request which would explain what I would use the text for!
Try getting one from the Oxford Text Archive, hee hee! Presuming they are still in operation, and still use the same user agreement. . . . mh

Brad wrote:
This is a real polarizing issue, with many academics believing that they are the anointed guardians of literature and recorded knowledge. They feel threatened by groups like PG and DP which have by-passed their institutional traditions. Many academics today feel threatened by etexts in the same way that the clergy felt threatened by the printing press.
I asked for a copy of the TEI source for Bradford's History of Plymouth Plantation last month from some academic group. They asked me to submit a formal request which would explain what I would use the text for!
[snip of excellent comments]
I totally agree that academia (in a general sense; there are notable individual exceptions) is overly protective (to a neurotic degree) of their collections of Public Domain materials and digital derivatives thereof, and should not be.

This does not mean, then, that PG and other like-minded digital text repositories should choose not to build their text libraries to a *reasonable* level of quality for academics and scholars. Rather, what better way to stick it to them than to compete with them on their own turf! Doing this will also raise the consciousness among many, including our politicians, of the value of free and open documents. It might even lead politicians in progressive states to pass laws requiring their state-run colleges and universities to scan their holdings of public domain works and place them online for free and unencumbered use. After all, many of the "academics" are being paid by taxpayer money, as are many of the archives/repositories they run; thus they are ultimately beholden to the public which pays them, and which is the moral owner of the Public Domain.

I'm glad that Michael, this morning, made a call to digitize the OED. Despite my heavy criticisms regarding how PG is run, and what its basic requirements should be, I'm fully in support of its Prime Directive in that (in my words): "All public domain texts, both scans and cleaned-up etexts, should be made, and must be made, freely available in digital form to the world without restriction or encumbrance."

It pains me when I see publicly-funded academic digital repositories not allowing free and unrestricted access to any work whose source is from the Public Domain. Even if it cost someone $$$ to scan and markup the work, the results should be open to the Public. After all, it is the Public who owns the Public Domain, thus it has the moral right to demand how any digital derivatives of the Public Domain should be used.

Jon Noring

(p.s., I wonder if some States have an "open documents" law on their books that could be applied to their universities and colleges, and which could be used to force them to open up their digital scans and digital derivatives of public domain works in their collections? I may bring this up with Brewster when I meet with him next week. Thoughts?)

Michael Hart wrote:
If we cater to scholars, we are only expanding the "digital divide," so to speak. Our goal is to provide a large viable library to all, not just to the scholars, who represent less than 1% of the people, and are often very elitist.
I don't think anyone is advocating providing the PG library "just to the scholars", so that's a strawman. Instead, some people simply want to make PG texts more useful to scholars than they currently are, and I think we can do that without making them less useful or less available to non-scholars. -Michael

Michael Dyck wrote:
Michael Hart wrote:
If we cater to scholars, we are only expanding the "digital divide," so to speak. Our goal is to provide a large viable library to all, not just to the scholars, who represent less than 1% of the people, and are often very elitist.
I don't think anyone is advocating providing the PG library "just to the scholars", so that's a strawman.
Instead, some people simply want to make PG texts more useful to scholars than they currently are, and I think we can do that without making them less useful or less available to non-scholars.
Agreed. It is possible to come up with a "happy medium" set of baseline requirements which will make the PG texts useful for many purposes. Those who wish to make particular texts even more useful than the baseline for a particular user group simply add more stuff. XML makes it quite easy to extend the features -- just add markup to the content and to the metadata fields (see the sketch after this message).

A possibly useful exercise is to categorize the various uses and user groups, and then determine the most important features each user group especially desires/needs. Without thinking about it for more than 30 seconds, here's a partial list of different user groups. No doubt this list can be expanded and much better described/subcategorized. But it's a start to further discussion if enough here deem it of interest.

1) Personal interest readers
2) Scholars and researchers
3) Students (K-12 and post-secondary)
4) Professional and vocational

Jon Noring
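To make the "just add more markup" point concrete, here is a minimal sketch in Python using the standard library's ElementTree. Every element name and value below is an illustrative assumption -- this is not TEI, and not any agreed PG schema.

    import xml.etree.ElementTree as ET

    # Baseline record: fields every user group can rely on.
    record = ET.Element("pgtext", id="99999")
    ET.SubElement(record, "title").text = "An Example Title"
    ET.SubElement(record, "author").text = "A. N. Author"

    # Source/pedigree metadata (the baseline being argued for).
    source = ET.SubElement(record, "source")
    ET.SubElement(source, "publisher").text = "Example Press"
    ET.SubElement(source, "year").text = "1898"
    ET.SubElement(source, "scans").text = "http://example.org/scans/99999/"

    # Extending the baseline for one user group is just more markup;
    # software that doesn't know the extension can simply ignore it.
    access = ET.SubElement(record, "accessibility")
    ET.SubElement(access, "navpoints").text = "chapter"

    print(ET.tostring(record, encoding="unicode"))

Software that only understands the baseline fields ignores the extensions, which is exactly the "happy medium" property described above.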

I wrote:
Without thinking about it for more than 30 seconds, here's a partial list of different user groups. No doubt this list can be expanded and much better described/subcategorized. But it's a start to further discussion if enough here deem it of interest.
1) Personal interest readers
2) Scholars and researchers
3) Students (K-12 and post-secondary)
4) Professional and vocational
Geez, I forgot one of the most important user groups of all:

5) Readers with special needs (blind, dyslexic, etc.)

Note that there's a strong movement to require that K-12 and public post-secondary educational materials be highly accessible, i.e., offered in accessible formats. In the U.S., for textual materials this will very likely be mandated as the XML-based NIMAS specification (which in turn is derived from the DAISY Digital Talking Book specification).

If we want PG texts to be legally used in the classroom setting, which I think is an *opportunity*, not a *burden*, then we definitely need to assess how the "master" XML Schema settled upon (probably through DP) will be compatible with NIMAS by XSLT or other conversion method (a rough sketch of such a conversion step follows this message). It should be pretty easy to conform most if not all PG Master texts to the NIMAS requirements, since from what I understand the PG Master text Schema will likely be a subset of TEI.

I strongly suggest that before any XML-based vocabulary is decided upon as the "master" PG format, we consult with the technical folk at DAISY, RFB&D, CAST, etc., to assure we aren't overlooking something or doing something which would make accessibility more difficult. As a heads up -- they love good navigational aids in the markup and in external metadata (imagine being blind -- having multiple verbal menus to access the texts in different ways is important!) We might even be able to solicit the help of the accessibility community to add navigational markup to selected PG texts.

Jon Noring
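A rough illustration of the XSLT conversion step described above, run from Python. This assumes the third-party lxml package (any XSLT 1.0 processor would do), and the element mapping -- a TEI-style div/head to a DTBook-style level1/h1 -- is an invented stand-in, not the actual NIMAS or DAISY rules.

    from lxml import etree

    # Illustrative stylesheet: map a TEI-ish structure to a DTBook-ish
    # one.  A real NIMAS conversion would need the full vocabulary
    # and namespaces.
    xslt_doc = etree.XML(b"""
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="div">
        <level1><xsl:apply-templates/></level1>
      </xsl:template>
      <xsl:template match="head">
        <h1><xsl:apply-templates/></h1>
      </xsl:template>
      <xsl:template match="p">
        <p><xsl:apply-templates/></p>
      </xsl:template>
    </xsl:stylesheet>
    """)

    transform = etree.XSLT(xslt_doc)
    tei = etree.XML(b"<div><head>Chapter I</head><p>Some text.</p></div>")
    # Prints the transformed document:
    # <level1><h1>Chapter I</h1><p>Some text.</p></level1>
    print(str(transform(tei)))

The point is only that a single maintained master format plus small, mechanical transforms can serve several user groups at once.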

-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org On Behalf Of Michael Dyck
Sent: Friday, November 12, 2004 6:55 PM
To: Project Gutenberg Volunteer Discussion
Subject: Re: [gutvol-d] PG audience

Michael Hart wrote:
If we cater to scholars, we are only expanding the "digital divide," so to speak. Our goal is to provide a large viable library to all, not just to the scholars, who represent less than 1% of the people, and are often very elitist.
I don't think anyone is advocating providing the PG library "just to the scholars", so that's a strawman. Instead, some people simply want to make PG texts more useful to scholars than they currently are, and I think we can do that without making them less useful or less available to non-scholars.

-Michael

**I have all kinds of books on my shelf -- first edition anthro texts, humor books, cookbooks. Each one of them has a publisher and info on the publishing date. If PG is a publishing house for out-of-copyright books, fine. But it's supposedly a book repository. If it's a repository of books that were actually published in the real world, why are the original paginations, illustrations and figures, maps, indexes and bibliographies, and publication dates such a problem? If I want to be taken seriously as an engineer but I use my own terminology for basic engineering terms or just refuse to use them at all, why should I get shirty if engineers with college degrees don't take me seriously?**

On Fri, 12 Nov 2004, Her Serene Highness wrote:
-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org On Behalf Of Michael Dyck
Sent: Friday, November 12, 2004 6:55 PM
To: Project Gutenberg Volunteer Discussion
Subject: Re: [gutvol-d] PG audience
Michael Hart wrote:
If we cater to scholars, we are only expanding the "digital divide," so to speak. Our goal is to provide a large viable library to all, not just to the scholars, who represent less than 1% of the people, and are often very elitist.
I don't think anyone is advocating providing the PG library "just to the scholars", so that's a strawman.
I'm worried that making the eBooks acceptable to scholars may take more effort than simply creating them did, and that the scholars, libraries, etc., may still opt not to use them or to encourage others to use them. I'm working up a feasibility study on this now; let me know if you have a library/librarian/scholar who is willing to try out a few dozen eBooks with these additional features. Michael S. Hart

Michael Hart wrote:
If we can increase literacy by even 10%, we make more difference than if we cater to the scholars.
We could make even more difference by doing both! Setting that aside, do we have any data (or even anecdotal evidence) re the effect of Project Gutenberg on literacy levels? -Michael

Illiterates rarely use computers for reading. PG would be useful after a person became literate, i.e., able to read. Even the children's books on PG are a bit too advanced for a person who is non-literate. Having taught reading, it would not be the first place I would turn -- it's too text-heavy, for one thing.

-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org On Behalf Of Michael Dyck
Sent: Friday, November 12, 2004 7:00 PM
To: Project Gutenberg Volunteer Discussion
Subject: [gutvol-d] increasing literacy

Michael Hart wrote:
If we can increase literacy by even 10%, we make more difference than if we cater to the scholars.
We could make even more difference by doing both! Setting that aside, do we have any data (or even anecdotal evidence) re the effect of Project Gutenberg on literacy levels? -Michael

On Fri, 12 Nov 2004, Her Serene Highness wrote:
Illiterates rarely use computers for reading. PG would be useful after a person became literate, i.e., able to read. Even the children's books on PG are a bit too advanced for a person who is non-literate. Having taught reading, it would not be the first place I would turn -- it's too text-heavy, for one thing.
Given that most computers today can read eBooks out loud, it's a perfect way to learn how to read, except that you might end up talking like Stephen Hawking. . . . However, given the number of people who learned English and other languages over the short-wave radio, I think there is a real future for this. Obviously it's not "The Young Lady's Illustrated Primer" as in "The Diamond Age," by Neal Stephenson, but it's a start. I have received a number of emails from people who were starting to learn English who found our eBooks very useful. Michael

On Fri, 12 Nov 2004, Michael Dyck wrote:
Michael Hart wrote:
If we can increase literacy by even 10%, we make more difference than if we cater to the scholars.
We could make even more difference by doing both!
Setting that aside, do we have any data (or even anecdotal evidence) re the effect of Project Gutenberg on literacy levels?
Lots of schools and home schoolers have sent me messages asking and thanking us for the PG eBooks. . .enough to realize that it is no longer just a dream for these to be used in schooling. As for libraries, I get fewer of these messages from them, but still find that things are getting started there. mh

-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org On Behalf Of Michael Hart
Sent: Saturday, November 13, 2004 12:57 PM
To: Project Gutenberg Volunteer Discussion
Subject: Re: [gutvol-d] increasing literacy

On Fri, 12 Nov 2004, Michael Dyck wrote:
Michael Hart wrote:
If we can increase literacy by even 10%, we make more difference than if we cater to the scholars.
We could make even more difference by doing both!
Setting that aside, do we have any data (or even anecdotal evidence) re the effect of Project Gutenberg on literacy levels?
Lots of schools and home schoolers have sent me messages asking and thanking us for the PG eBooks. . .enough to realize that it is no longer just a dream for these to be used in schooling.

** Still, that's not the same as increasing literacy. That's facilitating literacy. Did they say they became smarter or better read? The books are good for schooling -- but they could be a hell of a lot better. Having taught every level of school except for elementary (and I've tutored in that), I still say it wouldn't take much to add info that would push PG forward into classroom acceptability.**

As for libraries, I get fewer of these messages from them, but still find that things are getting started there.

**How do you find that? Are librarians saying that, or are you? Are they using other textual sites? Why? Do they suggest improvements? What are they?**

mh

Marcello wrote:
Karen Lofstrom wrote:
If you cite a dead-tree edition of something, you are quite confident that the cited text stays put. It won't change its wording or glide from the cited page onto the next, etc.
If you cite an electronic resource, you have no such confidence. How do you make sure that the text at the URL you cite will not be edited or removed? You cannot. How do you make sure the medium you cite will still be readable in some years? In a hundred years, reading a CD-ROM may be harder than it was to read the Rosetta Stone.
Actually, this issue can be dealt with using hash functions. Once a digital document is finished and archived, simply calculate a hash value for it (or for the set of files the work comprises). Use a published, open-standards hashing algorithm -- there are many to choose from. It's also possible to use digital signatures in some manner, but I'll let the experts in this area discuss that possibility.

Textual integrity is definitely an issue, and it goes beyond just keeping academics happy -- it is germane to the perceived integrity of the entire collection of texts by society-at-large. By keeping the page scans along with the digital texts, we are, in effect, telling the users of the digital texts that we fully stand by the textual integrity of the collection, that we did not pull any fast ones, and that it can be trusted. We are putting our reputation on the line.

By using digital hashes and digital signatures, and redundant/mirrored text repositories, we go a long way towards assuring the collection maintains its integrity. As others have noted, some dictator or totalitarian regime in the future may break into one of the repositories and start tweaking texts. So long as the whole world does not revert to totalitarianism (in which case we have much bigger problems than the integrity of texts), then with a properly designed repository it will always be possible to restore the original digital texts from a clean, untouched repository. Hopefully individuals will also keep digital texts lying around, but individuals can also tweak texts, so the use of hashes/digital signatures is still needed.
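(A minimal sketch of the hashing idea, assuming Python and its standard hashlib module; the file and function names here are invented for illustration, not an actual PG tool:)

    import hashlib
    from pathlib import Path

    def file_digest(path):
        """SHA-256 hex digest of one file, read in chunks."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    def work_digest(paths):
        """One digest covering the whole set of files a work comprises,
        hashed in sorted order so the result is reproducible."""
        h = hashlib.sha256()
        for p in sorted(paths):
            h.update(p.name.encode("utf-8"))
            h.update(file_digest(p).encode("ascii"))
        return h.hexdigest()

    # e.g.: work_digest(Path("etext10001").iterdir())

Publishing such a digest alongside an archived work lets anyone later verify that no file in the set has been altered.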
If you don't want to cater to scholars, you're throwing away much of DP's work.
It's not our problem. No amount of catering will do away with Academia's perceived "limitations" of electronic media.
I don't have such a pessimistic view of academia. Yes, academics are strange birds. But as the old generation dies and a new generation arises, familiar with accessing digital information, they will embrace digital media with fervor. PG can certainly make its texts "academia-friendly", or at least reasonably so. The incremental effort (delta-t) to do the few extra things to make PG texts more academia-friendly is pretty small compared to the overall time it takes to scan/type/OCR/proof a text. And many of these added things have other small benefits outside of academia itself -- benefits for other user groups of PG texts.
The best value for Academia (and the least work for us) would be just to include the page scans. Any transcription you make will fall short of the requirements of some scholar. I think we should use our time to produce more books for a general audience rather than producing Academia-certified editions of them.
It behooves PG to at least reasonably reach out to the requirements of "academia" (which is not as monolithic as implied) in markup and metadata, and to include the original page scans for every work. That's all that can be done, and it should be done.

Making the page scans available has purposes beyond just keeping academics happy. For example, someone may wish to issue a retypeset print edition of some work using the XML-based PG texts. Having the original page scans there to verify document structure and layout oddities will be useful to those doing final proofing of the output typography. And as noted above, having the original page scans available to future generations is a further protection of the textual integrity of the digital text. It also has the side benefit of being a digital preservation of the original source, and this alone is a very powerful argument to keep the page scans as an honored and integral part of the PG collection -- it will greatly add value and purpose to the PG collection.

Disk space and bandwidth are no longer an issue (well, no longer the major, show-stopper issue they were a decade ago). It mystifies me why the original page scans are treated by some here as a waste product, meant to be flushed down the toilet when done, or why some say we don't need to preserve them or have access to them. (I'm still surprised to hear that the scans for some of the DP texts are not available to the public because of licensing issues.)

Jon Noring
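(To make the markup-and-metadata point concrete, here is a minimal sketch of a per-work record carrying source info, page-scan pointers, and the integrity digests discussed earlier, written as a Python dictionary; every field name and value is hypothetical, not an actual PG schema:)

    import json

    record = {
        "title": "An Example Title",
        "author": "An Example Author",
        # The major requirement: full source-edition information.
        "source_edition": {
            "publisher": "Example Press",
            "place": "London",
            "year": 1898,
            "edition": "first",
        },
        # Pointers to the preserved page scans (hypothetical paths).
        "page_scans": ["scans/page-001.png", "scans/page-002.png"],
        # Per-file SHA-256 digests, as sketched earlier (placeholder value).
        "sha256": {"etext.xml": "<hex digest goes here>"},
    }

    print(json.dumps(record, indent=2))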

Greg wrote:
Jon wrote:
Wow, my backside is really sore from the spanking Greg just administered to me. Some of the spanking was deserved, but some of it was not, imho. More on that later, but first I'd like to give some thoughts on the problems with the gutvol-d list and archive before answering several of Greg's comments. (Walking slowly...)
(I redirected to gutvol-d@lists.pglaf.org. Who sent this to Lyris @ listserv.unc.edu? That server is broken, the list there is defunct. I have been trying to delete the list there for months, but the software is perpetually non-responsive)
I'm not sure. I think I directed all my replies to the right place.

Btw, I tried to search the gutvol-d archives with regard to the FAQ0 and FAQ1 issue (that is, was it really discussed on the lists as much as Greg said it was?), and noticed that indeed the archive appears broken -- everything before August is gone. James Linden told me that the older archives may be lost for good, at least the Lyris version. Did anyone here keep their own copy of the gutvol-d (and I suppose other gut*) archives? I've kept full backup archives of the several dozen mailing lists I've run since 1992 (by simply collecting all the emails sent out, in plain-text unix mbox format), but not of lists I don't run, thinking that those who administer them do as I do and create redundant backups in a universal plain-text format (as Michael would approve!).

Since I've lately been sticking my nose into various affairs here that some think I should not, I may as well do it one more time and offer another opinion: that the various Gutenberg lists be moved to YahooGroups (with 2-3 people designated as backup archivists in unix mbox format -- I'll gladly volunteer to be one of the backup archivists, since I already do that for over twenty lists I run and co-administer.) Why YahooGroups and not some listserv software running on PG's own server?

1) I've had experience running various listserv software since 1992, and I find a lot of time is saved when someone else does it for me, as YahooGroups does.

2) YahooGroups is actually very good and reliable, and since so many people now subscribe to one or more YG lists, it's easy to subscribe to one more. My decision to move The eBook Community, now with over 2400 subscribers, to YahooGroups in 1999 has proven to be the right one. With a custom listserv run by PG, it's just another list I have to separately subscribe to, and if I have to change my email address, it's another separate service I have to contend with. YahooGroups consolidates all my subscribed lists into an easy, manageable form that no other listserv software comes close to in power and convenience.

3) YahooGroups includes other useful services, such as a Files Area and facilitated YahooIM Chat.

4) It's free! It doesn't take up any disk space or bandwidth on the local server. (There are the insufferable ads, though, but these are easily ignored.)

5) It is possible to extract plain text (with full headers) for every message posted to any YahooGroup.

6) It archives messages for quite a while back. The eBook Community presently has 21289 messages available in the online archive, dating from 1999 -- I don't know when YahooGroups will begin lopping off the oldest ones to save space, but it hasn't yet. I keep a separate, mirrored unix mbox archive.

7) It has web access for those who prefer that over receiving email.

8) Administration by the moderators is a breeze.

**********************************************************************

O.k., to address some of the issues Greg brings up. He is certainly angry with several of my comments today. As noted above, some of it I deserved, either in what I said or how I said it...
My view is that Jon will not be content until all the people working on PG are ousted, in favor of his preferred organization, governance, fundraising, production rules, and collection guidelines. This is not going to happen anytime soon, and other than being critical of the status quo, Jon has contributed nothing towards making it happen anyway. Instead, Jon has repeatedly been offered the ability -- with support and encouragement -- to create the organization or content he so strongly desires.
There are several related points I'd like to address here, since Greg brings up a couple I didn't really want to talk about (who otherwise cares about my motivation for being here and for what I've brought up recently?):

*****

My motivation is certainly not to "take over" PG and build a dictatorship, or to kick out the old guard. Those who know me know that I'm the opposite, and in fact fear the same things Michael does with respect to proprietary interests trying to defang the growth of a robust and fully available digital public domain. The OpenReader Project, which I co-founded, clearly shows my focus on open standards, open source, and creating an ebook future founded on these principles.

In personality type, I am definitely a Fighting Idealist, for better and for worse. I am definitely not very politically savvy and not very diplomatic with my words, again for better and oftentimes for worse. For example, I commented earlier today, in response to a message Juliet posted, that maybe DP should consider a policy that if they don't get unencumbered page scans to put freely online (because some group is anal about their beloved source document of a public domain work), then they should not accept that situation and should work around it. Who's the idealist here? (referring to PG's FAQ0 or FAQ1.)

(But DP has its own way of doing things and its own policies, which is fine. I greatly admire DP for what it has accomplished and is now doing, and I fully support its vision of going to the next level with an XML-based system. Juliet is doing an extraordinary job and has not been thanked enough for what she and her volunteers have accomplished, which borders on the remarkable. I am working with Juliet and Charles (who's currently on "sabbatical") to help them, as I can, with the organizational challenges in their wish to move to the next level, both in XML implementation and in increasing their capacity to meet the challenges of the intriguing "Million Digital Texts Project.")

I make no bones about having strong feelings based on the bigger picture as I see it -- and I honestly believe my vision is even bigger than Michael's. I don't believe the ad-hoc, everyone-does-it-their-own-way approach to producing etexts is sufficient any more to accomplish this Big Vision; in fact, it will work against the Big Vision. Greg no doubt disagrees with me, as FAQ0/1/3 outlines, but so be it -- history will be the ultimate arbiter of our differing world views.

I see how inadequate the current PG collection is for the future. This evaluation is based upon three different ventures I've been involved with since 1999 (including one now in development) where this Big Vision has been, and is now being, researched by some really sharp technical people who are nailing down the many architectural and technical requirements. There are many more subtle requirements than one would at first imagine -- I'm only now beginning to understand them in a holistic sense -- and they reflect themselves all the way back to the fundamental structure of the texts themselves and the associated metadata/catalog information. I see millions of high-quality, uniform digital texts, both public domain and Creative Commons, in a single repository which allows people to access them, annotate them, and link them together, and with other texts, and with other types of multimedia content in other repositories, in very powerful ways that would take too long to describe here.
That's one reason I state the master texts must be in well-structured XML, since that will enable the advanced features this repository will have. Properly done XML also confers many other benefits too numerous to mention here. Both DP and PG have blessed the right XML approach (e.g., as exemplified by Marcello's PGTEI), which is very encouraging.

But there's more. For reasons I won't go into here (again for brevity's sake), this Big Vision also sets slightly more stringent requirements on both metadata and cataloging than is currently done in PG, and it's the spinning wheels of the current discussion on metadata and cataloging that led to my posts this afternoon, out of sheer frustration. I see no *requirements* mentioned, and no vision as to *what* the metadata/catalog information is to be used for. How can one fix the metadata requirements without a discussion of what the metadata will be used for, and useful for? It is frustrating to see all this ad-hoc activity happening with no guidance as to the who, what, when, where, why and to what extent -- the purpose of the metadata -- being resolved based on general requirements, which in turn are derived from the full and detailed vision (which is NOT given in the FAQs) of why PG exists and what it produces.

Certainly I could try to force my way further into the discussion (more than I have now) and try to provide answers to these questions, but then I'll just become another voice added to the ad-hoc cacophony we now have, where the one who produces something first wins, even if it ends up not meeting the full long-term goals. This is the result of the FAQ0 and FAQ1 philosophy, which does not always give the results one hopes for. To get resolution on tough issues, it is oftentimes necessary for the leadership to take charge and firmly guide discussion to logically resolve what must be done. In some ways, it may be that the "leadership" simply doesn't have the time (because it is voluntary) to formalize the process and force a structured approach to fast decision-making and buy-in to the result. Understandable, but sad.

What I fear the most -- and this I've expressed to Brewster Kahle (whom I meet again next week about Project Gramophone) and to JD Lasica (who's launching the ourmedia project, and whom I'm assisting on the metadata/cataloging side) -- is that many people will develop these wonderful repositories of digital content (I'm also working on Project Gramophone/Sound Preserve to transfer and archive millions of old sound recordings), with billions of digital objects, which simply won't and can't "talk" with each other, because everyone is "doing their own thing" PG-style. Wheeee, the late 60's all over again. <smile/>

Let me give a small example to illustrate just a corner of what the world could be like if everything is done properly: Imagine someone creating a video for ourmedia where someone is playing the piano, say "Take the 'A' Train", composed by Billy Strayhorn, which became Duke Ellington's theme song. We would want to allow the viewer to link, if they so choose, with the song lyric repository, with various wikipedia entries, and to Sound Preserve to bring up orchestral recordings of "Take The 'A' Train" by Duke Ellington and others. We'd also like to link to the Project Gutenberg collection for any related works, such as Duke Ellington's book "Music is My Mistress" (assuming PG got permission to add it -- likely not.)
And of course we'd allow the end-user to join special communities built around any particular topic connected with that song -- such as Ellington communities, jazz communities, Strayhorn communities, etc.

Doing all of this (and a lot more) confers a few added requirements, especially with regard to metadata. (Text has the redeeming grace that it is fairly easy to dig out some information by full-text searching -- but not standardized subject-matter fields! -- whereas it is much harder with video and audio, so the metadata and cataloging requirements for video and audio will likely be more stringent and extensive.)

PG's self-enforced isolation, born of its seeming fear of working with the Big Boys (which is somewhat understandable), is working against PG in various ways, keeping it from seeing the bigger picture of how the text-production activities it is catalyzing will mesh with this much bigger, more wonderful world. But if the various repositories, including Project Gutenberg, don't do it right from the start, and they end up with millions and billions of digital objects *not done right*, then the interlinkage will be much more difficult and nowhere near as powerful and useful as it could be. It will be essentially impossible to fix after the fact. JD Lasica now recognizes this and is supporting somewhat expanded metadata standards to assure inter-repository linkage, but I don't see the PG "leadership" seeing this, nor am I confident it can, because of the FAQ0/1/3 constraints. Note how PG is having difficulty fixing the metadata and catalog info for a *measly* 10,000 or so texts. Imagine having a million of them *not done right* (especially with regard to metadata and catalog information requiring human input -- for some digital objects, if the data is not collected right at the start, it will be impossible to figure it out much later, even with human intervention. So much for the power of our digital future.)

(Part of the Big Vision calls for aiding integration using James Linden's very interesting "Open Genesis" concept, currently under development. James is probably not yet ready to discuss this, but it is best described as the "Semantic Web Done Right From the Start." The requirements Open Genesis confers upon digital content repositories are surprisingly minimal -- but it is needed as a standardized framework to improve inter-repository and inter-object linking. Marcello's effort to bring RDF into the mix is laudable and will certainly aid more robust intra- and inter-repository linking.)

I'd love to see PG take the lead to make this happen for the text side of the house, and that's my motivation in pressing a lot of issues here, to the point where I may become persona non grata. But it won't happen until PG realizes that it needs to confer more requirements on the texts and metadata it catalyzes and collects from the many volunteers (outside of DP, which is doing things mostly right by my reckoning), as well as to work more actively with other repositories -- to become a part of the bigger world rather than isolating itself as it seems to. It needs one or two full-time people -- this costs some $$$ -- and this requires a somewhat higher level of organization, and maybe a slightly different governance, to even be given this $$$ (or to develop some ongoing revenue stream.) And if it wants to play a major role in the "Million Digitized Texts Project" (should it get successfully launched), it *has* to change its governance and how it interacts with the world at large.
Frankly, the FAQ0 and FAQ1 documents are actually quite hostile, implying the world at large is somehow evil and out to get PG. Yes, some parts of the world at large are hostile to PG and wish it gone, but not all of them. The wisdom is to associate with your friends and those who share the same vision, not drive them away by painting everyone with the same "evil" brush. If you don't believe FAQ0 and FAQ1 send this message to those in various outside groups, I suggest the wording of FAQ0/1 be looked at again for what it doesn't say but should say. For example, there's little in there about building close strategic partnerships with other like-minded organizations, or about working together on common standards and common goals. Nothing is mentioned about joining standards bodies and other types of organizations so as to promote PG's interests. PG has become disturbingly xenophobic in orientation -- it acts as if the rest of the world either does not exist, or does exist and is evil, and as if magic will always automatically happen if you simply let everyone do their own thing. Magic does happen often, but magic can also run out.

To answer Greg's charge that I "can't take Yes for an answer" (which is, interestingly enough, the phrase William Safire used in today's New York Times to describe Arafat's 1999 refusal of unbelievable concessions by the Israelis), let me say that I am working hard on the vision. I'm coordinating with ourmedia, with Project Gramophone (now called Sound Preserve), and with another venture dedicated to tying this all together and launching the "Million Digitized Texts Project." Will we succeed in at least launching MDTP? Maybe. Maybe not. But I am taking Greg's "Yes for an answer" to heart and I am working on it as I envision it -- it's just that it is not restricted to the closed world of PG, so that's why it seems somewhat out of lockstep with what is going on here.

But if we do succeed in launching MDTP and the Bigger Vision it will be a part of, and if PG wants to play a *major* role with MDTP -- and I'd certainly welcome PG and its "leader volunteers" to jump onboard, for many obvious reasons -- PG will have to change in certain ways simply to work as a major player with the MDTP project. If PG decides it would rather not change its governance and focus by raising its text and metadata standards (which really would not change that much), then that's totally understandable -- PG could still play a role, but it would be an essentially peripheral one, and the parade may end up marching by it.

*****

On another point: if I used wording reflecting hostility toward those who have contributed texts to the PG collection over the years, that was not my intent, and I apologize for it. I've typed in whole books by hand, and then laboriously proofed them, marked them up, and converted them into ebooks, so I am familiar firsthand with this labor of love. Some of the books being talked about here -- the very difficult 17th/18th-century texts -- are a remarkable achievement to digitize (and to mark up as well.) It amazes me the commitment many people here have to digitizing texts. My comments were directed at the leadership for not following what I believe are slightly more stringent policies with regard to metadata and text formatting requirements (some of which are understandable given where things were in the early 1990's). I'm a firm believer in the principle of "the buck stops here".
That is, if there are problems, it is the responsibility of the PG leadership, due to their prior decisions and the system they established. It may be unfair at times, since it is impossible to accurately predict the future and to develop the right approach to meet that future (e.g., Michael Hart's early allergy to including source information in texts appeared to be a protection mechanism against copyright infringement claims.) But nevertheless it is up to the leadership to take responsibility, adjust accordingly, and pro-actively "fix it". Maybe some of the problems are best solved by the ad-hoc, hands-off approach given in FAQ0/1/3, but I don't believe all the problems with the PG Collection will be solved by this approach, especially when looking at the useful linkage of the PG collection with other content repositories as outlined above, which requires an integrated approach and working cooperatively with other groups.

*****

On a point related to what I wrote earlier, I'm troubled by this view that PG's collection should be focused toward a particular use niche, rather than designed to be useful for just about every use. As I've analyzed things, the added requirements to make PG digital texts useful not only for general reading, but for scholarship and research (plus linking to other repositories), are so few that to ignore them is downright puzzling. What is needed? Well, require that the source info be included in the metadata -- that's the major one. The next is to work hard to acquire and preserve page scans. There are likely a few other requirements which are even less burdensome. The vast majority of the effort to produce digital texts from paper copy is scanning (or typing in) the book and then proofreading it. The rest of the added work to make the texts more useful is, time- and effort-wise, minuscule by comparison.

This reminds me of a Minnesota-Norwegian joke about the Norwegian who tried to swim across a lake -- when he got 95% of the way to the other side, he decided he couldn't make it, and swam back. It's ludicrous not to make that extra 5% effort and elevate the PG collection to a significantly higher plane of usefulness, quality, and digital integrity (discussed next). This is especially tragic given the hundreds of thousands of hours already devoted to the PG collection, when that extra 5% (if that) would have made a significant improvement.

*****

And about digital integrity: I stick to my position that anything PG requires that increases the digital integrity of the text with respect to the original source is a Good Thing (tm). Certainly deviations from the source must be allowed, such as correcting obvious typesetting errors. (As an aside, has PG established a uniform policy for what types of edits/corrections in the digital text are allowed? Or is this again one of those FAQ0/1 "let's not interfere with anyone" type of things?) But what I mean by digital integrity has to do with the faithfulness -- or more importantly, the perception of faithfulness -- of the *meaning* of the text to the original source. It's a legitimate question to ask whether those involved in producing digital texts took more liberties with the text than they should have. This is not a trivial issue when we look at history, where censorship is the norm.
Certainly, as Greg pointed out, the source texts themselves may have been grossly edited contrary to the author's original intent (if the source was not the first edition, for example), but we must not add to this problem in any way (instead, let's also do the first edition!)

In addition, I believe one intent of PG is to assist with the effort to assure the digital texts will survive into the distant future -- to survive wars, revolutions, totalitarianism, digital "book burnings", etc. As the centuries roll by, the issue of digital integrity becomes more and more important for the integrity of the information being passed on to future generations. That is why I believe it is necessary for PG to establish policies for new texts, and to begin working on upgrading some of the existing texts at the appropriate time, to standardize the digital integrity requirements as much as possible, and, more importantly, to acquire and preserve the original page scans whenever possible. Having the original page scans available side-by-side with the digital texts also benefits everyone (and the Big Vision) by resolving any difficulties in presentation of the digital texts (we all know how weird some texts are), and helps in fighting claims of copyright infringement. Contrary to Michael Hart's early policy of hiding the pedigree of digital texts, having the page scans available -- so long as our copyright clearance procedure is sufficient -- actually strengthens PG against claims of copyright infringement.

*****

As a final note, I do agree with the several people who responded today to my call for redoing the older PG texts, saying we should wait until DP moves to the next-generation XML-based system before redoing them. I definitely agree, as I think about it. What could be done in the meantime, however, is to prepare for this eventuality:

1) flag those texts we'd like to redo someday,

2) search for higher-quality source books which will give us *unencumbered* page scans, and then

3) file those page scans away in the archive for later conversion to digital text at the appropriate time.

There's nothing wrong with decoupling the scanning stage from the proofreading stage.

No doubt my answers will not satisfy everyone, and may not satisfy anyone. But after my spanking, I needed to reply, and in one case apologize.

Jon Noring

I have kept all PG mail since I subscribed in Sept. 2001. It needs to be sorted into the different lists, might contain some extraneous items, and might be missing something. If somebody wants to use it to reconstruct the archives, I'll be glad to contribute it. I can't make it immediately available, since I would first have to check that nothing private is contained there; my filtering is not always accurate. I dislike YahooGroups; I by far prefer a pglaf-based Mailman.

Carlo
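(A minimal sketch of the sorting step Carlo describes, assuming Python's standard mailbox module and that messages carry a List-Id header; the file names are hypothetical, and messages without a List-Id fall into an "unsorted" box:)

    import mailbox
    import re

    combined = mailbox.mbox("all-pg-mail.mbox")
    outputs = {}

    for msg in combined:
        # Group by the List-Id header; messages without one go to "unsorted".
        list_id = msg.get("List-Id", "unsorted")
        name = re.sub(r"[^A-Za-z0-9._-]+", "_", list_id)
        if name not in outputs:
            outputs[name] = mailbox.mbox(name + ".mbox")
        outputs[name].add(msg)

    for box in outputs.values():
        box.flush()
        box.close()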
participants (13)

- Brad Collins
- Carlo Traverso
- Greg Newby
- Her Serene Highness
- Janet Kegg
- Jon Noring
- Joshua Hutchinson
- Juliet Sutherland
- Karen Lofstrom
- Marcello Perathoner
- Michael Dyck
- Michael Hart
- Tony Baechler