
Greg wrote:
Jon wrote:
Wow, my backside is really sore from the spanking Greg just administered to me. Some of the spanking was deserved, but some of it was not, imho. More on that later, but first I'd like to give some thoughts on the problems with the gutvol-d list and archive before answering several of Greg's comments. (Walking slowly...)
(I redirected to gutvol-d@lists.pglaf.org. Who sent this to Lyris @ listserv.unc.edu? That server is broken, the list there is defunct. I have been trying to delete the list there for months, but the software is perpetually non-responsive)
I'm not sure. I think I directed all my replies to the right place.

Btw, I tried to search the gutvol-d archives with regard to the FAQ0 and FAQ1 issue (that is, how much was it really discussed on the lists, as Greg said it was?), and noticed that the archive does indeed appear broken -- everything before August is gone. James Linden told me that the older archives may be lost for good, at least the Lyris version. Did anyone here keep their own copy of the gutvol-d (and I suppose other gut*) archives? I've kept full backup archives of the several dozen mailing lists I've run since 1992 (by simply collecting all the emails sent out in plain text unix mbox format), but not of lists I don't run, thinking that those who administer them do as I do and create redundant backups in a universal plain text format (as Michael would approve!)

Since I've lately been sticking my nose into various affairs here that some think I should not, I may as well do it one more time and give another opinion: that the various Gutenberg lists be moved to YahooGroups (with 2-3 people designated as backup archivists in unix mbox format -- I'll gladly volunteer to be one of the backup archivists since I already do that for over twenty lists I run and co-administer.)

Why YahooGroups and not some listserv software running on PG's own server?

1) I've had experience running various listserv software since 1992, and I find a lot of time is saved when someone else does it for me as YahooGroups does.

2) YahooGroups is actually very good and reliable, and since so many people now subscribe to one or more YG lists, it's easy to subscribe to one more. My decision to move The eBook Community, now with over 2400 subscribers, to YahooGroups in 1999 has proven to be the right decision. With a custom listserv run by PG, it's just another list I have to separately subscribe to, and if I have to change my email address, it's another separate service I have to contend with. YahooGroups consolidates all my subscribed lists into an easy, manageable form that no other listserv software comes close to in power and convenience.

3) YahooGroups includes other useful services, such as a Files Area and facilitated YahooIM Chat.

4) It's free! It doesn't take up any diskspace or bandwidth on the local server. (There are the insufferable ads, though these are easily ignored.)

5) It is possible to extract plain text (with full headers) for every posted message to any YahooGroup.

6) It archives messages for quite a while back. The eBook Community presently has 21289 messages available in the online archive dating from 1999 -- I don't know when YahooGroups will begin lopping off the oldest ones to save space, but it hasn't yet. I also keep a separate mirrored archive in unix mbox format.

7) It has web access for those who prefer that over receiving email.

8) Administration by the moderators is a breeze.

**********************************************************************

O.k., to address some of the issues Greg brings up. He is certainly angry with several of my comments today. As noted above, some of it I deserved, either in what I said or how I said something...
My view is that Jon will not be content until all the people working on PG are ousted, in favor of his preferred organization, governance, fundraising, production rules, and collection guidelines. This is not going to happen anytime soon, and other than being critical of the status quo, Jon has contributed nothing towards making it happen anyway. Instead, Jon has repeatedly been offered the ability -- with support and encouragement -- to create the organization or content he so strongly desires.
There are several related points I'd like to address here, since Greg brings up a couple I didn't really want to talk about (who otherwise cares about my motivation for being here and for what I've brought up recently?):

*****

My motivation is certainly not to "take over" PG and build a dictatorship, and to kick out the old guard. Those who know me know that I'm the opposite and in fact fear the same things Michael does with respect to proprietary interests trying to defang the growth of a robust and fully available digital public domain. The OpenReader Project, which I co-founded, clearly shows my focus on open standards, open source, and creating an ebook future founded on these principles.

In personality type, I am definitely a Fighting Idealist, for better and for worse. I am definitely not very politically savvy and not very diplomatic with my words, again for better and oftentimes for worse. For example, I commented earlier today, in response to a message Juliet posted, that maybe DP should consider a policy that if they don't get unencumbered page scans to put freely online (because some group is anal about their beloved source document of a public domain work), then they should not accept that situation and work around it. Who's the idealist here? (referring to PG's FAQ0 or FAQ1.)

(But DP has their way of doing things and policies, which is fine. I greatly admire DP for what they have accomplished and are now doing, and I fully support their vision for going to the next level with an XML-based system. Juliet is doing an extraordinary job and has not been thanked enough for what she and her volunteers have accomplished, which borders on the remarkable.
I am working with Juliet and Charles (who's currently on "sabbatical") to help them, as I can, with the organizational challenges in their wish to move to the next level, both in XML implementation and in increasing their capacity to meet the challenges for the intriguing "Million Digital Texts Project.")

I make no bones that I have strong feelings based on the bigger picture as I see it -- and I honestly believe my vision is even bigger than Michael's. I don't believe the ad-hoc, everyone-does-it-their-own-way approach for producing etexts is sufficient any more to accomplish this Big Vision, and in fact it will work against the Big Vision. Greg no doubt disagrees with me, as FAQ0/1/3 outlines, but so be it -- history will be the ultimate arbiter of our differing world views.

I see how inadequate the current PG collection is for the future. This evaluation is based upon three different ventures I've been involved with since 1999 (including one now in development) where this Big Vision has been, and is now being, researched by some really sharp technical people who are nailing down the many architectural and technical requirements. There are many more subtle requirements than one would at first imagine -- I'm only now beginning to understand them in a holistic sense -- and they reflect themselves all the way back to the fundamental structure of the texts themselves, and the associated metadata/catalog information.

I see millions of high-quality, uniform digital texts, both public domain and Creative Commons, in a single repository which allows people to access them, annotate them, and link them together and with other texts and with other types of multimedia content in other repositories in very powerful ways that would take too long to describe here. That's one reason I state the master texts must be in well-structured XML, since that will enable the advanced features this repository will have. Properly done XML also confers many other benefits too numerous to mention here.
Both DP and PG have blessed the right XML approach (e.g., as exemplified by Marcello's PGTEI), which is very encouraging.

But there's more. For reasons I won't go into here (again for brevity's sake), this Big Vision also sets slightly more stringent requirements on both metadata and cataloging than is currently done in PG, and it's the spinning wheels of the current discussion on metadata and cataloging that led to my posts this afternoon out of sheer frustration. I see no *requirements* mentioned, and no vision as to *what* the metadata/catalog information is to be used for. How can one fix the metadata requirements without a discussion of what the metadata will be used for, and useful for?

It is frustrating to see all this ad-hoc activity happening with no guidance as to the who, what, when, where, why and to what extent -- the purpose of the metadata -- being resolved based on general requirements, which in turn are derived from the full and detailed vision (which is NOT given in the FAQs) of why PG exists and what it produces. Certainly I could try to force my way further into the discussion (more than I have now) and try to provide answers to these questions, but then I'll just become another voice to add to the ad-hoc cacophony we now have, where the one who produces something first wins, even if it ends up not meeting the full long-term goals. This is the result of the FAQ0 and FAQ1 philosophy, which does not always give the results one hopes for.

To get resolution on tough issues it is oftentimes necessary for the leadership to take charge and to firmly guide discussion to logically resolve what must be done. In some ways, it may be that the "leadership" simply doesn't have the time (because it is voluntary) to formalize the process to force a structured approach to fast decision-making and buy-in to the result. Understandable, but sad.
What I fear the most, and this I've expressed to Brewster Kahle (who I meet again next week about Project Gramophone) and to JD Lasica (who's launching the ourmedia project, and whom I'm assisting with the metadata/cataloging side), is that many people will develop these wonderful repositories of digital content (I'm also working on Project Gramophone/Sound Preserve to transfer and archive millions of old sound recordings), with billions of digital objects, which simply won't and can't "talk" with each other, because everyone is "doing their own thing" PG-style. Wheeee, the late 60's all over again. <smile/>

Let me give a small example to illustrate just a corner of what the world could be like if everything is done properly:

Imagine someone creating a video for ourmedia where someone is playing the piano, say "Take the 'A' Train", composed by Billy Strayhorn and which became Duke Ellington's theme song. We would want to be able to allow the viewer to link, if they so choose, with the song lyric repository, with various wikipedia entries, and to Sound Preserve to bring up orchestral recordings of "Take The 'A' Train" by Duke Ellington and others. We'd also like to link to the Project Gutenberg collection for any works, such as Duke Ellington's book "Music is My Mistress" (assuming PG got permission to add it, likely not.) And of course we'd allow the end-user to join special communities built around any particular topic connected with that song -- such as Ellington communities, jazz communities, Strayhorn communities, etc.

Doing all of this (and a lot more) confers a few added requirements, especially with regard to metadata information (text has the redeeming grace that it is fairly easy to dig out some information by full text searching -- but not standardized subject matter fields! -- but it is much harder with video and audio, so the metadata and cataloging requirements for video and audio will likely be more stringent and extensive.)
PG's self-enforced isolation, because of its seeming fear of working with the Big Boys (which is somewhat understandable), is working against PG in various ways in seeing the bigger picture of how the text production activities it is catalyzing will mesh with this much bigger, more wonderful world.

But if the various repositories don't do it right from the start, including Project Gutenberg, and they end up with millions and billions of digital objects *not done right*, then the interlinkage will be much more difficult and nowhere near as powerful and useful as it could be. It will be essentially impossible to fix after the fact. JD Lasica now recognizes this and is supporting somewhat expanded metadata standards to assure inter-repository linkage, but I don't see the PG "leadership" seeing this, nor am I confident it can because of the FAQ0/1/3 constraints.

Note how PG is having difficulty fixing the metadata and catalog info for a *measly* 10,000 or so texts. Imagine having a million of them *not done right* (especially with regard to metadata and catalog information requiring human input -- for some digital objects, if the data is not collected right at the start, it will be impossible to figure it out much later, even with human intervention. So much for the power of our digital future.)

(Part of the Big Vision calls for aiding integration using James Linden's very interesting "Open Genesis" concept, currently under development. James is probably not yet ready to discuss this, but it is best described as the "Semantic Web Done Right From the Start." The requirements Open Genesis confers upon digital content repositories are surprisingly quite minimal -- but it is needed to have a standardized framework to improve inter-repository and inter-object linking. Marcello's effort to bring RDF into the mix is laudable and will certainly aid more robust intra- and inter-repository linking.)
I'd love to see PG take the lead to make this happen for the text side of the house, and that's my motivation in pressing a lot of issues here to the point where I may become persona non grata, but it won't happen until PG realizes that it needs to confer more requirements on the texts and metadata it catalyzes and collects from the many volunteers (outside of DP, which is doing things mostly right by my reckoning), as well as to more actively work with other repositories -- to become a part of the bigger world rather than isolating itself as it seems to. It needs one or two full-time people -- this costs some $$$ -- this requires a somewhat higher level of organization and maybe a slightly different governance to even be given this $$$ (or to develop some ongoing revenue stream.) And if it wants to play a major role in the "Million Digitized Texts Project" (should it get successfully launched), it *has* to change its governance and how it interacts with the world at large.

Frankly, the FAQ0 and FAQ1 documents are actually quite hostile by implying the world at large is somehow evil and out to get PG. Yes, some parts of the world at large are hostile to PG and wish it gone, but not all of them. The wisdom is to associate with your friends and those who share the same vision, not drive them away by painting everyone with the same "evil" brush.

If you don't believe FAQ0 and FAQ1 send this message to those in various outside groups, I suggest the wording of FAQ0/1 be looked at again for what it doesn't say but should say. For example, there's little in there about building close strategic partnerships with other like-minded organizations, and working together on common standards and common goals. Nothing is mentioned there about joining standards and other types of organizations so as to promote PG's interests.
PG has become disturbingly xenophobic in orientation -- it acts as if the rest of the world does not exist, or does exist and is evil, and that magic will always automatically happen if you simply let everyone do their own thing. Magic does happen often, but magic can also run out.

To answer Greg's "I don't take Yes For an Answer" (which is, interestingly enough, what the New York Times' William Safire used today to describe Arafat's 1999 refusal of unbelievable concessions by the Israelis), let me say that I am working hard on the vision. I'm coordinating with ourmedia, with Project Gramophone (now called Sound Preserve), and working with another venture dedicated to tying this all together and to launching the "Million Digitized Texts Project." Will we succeed in at least launching MDTP? Maybe. Maybe not. But I am taking Greg's "Yes for an answer" to heart and I am working on it as I envision it -- it's just that it is not restricted to the closed world of PG, so that's why it seems somewhat out of lockstep with what is going on here.

But if we do succeed in launching MDTP and the Bigger Vision it will be a part of, and if PG wants to play a *major* role with MDTP -- and I'd certainly welcome PG and its "leader volunteers" to jump onboard for many obvious reasons -- PG will have to change in certain ways simply to work as a major player with the MDTP project. If PG decides it would rather not change its governance and focus by increasing the acceptable text and metadata standards (which really are not that much), then that's totally understandable -- PG could still play a role, but it would essentially be peripheral and the parade may end up marching by it.

*****

On another point, if I expressed wording reflecting hostility to those who have contributed texts to the PG collection over the years, this was not my intent, and I apologize for this.
I've typed in whole books by hand, and then laboriously proofed them, marked them up, and converted them into ebooks, so I am familiar firsthand with this labor of love. Digitizing some of the books being talked about here -- the very difficult 17th/18th century texts -- is a remarkable achievement (as is marking them up.) It amazes me the commitment many people here have to digitizing texts.

My comments were directed at the leadership for not adopting what I believe should be slightly more stringent policies with regard to metadata and text formatting requirements (some of which is understandable given where things were in the early 1990's). I'm a firm believer in the principle of "the buck stops here". That is, if there are problems, it is the responsibility of the PG leadership due to their prior decisions and established system. It may be unfair at times since it is impossible to accurately predict the future and to develop the right approach to meet that future (e.g., Michael Hart's early allergy to including source information in texts appeared to be a protection mechanism against copyright infringement claims.) But nevertheless it is up to the leadership to take responsibility, adjust accordingly, and to pro-actively "fix it". Maybe some of the problems are best solved by the ad-hoc, hands-off approach as given in FAQ0/1/3, but I don't believe all problems with the PG Collection will be solved by this approach, especially when looking at the useful linkage of the PG collection with other content repositories as outlined above, which requires an integrated approach, and working cooperatively with other groups.

*****

On a point related to what I wrote earlier, I'm troubled by this view that PG's collection should be focused toward a particular use niche, rather than designed to be useful for just about every use.
As I've analyzed things, the added requirements to make PG digital texts useful not only for general reading, but for scholarship and research (plus linking to other repositories) are so few that to ignore them is downright puzzling. What is needed? Well, require that the source info be included in the metadata -- that's the major one. The next one is to work hard to acquire and preserve page scans. There are likely a few other requirements which are even less burdensome.

The vast majority of the effort to produce digital texts from paper copy is to scan (or type in) the book and then proofread it. The rest of the added stuff to make the texts more useful is, time- and effort-wise, minuscule by comparison. This reminds me of a Minnesota-Norwegian joke about the Norwegian who tried to swim across a lake -- when he got 95% of the way to the other side, he decided he couldn't make it, and swam back. It's ludicrous not to make that extra 5% effort, and elevate the PG collection to a significantly higher plane of usefulness, quality, and better digital integrity (talked about next). This is especially tragic given the hundreds of thousands of hours already devoted to the PG collection, when that extra 5% (if that) would have made a significant improvement.

*****

And about digital integrity, I stick to my position that anything which PG requires to increase the digital integrity of the text itself to the original source is a Good Thing (tm). Certainly deviations from the source must be allowed, such as correcting some obvious typesetting errors (as an aside, has PG established a uniform policy for what types of edits/corrections in the digital text are allowed? Or is this again one of those FAQ0/1 "let's not interfere with anyone" type of things?) But what I mean by digital integrity has to do with the faithfulness, or more importantly, the perception of faithfulness, of the *meaning* of the text to the original source.
It's a legitimate question to ask whether those involved in producing digital texts took more liberties with the text than they should have. This is not a trivial issue when we look at history where censorship is the norm. Certainly, as Greg pointed out, the source texts themselves may have been grossly edited contrary to the author's original intent (if it were not the first edition, for example), but we must not add to this problem in any way (instead, let's also do the first edition!)

In addition, I believe one intent of PG is to assist with the effort to assure the digital texts will survive into the distant future, to hopefully survive wars, revolutions, totalitarianism, digital "book burnings", etc. As the centuries roll by, the issue of digital integrity becomes more and more important for the integrity of the information being passed on to future generations.

That is why I believe it is necessary for PG to establish policies for new texts, and to begin working on upgrading some of the existing texts at the appropriate time, to standardize the digital integrity requirements as much as possible, and more importantly to acquire and preserve the original page scans whenever possible. Having the original page scans available side-by-side with the digital texts also benefits everyone (and the Big Vision) by resolving any difficulties in presentation of the digital texts (we all know how weird some texts are), and by helping to fight claims of copyright infringement. Contrary to Michael Hart's early policies of hiding the pedigree of digital texts, having the page scans available, so long as our copyright clearance procedure is sufficient, actually strengthens PG against claims of copyright infringement.

*****

As a final note, I do agree with several who responded today about my call for redoing the older PG texts, saying we should wait until DP moves to the next-generation XML-based system before redoing these texts. I definitely agree as I think about it.
What I think could be done, however, is to prepare for this eventuality by 1) flagging those texts we'd like to redo someday, 2) searching for higher-quality source books which will give us *unencumbered* page scans, and then 3) filing those page scans away in the archive for later conversion to digital text at the appropriate time. There's nothing wrong with decoupling the scanning stage from the proofreading stage.

No doubt my answers will not satisfy everyone, and may not satisfy anyone. But after my spanking, I needed to reply, and in one case apologize.

Jon Noring