Status of Mothership and DP-EU (and even DP-CAN)
For those who don't know, the term 'Mothership' was coined by DP-EU as a friendly appellation for the original site, which should, logically and honestly, be called DP-US but isn't (out of a marvellous and praiseworthy reticence). However, the other sites (DP-EU and potential others) acknowledge our debt to the primary site. 'Mothership' is cool enough.

Bearing in mind that I am not a policy maker, nor have I access to anything other than public information, I've been around long enough that I can honestly say the quality of submissions from DP is much higher than when individuals were doing all their own work, with no checks except ad hoc. Naithless, PSTB have determined that quality and workflow could be further optimized.

The old system was, basically, broken down into Content Provision, two rounds of Proofing (including formatting), Post Processing (basically sewing all the parts back together and ensuring consistency), and Post-Process Verification. The new system has changed from the two Proofing rounds to two Proofing rounds (P1/P2) and two Formatting rounds (F1/F2). An additional (though optional) Whitewashing step has also been implemented. I believe that virtually everything is being generated in both ASCII and HTML.

Ability to perform certain tasks is no longer assumed. For example, you must proof 200 pages to an acceptable standard, and be on site for at least four months, before you may proof P2.

Results to date have been mixed. No one doubts that the long term will make things better (apparently the overall quality has improved by an order of magnitude); however, production is down, and there are both current and predicted bottlenecks which will take months to resolve. All of this has led to a certain brittleness (?) on the site, with all the questions being raised, and with, alas, some people taking offense at things that were certainly not meant to give it. DP is still, without doubt, the friendliest site on the web, and matters will be resolved.
Finally, all in-process work had to be redone.

DP-EU has not yet reached a critical mass. A great deal of this is due, I think, to the tremendous challenges faced in dealing with so many language groups. (Do you know the site has even been localized in Urdu?) So far, there are not enough proofers in many of the languages. I think, for example, there is only one Bulgarian proofer. Who can P2 his work?

I'm very concerned with mentoring at DP-INT, and try to do it at DP-EU, too; but we don't have beginner queues there, and I'm basically monolingual. We really need Mentors in each language. (Mentoring is, IMHO, one of the main reasons for DP's success. Not only do we assure a level of quality, but we let newcomers know that we are a friendly mob, and that they are appreciated. Very important.)

Even today, much of the work at EU comes from INT volunteers. Nowt wrong with that. And, in turn, Team Canada members are an important part of INT. While we (me, at least) would dearly like to see DP-CAN up and lurching, it would obviously reduce resources currently active on the other two sites. I do have an agreement with TPL for publicity to get new proofers when we do launch (and they are going to be providing some magnificent material), but let's make sure the two existing sites are not undermined.

In the meanwhile, you can submit Life+50 material through DP-EU or, as some are doing, amass it till we do launch.

Cheers,
Michael Lockey (Vasa)
I might even do a few pages for DPCan, if you get it working.

jivadas

_______________________________________________
Project Gutenberg of Canada
Website: http://www.projectgutenberg.ca/
List: pgcanada@lists.pglaf.org
Archives: http://lists.pglaf.org/private.cgi/pgcanada/
On Thu, 14 Jul 2005, Michael Lockey wrote:
I've been around long enough that I can honestly say the quality of submissions from DP is much higher than when individuals were doing all their own work, with no checks except ad hoc. Naithless, PSTB have determined that quality and
Interesting argument. One that I might disagree with on some levels. Some books, prepared by some people for PG, have had a truly exceptional quality, without going through the DP process. But yes, overall the quality evens out to a higher level from DP.

What I miss from the DP process is having a holistic view of the book in question. I find that if I am preparing a text carefully, I really get a feeling for its quirks, its style, its _personality_. This puts me in a better place to anticipate what errors are worth searching for, and what choices to make in ambiguous situations.

It is my impression that not all PPs connect with a text on this same level. As the system is set up, it's easy to just check what other people have marked as possibly needing attention, run some automatic checks, and then forward the file.

As I perceive it, PP'ing has been a bottleneck in DP, the same as its equivalent in a smaller process, because to actually _do it well_ takes a bit of creativity and a decent amount of effort, and is sometimes quite boring.

Andrew
Andrew Sly wrote:
Interesting argument. One that I might disagree with on some levels.
Some books, prepared by some people for PG have had a truly exceptional quality, without going through the DP process.
Please; I would just like to make it clear that much excellent work HAS been, and continues to be, published by individuals, and even works of lesser quality can be the equivalent of (so-called) published books. I don't think I've ever seen a book without errata, and some, including best sellers, can be appalling. It'll be interesting to see how Mr. Potter fares this time...

No; I'm talking about the overall performance, and not making any comparisons. Our mandate must simply include continuous improvement.

DP is, I assume, aware that folk with enthusiasm will be more involved with a work. Upcoming books are announced at several stages of the process so that anyone with interest can grab stuff that excites them, both for processing and whitewashing. (I have just grabbed /The Old Coast Road/, since I'm very interested in historic transportation.) By definition, proofers will, when possible, go for what they enjoy. But there's always /Copyright Renewals/, and /The Grammar of English Grammars/...

Michael Lockey
Michael Lockey wrote:
Andrew Sly wrote:
Interesting argument. One that I might disagree with on some levels.
Some books, prepared by some people for PG have had a truly exceptional quality, without going through the DP process.
Definitely. Some of the PG texts I've used for republishing (where I have to carefully go over the transcription quality) have been outstanding -- very low error rates and faithfulness to the source (which I borrow to compare the text to). But then I've had the dog as well, where the text is full of errors, and there are other troubling anomalies such as reworked paragraphs and so on.

To add to this, NetWorker recently determined that the text of PG's 'Frankenstein' is pretty much identical to that of a modern edited edition issued by Bantam (ca. 1980 if I remember correctly), and "conveniently" there are no original page scans to verify this -- and there is no record of what the source book was, as was Michael Hart's policy up until recently. (His policy a few years ago was specifically NOT to allow detailed source information to be included with the texts, with certain exceptions granted.)

However, what is not talked about much is that DP is a more open and public system. That is, DP is not relying on one scribe working alone, whose labor has no public oversight. Rather, the proofing at DP is done in the open, among a group of people who otherwise have no relationship with each other, and no vested interest in a particular outcome. The perception of trustworthiness and faithfulness this adds to the final work product cannot be overestimated, especially when markets such as education, public libraries, and scholars wish to use the texts.

In addition, DP is preserving the page scans and, according to my private conversations with Juliet, will eventually make them public. This adds to the faithfulness aspect, and allows others to know for certain the source book, and to verify textual faithfulness.
It is a show of "good will" and confidence that the work was done right: "we stand behind our work and proudly provide the page scans as proof of the trustworthiness of our work". (Of course, there are also tangible benefits, besides perception alone, to providing the page scans, but I'm focusing on the perception-of-trust aspect.)

Most of the pre-DP texts do not have scans to back them up (many were typed in, so no scans were made), and for most of these texts the source book is not even referenced. Note again what I mentioned above about 'Frankenstein' -- NetWorker analyzed only one book this way, and had a 100% hit -- one wonders how many other pre-DP PG texts were taken from modern edited editions and nobody knows about it except the transcriber? Will they ever reveal their sources? Unlikely.

I hope that PG Canada will embrace the DP system or something similar (where, in a public setting, multiple people are involved in proofing and checking each other's work), and that the page scans from the original source are made publicly available to proudly sit side by side with the structured digital text versions.

Let me state clearly that what I say above is not intended to disparage those who do work alone -- most of them are diligent and committed to textual fidelity. They produce great work. But the problem is, how in 50-100 years does anyone know that a particular text, done alone this way, is textually faithful to the original? One could say "John Scribe did this book, and you can trust his work". Well, the obvious question 50-100 years from now is "Who is John Scribe? Show me the original page scans, and was that book transcribed by a group of people working in public who checked each other's work?"

One of the books I publish is the Kama Sutra of Vatsyayana, which was written ca. 3rd century CE in India. Unfortunately, only a few copies exist, and none of them are complete. In addition, comparing these copies with each other, we find huge differences between them.
Sir Richard Burton had to combine some source copies he had available to him and, using textual analysis, his legendary linguistic ability, and historical knowledge, try to discern what the original text must have been. It is obvious that over the centuries, as scribes copied the work, they introduced their own changes to the text (of course in some cases there had to be "modernization" of the language as the Hindi language evolved over the centuries). Because there was no dedication to faithful preservation, we no longer have the original Kama Sutra. It is gone forever.

Today we have a Public Domain, and it's the only one we have, and any group digitizing the Public Domain should commit themselves to textual fidelity, and to setting up a process that will not only achieve this but also show others that textual fidelity to the source was indeed achieved, and can be demonstrated. Technology now allows us to achieve this transparency.
Please; I would just like to make it clear that much excellent work HAS been, and continues to be, published by individuals,
Agreed! I did some texts myself a few years ago for issuance as commercial books, having typed them in by hand, and proofed. I've slaved over these texts (both by eye and with various search tools) and I know the error rate is *very low*. Yet, because I did them myself with no public oversight, and did not do any page scans, I will not release the texts to PG because they were not done using a trustworthy process, even though *I know* they are very faithful and done well.

One of the texts, Burton's "Kama Sutra" (see above), I recently scanned at high quality, and the scans have been submitted to DP for proofing, which I hope will soon commence. The text that results *will* be much more trustworthy and worthy of public preservation than mine, even though I know mine is textually faithful.

I hope this makes sense in that I practice what I preach, and will not submit digital texts of Public Domain works without taking steps to assure trustworthiness (and since DP is there, that's where I presently will submit them).

I trust my work, but you should NOT trust my work. And if you tell me you cannot trust my work because of *how* I did it, I will not be offended, and neither should those who also transcribed texts alone. If they are offended, I don't believe they yet understand the larger issues involved. It is understandable they may not want to understand, after having put hundreds and even thousands of hours of hard labor into their lone proofing activities.
and even works of lesser quality can be the equivalent of (so-called) published books. I don't think I've ever seen a book without errata, and some, including best sellers, can be appalling. It'll be interesting to see how Mr. Potter fares this time... No; I'm talking about the overall performance, and not making any comparisons. Our mandate must simply include continuous improvement.
Definitely! Having the scans available to the public is an important component, since it allows rapid determination of whether there is a transcription error.

Michael Hart now wants to "correct" the errors found in many of the non-DP texts. Good luck. There's no source information recorded (so no source to consult), and no scans to consult. How can one correct errors when one doesn't have the source to correct to? Some errors can be corrected, but others cannot. What about "Huck Finn", where misspellings are the norm? And what about broken-up and run-together paragraphs and even sentences? (Not to mention missing sentences and paragraphs due to transcribing errors -- how will these be found without painstaking comparison to a printed edition?) And what about "Americanized" British spelling? The list goes on and on.

I've stated many times that PG should simply redo the older texts through the DP system, and this time find an authoritative source copy, and get the page scans online. Problem solved.

(Regarding "authoritative", let me give another example. Burton published another book contemporaneous with the "Kama Sutra", the "Perfumed Garden of Sheik Nefzaoui", published about 1885. This book has proven to be quite rare, and is not found in many libraries. Around 1913 (or so), a pirated reissue was published. Because of the fear of prosecution or simple disgust, a whole chapter was removed, along with a few paragraphs elsewhere. So the 1913 issue is a censored copy. Unfortunately, that censored pirated copy was widely sold, and is the dominant edition found in academic libraries today. Well, guess what edition was used for the online text versions of the "Perfumed Garden"? You got it right -- it is the censored, pirated edition. What is scary is that those who put this text online did not know this, and even after I told them about it, they didn't care -- they didn't even bother to put up a disclaimer or to say it comes from a later censored pirated reissue.
And this is the text that is being spread to every corner of the Internet, on mirrored hard drives and backup media all around the world. Unless something is done, in 50-100 years when one searches for Burton's "Perfumed Garden", the censored edition will be the one brought up, and the end-user will not even know it is the censored edition -- they will take it at face value that it is the original. (This "face value" acceptance is what worries me about the PG collection.) I'm working to try to rectify this, but the lesson is that those who digitize public domain texts need to do a little study on each work being digitized, to assure that the source(s) they use is/are reasonably authoritative, and of course to record the source metadata along with making the page scans public. There are lots of scholars out there who know bibliographic information about particular works and who would be glad to advise. Do your homework, post on the Internet to ask for help, and learn a little about each work before going through the digitization process...)
DP is, I assume, aware that folk with enthusiasm will be more involved with a work. Upcoming books are announced at several stages of the process so that anyone with interest can grab stuff that excites them, both for processing and whitewashing. (I have just grabbed The Old Coast Road, since I'm very interested in historic transportation.) By definition, proofers will, when possible, go for what they enjoy.
Definitely! And for particular book topics, one can go outside to various forums and quickly recruit volunteers. There now seem to be forums for just about every subject and author out there. DP has not done this because it doesn't have to at this time (they are working at their limit), but I see PG Canada doing this for particular books. What about an old book on the history of Barrie (a hypothetical example)? Just contact the Barrie historical group(s) and other Barrie organizations, and voilà, a bunch of people will show up at the door to help with basic and mundane proofing. Provide a convenient online interface to assist with the proofing (like DP does), and one will be able to find willing and enthusiastic volunteers for just about every book under the sun.

Just my $0.02 worth.

Jon
participants (4)
- Andrew Sly
- jiva das
- Jon Noring
- Michael Lockey