Many solo projects out there in gutvol-d land?

I've done a few books for PG. I've used DP -- back in the day, but mostly I've been doing solo projects. I don't hear a lot about folks doing projects solo these days. Are there many of us out there?

============================================================
Gardner Buchanan <gbuchana@teksavvy.com>  Ottawa, ON
FreeBSD: Where you want to go. Today.

I suspect there are. You just don't see a lot of communication between them. I'm often checking newly posted texts for the catalog records, and I do notice credits sometimes that do not mention DP. I do projects on my own sometimes. I know Al Haines does many. I recall seeing a few religious texts lately from an individual contributor.

--Andrew

On Sat, 13 Feb 2010, Gardner Buchanan wrote:
I've done a few books for PG. I've used DP -- back in the day, but mostly I've been doing solo projects. I don't hear a lot about folks doing projects solo these days. Are there many of us out there?

I've been doing some on my own from around 2006. Before that, also some DP, but I decided I wanted to work on books I chose myself... I use different sources: the Archive, Gallica, and bookshops, of course. Must be about 70 books by now. I used to team up with a friend for proofreading ... Starting up a little website to get more partners...

Marc

2010/2/13 Andrew Sly <sly@victoria.tc.ca>
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/mailman/listinfo/gutvol-d

I've produced many books single-handed, from scan to post, for both PG-US and PG-Canada. As a Whitewasher, I've encountered maybe a dozen solo producers. Many of the first-timers I deal with aren't prepared for the work involved in producing a book, and abandon their projects. Very few become multi-project submitters.

Abandoned projects are not lost. My practice is to wait a year, then decide if I want to do the book myself. If I do, and I can find a scanset, I get a clearance, and produce the book.

Many of the early producers, who did books when etext numbers were less than about 5000, no longer produce. I can think of only a few who do. Gardner, you're one, and David Price and David Widger are others. I didn't start until the very early 10000's--my first book was #10750, released January 2004.

Al

----- Original Message ----- From: "Gardner Buchanan" <gbuchana@teksavvy.com> To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org> Sent: Saturday, February 13, 2010 10:51 AM Subject: [gutvol-d] Many solo projects out there in gutvol-d land?
I've done a few books for PG. I've used DP -- back in the day, but mostly I've been doing solo projects. I don't hear a lot about folks doing projects solo these days. Are there many of us out there?

Interesting. I hadn't realized the two organizations were so closely interdependent. So effectively, PG's release volume is almost directly dependent on DP's posting volume. And whatever validation requirements PG might have don't have much relevance if they differ from DP's requirements, as long as the WWers don't reject them. DP is the publisher, and PG is the distributor (roughly speaking).

On Sat, 13 Feb 2010, don kretz wrote:
Interesting. I hadn't realized the two organizations were so closely interdependent.
Well, yes. There is a lot of interplay and adaptation between the two. But I would not say that either is dependent upon the other for its existence. If PG were to somehow disappear or close down, I'm sure that DP would continue, finding another repository for its finished texts--or creating one if needed. And if DP were to disappear, PG would go on just as it always has, only with a much lower volume of texts being posted.
So effectively, PG's release volume is almost directly dependent on DP's posting volume.
The majority of new PG texts for many years have come from DP, yes. For a quick comparison, I see that DP's 15,000th text was posted on May 12, 2009. They will have done many more since then, and have by now done more than half of the 31,000-odd items in PG. A while ago, I added this to the Wikipedia article on Project Gutenberg, to try to clarify what effect DP had had on it: "This effort greatly increased the number and variety of texts being added to Project Gutenberg, as well as making it easier for new volunteers to start contributing." I could go on describing the hows and wherefores of that in more detail, but this is getting too long already.
And whatever validation requirements PG might have don't have much relevance if they differ from DP's requirements, as long as the WWers don't reject them.
Well, that has been part of the balancing act, if you will. PG has always adapted (albeit, sometimes slowly) according to its contributors. And DP contributors, after conversations back and forth, have helped to shape what direction PG is going in. One example that comes to mind is dropping the requirement that a text be of a certain length, in order to accommodate all the sci-fi short stories. In my own opinion, this can be difficult, because there are many parts that make up this process of DP-PG. Sometimes people make suggestions that seem good from their point of view, but very few seem to have an accurate overall picture, to know how one action can affect other parts of the process.
DP is the publisher, and PG is the distributor (roughly speaking).
I don't know if that metaphor fits perfectly. Project Gutenberg itself seems to fill more of the publisher's role, as well as distributor and archiver. DP does what might be compared to the traditional roles of typesetter, proofreader, fact-checker, etc.

And don't underestimate the role of the post-processor. It still comes down to one person who has to do a lot of work on the text, and often make decisions about how to deal with many various things, before it is ready for submitting to PG.

--Andrew

Not to worry - the last thing any of us do is undervalue the post-processor. The job just seems to become more complex, and the amount of value-add they provide beyond what the rest of us do keeps increasing. I don't think anyone is particularly happy about that, least of all the PPers. They're the smallest piece of pipe everything has to fit through, and they aren't getting much help in the way of tool support.

On Sat, Feb 13, 2010 at 11:20 PM, Andrew Sly <sly@victoria.tc.ca> wrote:
And don't underestimate the role of the post-processor. It still comes down to one person who has to do a lot of work on the text, and often make decisions about how to deal with many various things, before it is ready for submitting to PG.
--Andrew

Andrew Sly <sly@victoria.tc.ca> writes:
And don't underestimate the role of the post-processor. It still comes down to one person who has to do a lot of work on the text, and often make decisions about how to deal with many various things, before it is ready for submitting to PG.
I think we can change this. It would be much better to do this mysterious PP'ing in a collaborative manner. To experiment with this, I created an SVN repository and started with TEI tagging. I'll add more of the PGTEI framework soon: http://code.google.com/p/tieck-texts/ ATM, there is just one book and one contributor. More to come--thus far I did not announce it widely.

pgdp seems to be down right now...

-- Karl Eichwalder

Karl Eichwalder <ke@gnu.franken.de> wrote:
I did not announce it widely. pgdp seems to be down right now...
The server is up, the network is down. Unfortunately, our colocation provider is one of many in the NJ/NYC region that has been affected by fiber cuts related to the underground transformer explosion in NYC. Both upstream providers are working at this time to put temporary solutions in place to restore connectivity to these facilities until permanent repairs can be made. We did just obtain an ETA of "a couple more hours" from them via our coloc contact, but that would appear at the moment to be a somewhat optimistic educated guess. Hopefully service will be restored by this evening (Sunday US EST).

David (donovan)

On 2/14/10 8:20 AM, Andrew Sly wrote:
And don't underestimate the role of the post-processor. It still comes down to one person who has to do a lot of work on the text, and often make decisions about how to deal with many various things, before it is ready for submitting to PG.
What is it they actually do? Regards, Walter

On Sun, Feb 14, 2010 at 12:25 PM, Walter van Holst <walter.van.holst@xs4all.nl> wrote:
On 2/14/10 8:20 AM, Andrew Sly wrote:
And don't underestimate the role of the post-processor.
It still comes down to one person who has to do a lot of work on the text, and often make decisions about how to deal with many various things, before it is ready for submitting to PG.
What is it they actually do?
Regards,
Walter
That's a simple question with a complicated answer. Here is an explanation <http://www.pgdp.net/c/faq/post_proof.php> that is apparently as concise as anyone has been able to come up with. As you can see, it's some mixture of:

a.) validating all the work done by about 6 Rounds of work on each page in the project;
b.) running a bunch of other semi-manual checks on the project;
c.) filling the gap caused by the fact that the text markup and layout produced by the Rounds isn't the same as the text format and layout required by PG;
d.) producing a complete HTML version of the project based on the format and markup that was originally considered appropriate for the text-only version that was all that PG offered at the time it was designed.

So you can see that it's by far the majority of the individual tasks required to produce an e-book (text and HTML), only a small few of which have been distributed. In some cases the PPer also reproofs the entire project.
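[For readers who have never seen the "semi-manual checks" mentioned above: this is a toy sketch of the kind of pattern such tooling flags, written for illustration only. It is not the actual gutcheck tool, whose checks are far more numerous; the function name and the separator convention shown are assumptions.]

```python
import re

def toy_gutcheck(text):
    """Flag a few common OCR/proofing slips, in the spirit of gutcheck.

    Illustrative sketch only; the real gutcheck performs many more
    checks (punctuation spacing, he/be confusions, and so on).
    """
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Repeated word, e.g. "the the" -- a classic OCR slip.
        if re.search(r"\b(\w+)\s+\1\b", line, re.IGNORECASE):
            problems.append((lineno, "repeated word"))
        # Unbalanced parentheses on a single line (crude heuristic).
        if line.count("(") != line.count(")"):
            problems.append((lineno, "unbalanced parentheses"))
        # Leftover DP-style page separator that should have been removed.
        if line.startswith("-----File:"):
            problems.append((lineno, "page separator not removed"))
    return problems

sample = "It was the the best of times\n-----File: 012.png\n(unclosed\n"
for lineno, msg in toy_gutcheck(sample):
    print(lineno, msg)
```

A real PPer runs a battery of such checks and then reviews each hit against the page image, which is why step (b) stays "semi-manual".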

Hi All,

Let me see if I understand this right: 6 Rounds of work is done, just to be worked over so that text and HTML versions can be created and the final result published.

Why, in God's name, is it not done the other way around!? Get a clean text and HTML version first, and then add all the googly goop afterwards. Sure would save a lot of time. I know DP knows about markup, but have they ever heard of pseudo-code/markup?

regards
Keith.

Am 14.02.2010 um 21:47 schrieb don kretz:

On Mon, Feb 15, 2010 at 1:00 AM, Keith J. Schultz <schultzk@uni-trier.de> wrote:
Hi All,
Let me see if I understand this right.
6 Rounds of work is done just to be worked over so that text and HTML versions can be created and the final result is published.
Why, in God's name, is it not done the other way around!? Get a clean text and HTML version first, and then add all the googly goop afterwards. Sure would save a lot of time.
I know DP knows about markup, but have they ever heard of pseudo-code/markup?
regards Keith.
Am 14.02.2010 um 21:47 schrieb don kretz:
You'd think it would be obvious, wouldn't you?

When DP started, here was the basic process as far as the participants were concerned.

1.) A person takes a page of text and a picture of the text, plus a mediocre online text editor and some guidelines to follow, and tries to get the text to match the picture.

2.) A second person takes their work and the same picture and guidelines, and tries to make it better.

3.) The system strings the text files together and hands them off to PG to publish.

Clean, simple, and most importantly it provides each person with the immediate and obvious positive gratification of seeing their work self-evidently closing the gap between the text and the picture.

Now, almost all the process has been so completely decomposed and constrained that almost all the opportunity for gratification shows up for a little bit to the first proofer (who still must not do *too much* to make it look like the picture, i.e. format it); maybe the first formatter (if there's even much left to do); and supremely and finally, gloriously, the Post Processor (whose name is semi-eternally associated with their work.)

There's a whole lot more that can be said (and is said, in the DP forums, loudly, into the vastness of space) about how it got to be this way, and how happy people are about it, and what might be done. These are not dumb people, even though the work seems to have become dumb work. But there's the picture in a nutshell.

On Mon, 15 Feb 2010, don kretz wrote:
When DP started, here was the basic process as far as the participants were concerned.
1.) A person takes a page of text and a picture of the text, plus a mediocre online text editor and some guidelines to follow, and tries to get the text to match the picture.
2.) A second person takes their work and the same picture and guidelines, and tries to make it better.
3.) The system strings the text files together and hands them off to PG to publish.
Are you sure you have phrased that in the way you wanted? At no point in the history of DP was the output of the rounds "strung together and handed directly off to PG". I cannot recall if the name of "post-processor" has always been used--but there has always been someone in that role. Anyone who has worked on PP would know that the output of the rounds at DP is _not_ ready to be posted as a finished text without a good deal more work.

But this is ok--this is as intended. The purpose of DP (as I understand it) has always been to distribute much of the work, and make things easier for the person preparing the text for submission to PG.

To put this in context, let's compare with pre-DP times, when everything was done on an individual basis. An easy text that has come through DP can be prepared and submitted in one day; a more difficult one can take a week or two; a really hard one might take months working on it on and off. Now take those same texts without the DP preparation, where an individual starts working himself from the OCR output. The easy text could take perhaps three to six weeks; the more difficult one five to eight months or longer; and the hardest texts that have been done through DP could never have been attempted by an individual.

One other very significant aspect is that DP has been set up to encourage a sense of community. And you have ready access to people with specialized knowledge about many languages, musical notation, obscure unicode characters, obsolete typesetting conventions, etc. In the time before DP it was quite common for someone to put much effort into working on a text, and then burn out and abandon the project. Having DP gives many people a chance to do their bit, and have a much more manageable learning curve.

--Andrew

"Andrew" == Andrew Sly <sly@victoria.tc.ca> writes:
Andrew> Are you sure you have phrased that in the way you wanted?
Andrew> At no point in the history of DP was the output of the
Andrew> rounds "strung together and handed directly off to PG". I
Andrew> cannot recall if the name of "post-processor" has always
Andrew> been used--but there has always been someone in that role.

When I started at DP, in 2002, the work needed to pass from the R2 output to posting to PG was officially estimated at 30 minutes, without any specialized tool. I think that "strung together and handed directly off to PG" is a correct metaphor for 30 minutes of work. Enough to remove the separators, reflow the line ends, and that was all. No formatting (italics converted to uppercase for ship names), accents removed, no spell-checking, no gutcheck. This was a task of the project manager, and handing the task to somebody else was exceptional.

Of course, even then, it took me much longer to complete a book, since I used to re-read the book to catch a bunch of remaining errors.

Carlo
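[The two mechanical steps Carlo describes -- removing the page separators and reflowing the line ends -- are simple enough to sketch. This is an illustration of the idea only, not any script DP actually used; the function name and the "-----File:" separator format are assumptions.]

```python
import re
import textwrap

def concatenate_and_reflow(round_output, width=72):
    """Mimic the old ~30-minute PP step: drop DP page separators,
    rejoin hyphenated line ends, and rewrap each paragraph."""
    # Drop per-page separator lines such as "-----File: 012.png".
    text = "\n".join(
        line for line in round_output.splitlines()
        if not line.startswith("-----File:")
    )
    paragraphs = []
    for para in re.split(r"\n\s*\n", text):        # blank line = paragraph break
        joined = re.sub(r"-\n", "", para)          # rejoin end-of-line hyphenation (crude)
        joined = re.sub(r"\s*\n\s*", " ", joined)  # unwrap remaining hard line ends
        paragraphs.append(textwrap.fill(joined.strip(), width=width))
    return "\n\n".join(p for p in paragraphs if p)

pages = ("-----File: 001.png\nIt is a truth universally acknow-\nledged, "
         "that a single man\n-----File: 002.png\nin possession of a good "
         "fortune\nmust be in want of a wife.\n")
print(concatenate_and_reflow(pages))
```

Even this toy version shows why the step was only "30 minutes": everything else Carlo lists (spell-checking, gutcheck, restoring italics and accents) was simply not done then.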

On Mon, 15 Feb 2010, Carlo Traverso wrote:
"Andrew" == Andrew Sly <sly@victoria.tc.ca> writes:
Andrew> At no point in the history of DP was the output of the
Andrew> rounds "strung together and handed directly off to PG". I
Andrew> cannot recall if the name of "post-processor" has always
Andrew> been used--but there has always been someone in that role.
When I started at DP, in 2002, the work needed to pass from the R2 output to posting to PG was officially estimated at 30 minutes, without any specialized tool. I think that "strung together and handed directly off to PG" is a correct metaphor for 30 minutes of work. Enough to remove the separators, reflow the line ends, and that was all. No formatting (italics converted to uppercase for ship names), accents removed, no spell-checking, no gutcheck. This was a task of the project manager, and handing the task to somebody else was exceptional.
Thanks Carlo. Perhaps my memory has become hazy in the intervening years. :) But still I question your list. Why accents removed? It was fairly routine to post latin-1 texts at that time. (I can find an "8-bit" text as #1595, with a release date of Jan, 1999.) The earliest reference to gutcheck that I can find in my old emails is on Tue, 23 Jul 2002, but I don't think it was in common use yet. It was actually something that Jim T. had written as an evaluation tool for submitted texts.
Of course, even then, it took me much longer to complete a book, since I used to re-read the book to catch a bunch of remaining errors.
I did the same with the project I ran through DP at that time as well. Perhaps that's why I assumed it was the norm. --Andrew

"Andrew" == Andrew Sly <sly@victoria.tc.ca> writes:
Andrew> On Mon, 15 Feb 2010, Carlo Traverso wrote:
Andrew> Thanks Carlo. Perhaps my memory has become hazy in the
Andrew> intervening years. :)
Andrew> But still I question your list. Why accents removed? It
Andrew> was fairly routine to post latin-1 texts at that time. (I
Andrew> can find an "8-bit" text as #1595, with a release date of
Andrew> Jan, 1999.)

These were the DP guidelines (copied from PG official guidelines). I remember Ultima Thule, a book on Iceland, with a discussion of what to do with the eths in names (they were eventually replaced with "th") while the accents were routinely dropped. The book was eventually redone from scratch; it might have been the last one before DP changed officially to preserving accents.

Carlo

A lot of what's been talked about here can, in my truly humble opinion, be brought back to a question like: at this particular moment, what IS the internet, and what are its diverse communities, sociologically? I think the DP community and way of working evolves Facebook- and Twitter-wise ... which is why I like to work on my own a lot...

2010/2/15 don kretz <dakretz@gmail.com>

On Mon, Feb 15, 2010 at 7:43 AM, don kretz <dakretz@gmail.com> wrote:

Re the two-round system:
Clean, simple, and most importantly it provides each person with the immediate and obvious positive gratification of seeing their work self-evidently closing the gap between the text and the picture.
Yes, and it often produced godawful results. If the R2 proofer was sloppy, a sloppy text went to the PPer. Some PPers exhausted themselves reproofing the text to fix the mistakes that R2 had left. Others just processed the text and sent it off to PG, warts and all.

One R2 proofer had proofed an astonishing number of pages ... but he did so by smoothreading them hurriedly, without checking against the image. He missed many errors.

PPers complained. Readers of PG texts complained. The current workflow at DP is a *reaction* to the previous lack of quality control. That's why P3ers have to pass a test. That's why proofing and formatting were separated. OK, our quality control is strangling us. I don't think the answer is to go back to the good old days of two rounds and error-ridden texts.

-- Karen Lofstrom

Yes, trying again but constantly moderated out ... + seventy books in a couple of years ... Hope to get some information through one time.

2010/2/15 Karen Lofstrom <klofstrom@gmail.com>

Nor is anyone suggesting going back. I was describing the progression and how it has affected the relationship between the users and the work.

There have at each step been a number of alternatives for dealing with quality issues. We (or someone, it was hardly "we") made choices which had consequences. One of the consequences was improved quality. Another was a change in the user's work experience (always a greater constraint, notice, seldom if ever improved user tools.)

We are where we are. We can, I suppose, say it was done the best way possible, and what we have is the inevitable cost of the improvements. I think that's a difficult position to defend. Which is exactly what Roger is, intentionally or not, making quite clear. We can't recast the decisions made in the past, but we need to do a better job of learning from them and doing better. Sooner would be nicer than later. Hence rfrank's project.

On Mon, 15 Feb 2010 13:26:09 -0800, don kretz <dakretz@gmail.com> wrote:
or not, making quite clear. We can't recast the decisions made in the past, but we need to do a better job of learning from them and doing better. Sooner would be nicer than later. Hence rfrank's project.
In that vein, how flexible is the DP software? I've been wondering to what extent parallel P1 rounds might be helpful. I find P2 proofing exceedingly boring because of the small number of errors that are left to be fixed in texts that are well-scanned and well-proofed in P1. I can't imagine how mind-numbing P3 will be if I ever become eligible for that 'status'. I can imagine that only having to look at the differences between redundant P1 proofed texts might be helpful, since it would take two independent P1 proofers overlooking the same error for it to slip through.

Another potential improvement might be to make texts available to the next round on a per-page basis instead of having to wait for all pages to be finished in the previous round.

Aforementioned suggestions may be silly; feel free to point out their silliness.

Regards,

Walter
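[The idea of reviewing only the disagreements between two independent P1 passes can be sketched with a plain line-level diff. This is a hypothetical illustration of the concept, not anything in the DP codebase; the function name is made up.]

```python
import difflib

def p1_disagreements(version_a, version_b):
    """Return line-level disagreements between two independent P1 passes.

    Only lines where the two proofers differ would need a human look;
    identical lines are assumed correct, since both proofers would have
    had to miss the same error for it to slip through.
    """
    a, b = version_a.splitlines(), version_b.splitlines()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((a[i1:i2], b[j1:j2]))
    return diffs

proofer_1 = "It was a dark and\nstorny night.\nThe rain fell.\n"
proofer_2 = "It was a dark and\nstormy night.\nThe rain fell.\n"
for left, right in p1_disagreements(proofer_1, proofer_2):
    print(left, "<->", right)
```

The catch, of course, is the assumption behind it: two independent proofers making the *same* mistake (a plausible OCR misreading both leave alone) would pass through silently, which is why a diff-driven round could reduce, but not replace, later review.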

"Walter" == Walter van Holst <walter.van.holst@xs4all.nl> writes:
Walter> On Mon, 15 Feb 2010 13:26:09 -0800, don kretz
Walter> <dakretz@gmail.com> wrote:
>> or not, making quite clear. We can't recast the decisions made
>> in the past, but we need to do a better job of learning from
>> them and doing better. Sooner would be nicer than later. Hence
>> rfrank's project.

Walter> In that vein, how flexible is the DP software? I've been
Walter> wondering to what extent parallel P1 rounds might be
Walter> helpful. I find P2 proofing exceedingly boring because of
Walter> the small number of errors that are left to be fixed in
Walter> texts that are well-scanned and well-proofed in P1. I
Walter> can't imagine how mind-numbing P3 will be if I ever become
Walter> eligible for that 'status'. I can imagine that only having
Walter> to look at the differences between redundant P1 proofed
Walter> texts might be helpful since it would take two independent
Walter> P1 proofers to overlook the same error to have it slip
Walter> through.

This would be simple enough: just allow a PM to load a set of txt files and a dummy proofer name in one of the project's columns. The administrators (having DB access) do this if asked, I suppose with a script (I have one on the test site). Another improvement would be to allow a PM to skip a round; this too is reserved to the few, overloaded administrators, but it is just changing a flag at one point in the code.

Walter> Another potential improvement might be to make texts
Walter> available to the next round on a per-page basis instead of
Walter> having to wait for all pages to be finished in the
Walter> previous round.

This might be trickier, since the whole philosophy of the DP code is based on rounds and per-round permissions. It would require at least starting a new test DP site in which new changes in the code are made and extensively experimented with in a live environment. The current test site is used for testing features that are potentially disruptive, and is inadequate for live testing: it is for alpha testing; a beta testing site would be necessary, or probably more than one.

rfrank's test site at fadepage has abandoned the round philosophy, but it is not derived from DP code; it is reimplemented from scratch.

Walter> Aforementioned suggestions may be silly, feel free to
Walter> point out their silliness.

Not silly at all; I believe that the main problem of DP is its rigidity, the "one size fits all" philosophy, which is partly in the code, but mostly in the procedures, and is necessary in a huge structure. Smaller DP sites like DP-EU and DP-CAN have shown a more flexible structure, so I believe that a confederation of different DP sites, sharing a common aim and a common codebase, but with different local laws and software configurations, and loose coordination, would be a better model.

Carlo

I'm biting my tongue, Carlo. The difficulties aren't primarily with the code, which can be (and on occasion has been) amended to overcome those types of problems. However, none of our volunteers has considered it appropriate or within the scope of their skills or interests to, for instance, document it; so it's pretty closely held within a small group.

On Tue, Feb 16, 2010 at 2:30 AM, Carlo Traverso <traverso@posso.dm.unipi.it> wrote:
"Walter" == Walter van Holst <walter.van.holst@xs4all.nl> writes:
Walter> On Mon, 15 Feb 2010 13:26:09 -0800, don kretz Walter> <dakretz@gmail.com> wrote:
or not, making quite clear. We can't recast the decisions made in the past, but we need to do a better job of learning from them and dong better. Sooner would be nicer than later. Hence rfrank's project.
Walter> In that vein, how flexible is the DP software? I've been Walter> wondering to what extent parallel P1 rounds might be Walter> helpful. I find P2 proofing exceedingly boring because of Walter> the small number of errors that are left to be fixed in Walter> texts that are well-scanned and well-proofed in P1. I Walter> can't imagine how mind-numbing P3 will be if I ever become Walter> eligible for that 'status'. I can imagine that only having Walter> to look at the differences between redundant P1 proofed Walter> texts might be helpful since it would take two independent Walter> P1 proofers to overlook the same error to have it slip Walter> through.
This would be simple enough, just allowing a PM to load a set of txt files and a dummy proofer name in one of the projects columns. The administrators (having DB access) do this if asked, I suppose with a script (I have one in the test site). Another improvement would be to allow a PM to skip a round; this too is reserved to the few, overloaded administrators, but it is just changing a flag at one point in the code.
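What Carlo sketches here is essentially a small script with direct database access that stores externally proofed page texts under a dummy proofer name. The following is a hypothetical illustration of that idea, not actual DP code: the `pages` table and its `p1_text`/`p1_user` columns are invented names (the real DP schema differs), and an in-memory SQLite database stands in for the site database.

```python
import sqlite3

def load_round_texts(conn, project_pages, dummy_proofer="external_p1"):
    """Store each page's text as if 'dummy_proofer' had proofed it in P1.

    project_pages maps a page name to its already-proofed text.
    Table and column names are illustrative only.
    """
    for page_name, text in project_pages.items():
        conn.execute(
            "UPDATE pages SET p1_text = ?, p1_user = ? WHERE page_name = ?",
            (text, dummy_proofer, page_name),
        )
    conn.commit()

# Demo: an in-memory database standing in for a project's page table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (page_name TEXT, p1_text TEXT, p1_user TEXT)")
conn.execute("INSERT INTO pages VALUES ('001.png', '', '')")
load_round_texts(conn, {"001.png": "Externally proofed text."})
row = conn.execute("SELECT p1_text, p1_user FROM pages").fetchone()
print(row)  # ('Externally proofed text.', 'external_p1')
```

The point of the sketch is how little machinery the feature needs once someone with DB access runs it, which is why Carlo notes that only the overloaded administrators can currently do it.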
Walter> Another potential improvement might be to make texts
Walter> available to the next round on a per page basis instead of
Walter> having to wait for all pages to be finished in the
Walter> previous round.
This might be trickier, since the whole philosophy of the DP code is based on rounds and per-round permissions. It would require at least starting a new test DP site in which new changes to the code are made and extensively tried out in a live environment. The current test site is used for testing features that are potentially disruptive, and is inadequate for live testing: it is for alpha testing; a beta testing site would be necessary, or probably more than one.
rfrank's test site at fadepage has abandoned the round philosophy, but is not derived from DP code, it is reimplemented from scratch.
Walter> Aforementioned suggestions may be silly, feel free to
Walter> point out their silliness.
Not silly at all; I believe that the main problem of DP is its rigidity, the "one size fits all" philosophy, that is partly in the code, but mostly in the procedures, and is necessary in a huge structure. Smaller DP sites like DP-EU and DP-CAN have shown a more flexible structure, so I believe that a confederation of different DP sites, sharing a common aim and a common codebase, but different local laws and software configurations, and a loose coordination, would be a better model.
Carlo _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d

As a Whitewasher who's dealt with old DP productions as well as new ones, over the last couple of years, I second (and third and fourth) everything Karen says.

Others may hold DP's current system to be inefficient/slow/etc., but it does one thing that makes it all worthwhile--it can produce error-free texts.

Example: I'm currently dealing with an errata report for an old DP production. I haven't looked into the problem in detail yet, but from what I've seen, at least several pages are missing, followed by a repeat of material that precedes the missing material. I'm going to have to go through the problem area of the posted text, compare it to a scanset, figure out which material is missing/redundant, OCR and proof whatever's missing, knit it into the text, then run Gutcheck/Jeebies/Gutspell on the repaired text, which will undoubtedly unearth a raft of other errors, all followed by a reformat and a repost. Also undoubtedly, many other errors will remain.

Is it worth it? Personally speaking, no. It's going to take hours to fix this text, time that I'd far rather spend on my own productions, but there's currently no mechanism except for the Whitewashers, a.k.a. Errata Team, to fix this kind of thing. (Probably simpler to just re-do this text from scratch, which is something *I'm* not about to do.)

In short, DP's current processes produce error-free texts; its old processes, from what I've seen of the results, didn't.

Al

----- Original Message ----- From: "Karen Lofstrom" <klofstrom@gmail.com> To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org> Sent: Monday, February 15, 2010 12:47 PM Subject: [gutvol-d] Re: Many solo projects out there in gutvol-d land?
On Mon, Feb 15, 2010 at 7:43 AM, don kretz <dakretz@gmail.com> wrote:
Re the two-round system:
Clean, simple, and most importantly it provides each person with the immediate and obvious positive gratification of seeing their work self-evidently closing the gap between the text and the picture.
Yes, and it often produced godawful results. If the R2 proofer was sloppy, a sloppy text went to the PPer. Some PPers exhausted themselves reproofing the text to fix the mistakes that R2 had left. Others just processed the text and sent it off to PG, warts and all.
One R2 proofer had proofed an astonishing number of pages ... but he did so by smoothreading them hurriedly, without checking against the image. He missed many errors.
PPers complained. Readers of PG texts complained. The current workflow at DP is a *reaction* to the previous lack of quality control. That's why P3ers have to pass a test. That's why proofing and formatting were separated. OK, our quality control is strangling us. I don't think the answer is to go back to the good old days of two rounds and error-ridden texts.
-- Karen Lofstrom
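The damage pattern Al describes above--missing pages followed by a repeat of earlier material--can often be located mechanically before reaching for the scanset. Here is a rough sketch of that first step (my own illustration, not a tool the Whitewashers actually use): it flags any paragraph that occurs more than once in a text, which is a good hint of where a duplicated/missing run begins.

```python
from collections import defaultdict

def find_repeated_paragraphs(text):
    """Return paragraphs occurring more than once, with their positions.

    Paragraphs are taken as blank-line-separated blocks; a run of
    repeated paragraphs suggests missing/redundant-page damage.
    """
    positions = defaultdict(list)
    for i, para in enumerate(p.strip() for p in text.split("\n\n")):
        if para:
            positions[para].append(i)
    return {p: idx for p, idx in positions.items() if len(idx) > 1}

sample = "Chapter I.\n\nIt was a dark night.\n\nIt was a dark night.\n\nThe end."
dupes = find_repeated_paragraphs(sample)
print(dupes)  # {'It was a dark night.': [1, 2]}
```

Exact-match comparison like this only narrows down the problem area; the real repair still needs the scanset, OCR of the missing pages, and the Gutcheck/Jeebies/Gutspell passes Al lists.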

I can't think of anyone I know who would argue otherwise. That's not an issue that's open for discussion, I don't think. On Mon, Feb 15, 2010 at 2:03 PM, Al Haines (shaw) <ajhaines@shaw.ca> wrote:
As a Whitewasher who's dealt with old DP productions as well as new ones, over the last couple of years, I second (and third and fourth) everything Karen says.
Others may hold DP's current system to be inefficient/slow/etc., but it does one thing that makes it all worthwhile--it can produce error-free texts.
Example: I'm currently dealing with an errata report for an old DP production. I haven't looked into the problem in detail yet, but from what I've seen, at least several pages are missing, followed by a repeat of material that precedes the missing material. I'm going to have to go through the problem area of the posted text, compare it to a scanset, figure out which material is missing/redundant, OCR and proof whatever's missing, knit it into the text, then run Gutcheck/Jeebies/Gutspell on the repaired text, which will undoubtedly unearth a raft of other errors, all followed by a reformat and a repost. Also undoubtedly, many other errors will remain.
Is it worth it? Personally speaking, no. It's going to take hours to fix this text, time that I'd far rather spend on my own productions, but there's currently no mechanism except for the Whitewashers, a.k.a. Errata Team, to fix this kind of thing. (Probably simpler to just re-do this text from scratch, which is something *I'm* not about to do.)
In short, DP's current processes produce error-free texts; its old processes, from what I've seen of the results, didn't.
Al
----- Original Message ----- From: "Karen Lofstrom" <klofstrom@gmail.com> To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org> Sent: Monday, February 15, 2010 12:47 PM Subject: [gutvol-d] Re: Many solo projects out there in gutvol-d land?
On Mon, Feb 15, 2010 at 7:43 AM, don kretz <dakretz@gmail.com> wrote:
Re the two-round system:
Clean, simple, and most importantly it provides each person
with the immediate and obvious positive gratification of seeing their work self-evidently closing the gap between the text and the picture.
Yes, and it often produced godawful results. If the R2 proofer was sloppy, a sloppy text went to the PPer. Some PPers exhausted themselves reproofing the text to fix the mistakes that R2 had left. Others just processed the text and sent it off to PG, warts and all.
One R2 proofer had proofed an astonishing number of pages ... but he did so by smoothreading them hurriedly, without checking against the image. He missed many errors.
PPers complained. Readers of PG texts complained. The current workflow at DP is a *reaction* to the previous lack of quality control. That's why P3ers have to pass a test. That's why proofing and formatting were separated. OK, our quality control is strangling us. I don't think the answer is to go back to the good old days of two rounds and error-ridden texts.
-- Karen Lofstrom

.... but there's currently no mechanism except for the Whitewashers, a.k.a. Errata Team, to fix this kind of thing. (Probably simpler to just re-do this text from scratch, which is something *I'm* not about to do.)
OK, HOW ABOUT a mechanism for fixing and/or improving things that were done in the past that now look old and crufty by today's standards? -- whether redoing something originally created by DP or by a solo? Certainly WW shouldn't be the only way to fix old cruft. If someone wants to take on a "redo and improve" what does it take? Many of the things that actually get read at PG are pretty old and crufty! -- I haven't been willing to take on any of the Ye Olde Cruft for fear of pushback.

Any "mechanism" is informal, at best, and there's no list of old submissions that would benefit from being re-done.

To use as an example, Arizona Sketches, by J. A. Munk, PG#756. Internet Archive has a number of source copies. In 2008, I cleaned up PG's text file, made corrections, and created an HTML version. It's missing all illustrations, any Latin1 characters, and so forth.

If the only intent is to correct a current PG etext, the corrected text and HTML files can be sent to PG's Errata system. Do not reformat the files, so that the corrected ones can be compared to the posted ones. It might take a few days for the WWers to deal with such submissions, but they *will* be dealt with.

However, if you want to add illustrations, or any other material that may be missing from the posted files, you'll have to submit a copyright clearance for the source edition, do whatever is needed to add the missing material to the posted files, do a thorough check/correction of those files from the source, then upload everything as normal, mentioning in the Note to Whitewashers field that the submission is intended as an update to an existing etext.

The WWers will decide whether to post the new submission as a new etext, or to replace (and archive) the existing files. If the latter is chosen, the original submitter's credit will be added to the new version's Credit line.

----- Original Message ----- From: "Jim Adcock" <jimad@msn.com> To: "'Project Gutenberg Volunteer Discussion'" <gutvol-d@lists.pglaf.org> Sent: Saturday, February 20, 2010 3:48 PM Subject: [gutvol-d] Re: Many solo projects out there in gutvol-d land?
.... but there's currently no mechanism except for the Whitewashers, a.k.a. Errata Team, to fix this kind of thing. (Probably simpler to just re-do this text from scratch, which is something *I'm* not about to do.)
OK, HOW ABOUT a mechanism for fixing and/or improving things that were done in the past that now look old and crufty by today's standards? -- whether redoing something originally created by DP or by a solo? Certainly WW shouldn't be the only way to fix old cruft. If someone wants to take on a "redo and improve" what does it take? Many of the things that actually get read at PG are pretty old and crufty! -- I haven't been willing to take on any of the Ye Olde Cruft for fear of pushback.
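Al's "do not reformat the files" advice matters because corrections are reviewed by diffing the submission against the posted files: a rewrap or re-justification makes nearly every line differ, burying the real fixes. A small illustration with Python's standard difflib (my example, not a PG tool):

```python
import difflib

# One real correction ("stomy" -> "stormy"); the second line is untouched.
posted = ["It was a dark and stomy night.\n", "The rain fell in torrents.\n"]
corrected = ["It was a dark and stormy night.\n", "The rain fell in torrents.\n"]

diff = list(difflib.unified_diff(posted, corrected, "posted.txt", "corrected.txt"))
print("".join(diff))
# Only the corrected line appears as -/+ pairs; unchanged, unreformatted
# lines show up (at most) as context, so the reviewer sees exactly one fix.
```

Had `corrected` also been rewrapped to a different line width, every line would appear changed and the single genuine correction would be indistinguishable from noise.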

Let's just forget the whole idea of error free texts. . . .

Ever since I started Project Gutenberg I've never seen even one book I read, even most articles and essays, without big blunders you would think could never be published.

I would prefer just to get these materials in circulation-- then worry about approaching perfection along with Xeno.

Does anybody have a serious objection to putting the 8,000, or so, books that were listed earlier as being in limbo, in something like our "PrePrints" section, where we put eBooks that are admittedly not ready for prime time???

Please. . . .

Michael

On Sat, 20 Feb 2010, Jim Adcock wrote:
In short, DP's current processes produce error-free texts....
I will disagree with this, at least given that DP's current processes introduce punctuation errors pretty much by design.

On Sat, 20 Feb 2010, Michael S. Hart wrote:
Does anybody have a serious objection to putting the 8,000, or so, books that were listed earlier as being in limbo, in something like our "PrePrints" section, where we put eBooks that are admittedly not ready for prime time???
Yea, there are people arguing that it's a horrible thing to do. I'm 100% with you on this. Available with a few errors is far more useful than unavailable. And it's not that they aren't actually available now; they are. DP has always had the concatenated text available for download. It's behind a sign-on and not indexed by any of the search engines, so if you don't already know it's there you can't find it.

-- Greg Weeks http://durendal.org:8080/greg/

Sorry, I have been out and had email problems etc...

I strongly urge you to follow up this line of thought. There are several sites on the internet doing fine work by making valuable material available, much of which is either full of scanning errors or even in scanned form.

Is it satisfactory? Certainly not. Is it worth making available against the time that someone else improves it, if ever? MOST certainly. Is it consonant with our dignity to prefer making perfection available? Certainly. Is it consonant with our dignity to sit on material in case bairns and fools think that the job should do itself? Think about it.

Make it available first, and let anyone dissatisfied get busy and make it satisfactory.

Cheers, Jon
Let's just forget the whole idea of error free texts. . . .
Ever since I started Project Gutenberg I've never seen even one book I read, even most articles and essays, without big bluders you would think could never be published.
I would prefer just to get these materials in circulation-- then worry about approaching perfection along with Xeno.
Does anybody have a serious objection to putting the 8,000, or so, books that were listed earlier as being in limbo, in something like our "PrePrints" section, where we put eBooks that are admittedly not ready for prime time???
Please. . . .

I do "solos" given my frustration level with DP -- where I've submitted two really good books but none have made it back out of the system. IMHO setting up a book to go through the DP system aka Content Providing isn't a whole lot less work than just doing the whole book for myself in the first place.

Not entirely happy working with myself either -- going it alone is a bit of a slog for me -- but my tolerance level for wasting time is about one month -- which is about how long it takes me to make a book while working around various family emergencies -- as compared to 40 months for DP. And with DP nothing happens for months or years at a time -- and then the people there are unhappy with you if you happen to be out of town if and when your book pops off a queue and "goes active".

What I wish is that DP had a "Fast Trackers" division of people interested in and committed to turning books out quickly, so that one could see a project from beginning to end. I still proof at DP occasionally when I have excess energy -- but not enough to start my own new book project again!

On Sat, 20 Feb 2010, Jim Adcock wrote:
What I wish is that DP had a "Fast Trackers" division of people interested in and committed to turning books out quickly, so that one could see a project from beginning to end.
Don (dkretz) and I and a small team experimented with this a few weeks ago. It's entirely possible to do this within the current DP constraints. I think we took about two weeks for the short we used. That wasn't the main purpose of the experiment, but the short period of time was one of the constraints for what we wanted to test.

-- Greg Weeks http://durendal.org:8080/greg/
participants (15)

- Al Haines (shaw)
- Andrew Sly
- D Garcia
- don kretz
- Gardner Buchanan
- Greg Weeks
- Jim Adcock
- Jon Richfield
- Karen Lofstrom
- Karl Eichwalder
- Keith J. Schultz
- Marc D'Hooghe
- Michael S. Hart
- traverso@posso.dm.unipi.it
- Walter van Holst