
we're looking at rfrank's "roundless" experiment at fadedpage.com...
as i said yesterday, this test is a very very very very very good thing, because distributed proofreaders has been bogged down in a morass of "rounds" for many years now.
their standard workflow now calls for _three_ rounds of proofing, followed by _two_ rounds of formatting... throw in a "preprocessing" round, and their "postprocessing", which is followed by "postprocessing verification", and you've got 8 rounds. i don't know about you, but to me, that seems like a lot...
but that's not the worst of it. the worst is the resultant backlogs...
the problem arises because d.p. has thousands of proofers doing p1 (the first round of proofing), but d.p. only has hundreds that do p2 (the second round), and mere _dozens_ doing p3 ("final" proofing)...
needless to say, the large number of proofers doing p1 can proof more than the smaller number doing p2, or the tiny number in p3. the backlog created is (understandably) frustrating and demoralizing for the proofers trying to keep up in p2, and is killing the p3 proofers.
there is also the gnawing feeling that not all pages _need_ 3 rounds. indeed, _most_ pages in _most_ books are simple enough that they can be finished in one round, two at the most. so the _inefficiency_ of the 3-round proofing is rather striking as well.
the thought is that each page should be proofed only as many times as that page needs; this has been labeled as a "roundless" system.
aside from the backlogs of partially-done material, the other sign of a problem with the d.p. workflow is that production has flattened... even though d.p. enjoys a constant stream of incoming volunteers, thanks to all of the good-will that project gutenberg's free e-books have generated over the years, d.p. output has leveled out at under 250 books per month, which works out to less than 3,000 per year.
against the backdrop of the _millions_ of books google has scanned, this is a mere drop in the bucket. a small drop in a very large bucket.
rfrank doesn't go into all of this on his site. perhaps he didn't need to, since the d.p. people he's recruited are well-acquainted with the issues.
but rfrank is also unclear on many of the details of his little experiment, which is a more worrying matter.
specifically, i don't see a lot of experimental rigor here. it seems to me that roger is unfamiliar with the mechanics of the scientific method and its applicability to human social experiments. i see no evidence of any stated hypotheses, nor any way such hypotheses can be disconfirmed...
the reason people developed the scientific method was that we found that when we just fooled around "to see how things turn out", we often ended up fooling ourselves about what we had seen, and what it meant.
we learned that we had to actually specify our hypotheses, and devise tests (experiments) specifically designed to disconfirm our hypotheses. otherwise, our brains are only too willing to accommodate what we find as being "supportive" of our initial impressions. ("experimenter bias" is the term by which this insidious phenomenon is best known.)
if i'm correct, this problem will surface in rfrank's future results, and surface repeatedly, so there's no need for me to belabor the point now. but i wanted to frame this particular issue, here and now, in advance.
that's enough for today. see you tomorrow...
-bowerbird

...killing the p3 proofers.
The problem is worse: under the pressure to produce, and having become "jaded", the p3'ers apparently do not bother to even look at the digitized images of the author's text, but rather assume that they know best and introduce changes that differ from what the author wrote.
There is also the problem of "false positives" -- once the errors left in the text become infrequent enough, the human mind wants to make changes to "show you're making a positive contribution", even when there was no error there that the P3'ers ought to be fixing.
But even the p3 problem is nothing compared to the wait time in post-processing, where things can get hung up for literally about another year.
If PG were able to easily accept a txt file now and the html version (and other versions later) not only would readers get some books a year earlier, but we could probably save some efforts that die and get lost somewhere between txt complete and html complete. Why does posting have to happen "all at once" ???

On Tue, Feb 02, 2010 at 05:33:01PM -0800, Jim Adcock wrote:
... If PG were able to easily accept a txt file now and the html version (and other versions later) not only would readers get some books a year earlier, but we could probably save some efforts that die and get lost somewhere between txt complete and html complete. Why does posting have to happen "all at once" ???
It doesn't. In fact, "extracting" works from DP earlier was a big push I made a couple of years ago. At that time, such two-stage (or other greater-than-one-stage) output was something that didn't fit well with the workflow. Maybe that's something that could be revisited.
It's important to not double the effort involved at the final posting phase (whitewashing) through such a two-stage process. But there are several good ways of ensuring this, which could be incorporated into the process.
There is definitely flexibility.
-- Greg

That's real good news, Greg, especially if you're talking about flexibility on the DP side. 100% of the responsibility for evaluating and recommending changes to the DP process has been apparently relegated to the DP Board of Directors.
Since you are one of the five directors, you're in the know if anyone is. Since you represent 20% of the horsepower responsible for coming up with those changes, I trust you've been busy.
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d

On Tue, Feb 02, 2010 at 06:00:48PM -0800, don kretz wrote:
That's real good news, Greg, especially if you're talking about flexibility on the DP side. 100% of the responsibility for evaluating and recommending changes to the DP process has been apparently relegated to the DP Board of Directors.
I don't think that was the intention of the (relatively) new Board and new GM. The Board has ideas, but isn't trying to manage day to day activity.
Since you are one of the five directors, you're in the know if anyone is. Since you represent 20% of the horsepower responsible for coming up with those changes, I trust you've been busy.
Indeed, but actually we have not been looking at this level of detail for changes in the DP processing chain. The Board isn't to micromanage, and isn't to get in the way of progress.
That said, if you think there are proposals, ideas for change, etc. that are not getting the attention they deserve, I would be happy to bring them to the board (or GM, as appropriate) on anyone's behalf, anonymously if desired.
-- Greg

And on the other end we're hearing the same thing - the GM is there only to manage, and initiative for change will come from the Board. I'm absolutely not suggesting the Board is or should be micro or macro managing. I think everyone is expecting that the Board is about Planning. You're not? You disagree?

On Tue, Feb 02, 2010 at 09:43:07PM -0800, don kretz wrote:
And on the other end we're hearing the same thing - the GM is there only to manage, and initiative for change will come from the Board. I'm absolutely not suggesting the Board is or should be micro or macro managing. I think everyone is expecting that the Board is about Planning. You're not? You disagree?
Planning is exactly right. (Sorry for not responding sooner) -- Greg

Greg Newby <gbnewby@pglaf.org> writes:
On Tue, Feb 02, 2010 at 05:33:01PM -0800, Jim Adcock wrote:
It doesn't. In fact, "extracting" works from DP earlier was a big push I made a couple of years ago. At that time, such two-stage (or other greater-than-one-stage) output was something that didn't fit well with the workflow. Maybe that's something that could be revisited.
I'm all for it. In the DP forum, I proposed this several times.
It's important to not double the effort involved at the final posting phase (whitewashing) through such a two-stage process. But there are several good ways of ensuring this, which could be incorporated into the process.
Could we give this a try with manually selected books first? How can we make sure that we do not waste the whitewashers' time? -- Karl Eichwalder

While we are at it, could we consider a revision of the requirements for the PG txt files? Allowing a bit more flexibility (for example, allowing the original line and page breaks to be preserved), possibly together with the availability of the page images, would considerably improve the maintenance of the files and the addition of new versions.
Carlo

On Wed, Feb 03, 2010 at 08:01:40AM +0100, Karl Eichwalder wrote:
Greg Newby <gbnewby@pglaf.org> writes:
On Tue, Feb 02, 2010 at 05:33:01PM -0800, Jim Adcock wrote:
It doesn't. In fact, "extracting" works from DP earlier was a big push I made a couple of years ago. At that time, such two-stage (or other greater-than-one-stage) output was something that didn't fit well with the workflow. Maybe that's something that could be revisited.
I'm all for it. In the DP forum, I proposed this several times.
It's important to not double the effort involved at the final posting phase (whitewashing) through such a two-stage process. But there are several good ways of ensuring this, which could be incorporated into the process.
Could we give this a try with manually selected books first? How can we make sure that we do not waste the whitewashers' time?
Definitely. On a trial basis, the extra (or different) workload isn't such a big concern...we don't need to streamline while we're trying to experiment.
From the ww'er side, all you really need is a note with the upload that mentions "HTML will be forthcoming later," and then reference the .txt eBook # when the HTML is finally uploaded.
From the DP side, it seems that all this takes is an early extraction of formatted, proofread text, prior to going to HTML.
I'm sure it's somewhat more complicated than that, due to various cascading effects and perhaps some hard-coded policy on workflow, but I hope we all could accommodate some minor upheaval in the interest of exploration. -- Greg

i see no evidence of any
stated hypotheses, nor any way such hypotheses can be disconfirmed...
the reason people developed the scientific method was because we found
that when we just fooled around "to see how things turn out", we often
ended up fooling ourselves about what we had seen, and what it meant.
Not quite familiar with modern advances in the sciences, I reckon. Nowadays it seems we're supposed to look at systems as a whole, instead of doing hypothesis-driven experiments (at least, granting agencies seem to think so).
Frank
participants (7): Bowerbird@aol.com, don kretz, Greg Newby, Jim Adcock, Karl Eichwalder, traverso@posso.dm.unipi.it, van Drogen Frank