Pre-release proof of concept

As mentioned a few weeks ago, I encouraged people who were interested to set up a pre-release area for items in the Distributed Proofreaders processing chain. There is now a site that has this proof of concept:
http://dp.readingroo.ms/
This was done by Greg Weeks, using entirely (or almost entirely) DP items he is the project manager for. But other folks who are project managers can ask to have their items included.
He announced it here, and there is some follow-up discussion. You need a DP username to access this forum:
http://www.pgdp.net/phpBB2/viewtopic.php?t=43367
The scripts are checked in at:
http://dp50.googlecode.com/svn/trunk/pgdpprerelease/
It's uncertain whether this will continue as-is, or there will be other, competing ideas, or we'll not keep up with this, or something else. Feedback & ideas welcome.
-- Greg

Thanks, Greg. Here's a copy of the posting I made in response in the DP forums.
I'd be surprised if anyone can identify a single ebook produced elsewhere that can compare favorably with one of ours that has received the full-course treatment. Anyone suggesting that an inferior version would influence DP's commitment to completing an excellent one doesn't understand what we're about. Especially not a preview version of our own.
I would expect it would be much more likely to encourage others to come join us. An example: the Encyclopedia Britannica project that was available on PG when I joined DP (and still is, by the way) was unquestionably inferior to what we currently produce after just two Rounds of proofing, yet it was still impressive enough to catch my interest, first in helping proof/format, then as project manager for Encyclopedia Britannica. For comparison, here's a project submitted and released this week: http://www.gutenberg.org/dirs/3/1/7/9/31793
Despite the fact that there are several inferior EB versions available on the internet, I have if anything noticed a recent upswing in the relative number of new people working on EB in the early rounds. I hope putting their work out for others to see will help attract more. I would also expect it might encourage some of those we have to progress into the further rounds, where work grinds quickly to a halt.
It certainly would be an encouragement to provide earlier posting; based on current production rates, most of them wouldn't see their work released in their own lifetimes.
- Don
On Mon, Mar 29, 2010 at 12:10 PM, Greg Newby <gbnewby@pglaf.org> wrote:
As mentioned a few weeks ago, I encouraged people who were interested to set up a pre-release area for items in the Distributed Proofreaders processing chain. There is now a site that has this proof of concept:
http://dp.readingroo.ms/
This was done by Greg Weeks, using entirely (or almost entirely) DP items he is the project manager for. But other folks who are project managers can ask to have their items included.
He announced it here, and there is some follow-up discussion. You need a DP username to access this forum:
http://www.pgdp.net/phpBB2/viewtopic.php?t=43367
The scripts are checked in at:
http://dp50.googlecode.com/svn/trunk/pgdpprerelease/
It's uncertain whether this will continue as-is, or there will be other, competing ideas, or we'll not keep up with this, or something else. Feedback & ideas welcome.
-- Greg

From my point of view, the point of doing a pre-release was noting that the books stuck on the DP queues are pretty darned good, that there are enough books stuck on the queues to represent about an additional 10,000 books to the PG corpus, that the books stuck on queues represent at any point in time about 1/3 of all the books ever started at DP, and that on average books get stuck on DP queues for about three years nowadays.
The idea was to get much of this "good work but not yet done" in front of PG readers, or readership in general, in order to reduce the amount of good volunteer effort lost by being stuck on the queue -- i.e. something like 1/3 of all the volunteer effort done EVER at any point in time is stuck on queue.
The only way to get this "good work but not yet done" stuck in front of readers is to put it someplace they can find it and read it. The "obvious" place to put it would be in the PG database where it would show up with a "DANGER WILL ROBINSON" (or the PD version of such a message) in the database to the effect that this is a work in progress and is not at that state of completion that PG normally posts its work.
The other way one might stick work in front of readers so that they actually can read it would be to have the work findable via google search, for example, i.e. roboted.
Neither of these is the case, so the DP reading room concept puts books in progress in a location where readers will not find them, at which point posting them to the reading room does not represent a contribution to the world, and the problem remains the same, namely:
that the books stuck on the DP queues are pretty darned good, that there are enough books stuck on the queues to represent about an additional 10,000 books to the PG corpus, that the books stuck on queues represent at any point in time about 1/3 of all the books ever started at DP, and that books get stuck on DP queues for an average of three years nowadays.
Another way of saying this is that the other alternative is for readers to get the books from Google Books (assuming one can find them there) at which point the reader has a choice: a) get the Google PDF version of the bitmap photocopy of the book, which may or may not work depending on what one is using as a reader device, or b) get the EPUB or TXT from Google which is an uncorrected OCR of the book typically inferior in quality to that submitted to DP by content providers in the first place prior to round P1. I.e., even output from the P1 round of DP represents a significant contribution to many readers for whom the alternative is to try to read the unmotivated and uncorrected OCR output of Google -- but only if DP were to put that output in some place where real world readers can actually find that contribution and read it. Right now we still have the dogs guarding the straw.

On Wed, Mar 31, 2010 at 01:28:59PM -0700, James Adcock wrote:
From my point of view, the point of doing a pre-release was noting that the books stuck on the DP queues are pretty darned good, that there are enough books stuck on the queues to represent about an additional 10,000 books to the PG corpus, that the books stuck on queues represent at any point in time about 1/3 of all the books ever started at DP, and that on average books get stuck on DP queues for about three years nowadays.
The idea was to get much of this "good work but not yet done" in front of PG readers, or readership in general, in order to reduce the amount of good volunteer effort lost by being stuck on the queue -- i.e. something like 1/3 of all the volunteer effort done EVER at any point in time is stuck on queue.
The only way to get this "good work but not yet done" stuck in front of readers is to put it someplace they can find it and read it. The "obvious" place to put it would be in the PG database where it would show up with a "DANGER WILL ROBINSON" (or the PD version of such a message) in the database to the effect that this is a work in progress and is not at that state of completion that PG normally posts its work.
This is viable, but more involved than an independent site.
The other way one might stick work in front of readers so that they actually can read it would be to have the work findable via google search, for example, i.e. roboted.
Neither of these is the case, so the DP reading room concept puts books in progress in a location where readers will not find them, at which point posting them to the reading room does not represent a contribution to the world, and the problem remains the same, namely:
Are you saying that no goals are achieved because none but a few know about dp.readingroo.ms? That site is just the proof of concept. I don't see why we would work to get that indexed via Google etc. Once we have something that is ready, we will do publicity and set up links to make it findable. That will certainly address this next point:
that the books stuck on the DP queues are pretty darned good, that there are enough books stuck on the queues to represent about an additional 10,000 books to the PG corpus, that the books stuck on queues represent at any point in time about 1/3 of all the books ever started at DP, and that books get stuck on DP queues for an average of three years nowadays.
Yes. More:
Another way of saying this is that the other alternative is for readers to get the books from Google Books (assuming one can find them there) at which point the reader has a choice: a) get the Google PDF version of the bitmap photocopy of the book, which may or may not work depending on what one is using as a reader device, or b) get the EPUB or TXT from Google which is an uncorrected OCR of the book typically inferior in quality to that submitted to DP by content providers in the first place prior to round P1. I.e., even output from the P1 round of DP represents a significant contribution to many readers for whom the alternative is to try to read the unmotivated and uncorrected OCR output of Google -- but only if DP were to put that output in some place where real world readers can actually find that contribution and read it.
I think this is just a restatement of what we've all been saying. (I'm not disagreeing.) For the proof of concept that Greg Weeks put together at http://dp.readingroo.ms, the approach was to allow individual project managers the decision as to whether "their" titles get extracted to dp.readingroo.ms. This seems to me like a good start, letting people "opt in" to this test. How to handle opting in, or opting out, in a "real" implementation, is a good topic for discussion, especially for the DP PMs. Where to host the content and how to make it more widely available, searchable, and usable is another good topic. -- Greg
Right now we still have the dogs guarding the straw.

On Wed, 31 Mar 2010, James Adcock wrote:
The other way one might stick work in front of readers so that they actually can read it would be to have the work findable via google search, for example, i.e. roboted.
There is no robots.txt, so if google finds it now, it will index it. I'm not sure yet where we'll keep this, if we decide to keep it. Till then, there's no point in making google find it. Yes, when it's permanent it needs to be findable via google and other search engines.
-- Greg Weeks http://durendal.org:8080/greg/
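For anyone who wants to check the robots.txt point for themselves, here is a minimal sketch using Python's standard urllib.robotparser. It is only an illustration, not anything taken from the pgdpprerelease scripts. The helper name may_crawl and the "Googlebot" agent string are made up for the example, and a missing robots.txt is modelled here as an empty rule list, which is an assumption about how a well-behaved crawler treats a site with no such file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical helper: asks whether a crawler that honors robots.txt
# may fetch `url`, given the lines of a robots.txt file.
def may_crawl(robots_lines, url, agent="Googlebot"):
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules directly, no network access
    return parser.can_fetch(agent, url)

SITE = "http://dp.readingroo.ms/"

# Assumption: a site with no robots.txt is modelled as an empty rule set,
# so nothing is disallowed.
print(may_crawl([], SITE))

# A blanket opt-out, for comparison: every path is disallowed to all agents.
print(may_crawl(["User-agent: *", "Disallow: /"], SITE))
```

The first call shows the situation described above (no rules, so an indexer that finds the site is free to crawl it); the second shows what keeping the proof of concept out of search engines would look like.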

There is no robots.txt, so if google finds it now, it will index it. I'm not sure yet where we'll keep this, if we decide to keep it. Till then, there's no point in making google find it. Yes, when it's permanent it needs to be findable via google and other search engines.
OK, I guess I've never had any luck googling anything on DP because it's all pretty much stuck behind the login.
participants (4)
- don kretz
- Greg Newby
- Greg Weeks
- James Adcock