Re: the d.p. opinion on "prerelease" of e-texts

some people are quite agitated at the thought of the idea. they drag out the old rot about how "d.p. means _quality_". -bowerbird
DP does have a standard of quality similar to many current publishers of books. I frankly cannot recall the last time I read a book in any format, or from any publisher or producer of books, that did not contain errors of some sort. Humans are not perfect. Writing, editing, and proofing are human endeavors and, as such, are limited by our imperfections. The fact that they even strive for quality at DP is a mark in their favor. There are many eBook producers of public domain books that hold quantity so far above quality as to make a majority of their books almost unreadable.

I would worry that a pre-release version of a text might result in PG getting a reputation for being yet another supplier of illegible eBooks (even if these books are only available in a 'special area'). As Michael asked only for 'serious objections' against a pre-release section being created, I cannot think of any objections beyond simply wondering if you are willing to have these books circulated as they stand (people may grab them and distribute them 'as-is'). Would they have a header that specifies that they are 'not ready for prime time' as well as being in a special area? (Many people strip the headers, so the source-of-origin issue often becomes a moot point.)

The more I think about it, the more I feel that it would actually be rather nice to see something along the lines of the 'roundless experiment' produce decent-quality 'first release' books and then have people nitpick them to 'perfection' later. The point would be the format of storage, so that the rough version could be compared to the original scans and corrections/updates done with ease. Also, if a book is backlogged at DP, it would make sense to have a method for someone outside of DP to adopt the book and finish it.

Carel

-------- Original Message --------
Subject: [gutvol-d] the d.p. opinion on "prerelease" of e-texts
From: Bowerbird@aol.com
Date: Tue, February 23, 2010 5:22 pm
To: gutvol-d@lists.pglaf.org, bowerbird@aol.com

just in case you were wondering, dkretz _did_ copy michael hart's saturday post over to d.p., giving michael's position in favor of prerelease of texts while they're still "in progress" at d.p. that was on saturday evening, and since then there have been over 100 posts in response:
there was an earlier discussion on the issue too:
the earlier discussion had over 100 posts as well, and had a poll, which got a high number of votes. the poll question?
Making preview texts available on a dedicated site, with a clear label that they're work in progress. Good idea/bad idea?
the poll results?
45% -- I'd rather you didn't
17% -- I couldn't care less
36% -- Great idea!
some people are quite agitated at the thought of the idea. they drag out the old rot about how "d.p. means _quality_". -bowerbird

On 2/24/2010 12:37 PM, cmiske@ashzfall.com wrote: [snip]
The more I think about it, the more I feel that it would actually be rather nice to see something along the lines of the 'roundless experiment' produce decent-quality 'first release' books and then have people nitpick them to 'perfection' later. The point would be the format of storage, so that the rough version could be compared to the original scans and corrections/updates done with ease. Also, if a book is backlogged at DP, it would make sense to have a method for someone outside of DP to adopt the book and finish it.
I agree with you completely, and what you have described is what bowerbird has been agitating for for many years now. A book would go into 'first release' when the number of changes in the roundless system dropped below a certain rate. It would then go into general distribution in a system that allowed "continuous proofreading" in a wiki-like environment.

Good luck in convincing either DP or PG to adopt these kinds of reforms.
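As a rough sketch of the release rule described here -- a book is promoted to 'first release' once its edit rate falls below some threshold -- the trigger could be as simple as the following. The window and threshold numbers are invented for illustration; nothing in this sketch is actual DP or PG policy:

from datetime import datetime, timedelta

def ready_for_first_release(change_timestamps, window_days=30, max_changes=5):
    # A book qualifies for 'first release' when the number of edits
    # recorded in the trailing window drops below the threshold.
    # Both parameters are illustrative placeholders.
    cutoff = datetime.utcnow() - timedelta(days=window_days)
    recent = [t for t in change_timestamps if t >= cutoff]
    return len(recent) <= max_changes

The same counter, run continuously after release, would also tell the "continuous proofreading" system when enough tweaks have accumulated to be worth posting a new edition.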

I can tell you that both Greg Newby, our CEO, and myself, are most definitely FOR doing such prereleases!!!

We are prepared to start making directories for them and getting them loaded in as soon as we have permission.

We would put up a number of disclaimers about them to put a protective system up for DP and various volunteers.

We would appreciate any opportunity to do this with any of the three sections of books mentioned earlier.

Many thanks!!!

Michael

Would it not be simpler for DP itself to have a Pre-releases page, similar to its Smooth-Read page?

I would think that if pre-releases are copied from DP into some PG environment, similar to Preprints, there would need to be some coordination to remove them from that environment when they're posted into PG as finished products.

Al

I think you're talking about a major technical challenge for the DP developers to undertake something like that, even if there weren't as much highly vocal opposition as there is. There hasn't even been a single response from the developers about assisting with a transfer to the PG site, or from anyone else with significant authority there.

On 2/24/2010 3:00 PM, Al Haines (shaw) wrote:
Would it not be simpler for DP itself to have a Pre-releases page, similar to its Smooth-Read page?
I would think that if pre-releases are copied from DP into some PG environment, similar to Preprints, there would need to be some coordination to remove them from that environment when they're posted into PG as finished products.
True, if that were the proposal. But it wasn't. The proposal was to let the files churn at DP until they were in a "mostly finished" state; then let them sit at PG in a "mostly finished" state forever, because no one can say definitively when they /are/ finished; and to put them in some kind of wiki-like environment so "tweaks" can be made incrementally by the "unwashed masses".

Or, PG could encourage DP to release its work product (including page scans) to organizations other than PG (e.g. IA, Wikisource) where these kinds of incremental changes /could/ be made, and PG could harvest them from these other sources at regular intervals.
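The harvesting half of that idea is mechanically simple. A sketch, assuming only that each tracked title has a stable URL at the external host and a local copy from the previous harvest -- both the URL and the file layout are placeholders, not a real PG interface:

import urllib.request

def harvest(url, local_path):
    # Fetch the externally maintained text and report whether it has
    # changed since the last harvest; if so, store the new copy so it
    # can be queued for re-posting. URL and path are hypothetical.
    new_text = urllib.request.urlopen(url).read()
    try:
        with open(local_path, "rb") as f:
            old_text = f.read()
    except FileNotFoundError:
        old_text = b""
    if new_text != old_text:
        with open(local_path, "wb") as f:
            f.write(new_text)
        return True
    return False

Run over a catalog of tracked titles once a week or so, that is the whole "regular intervals" loop.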

Greg Newby takes care of deleting from PrePrints; it should be easy enough just to send him notes of what has been completed.

mh

Hmm... I figured Greg was busy enough! <g>

As for the "major technical challenge" suggested elsewhere in this topic, why can't DP put up a wiki page for pre-releases, similar to its Harvesting wiki page? Project Managers (or whoever) could put links on the page to their pre-release candidates, and when a pre-release was ready for submission to PG, or had been posted, the PM could remove the link.

Al

On Wed, 24 Feb 2010, Al Haines (shaw) wrote:
Hmm... I figured Greg was busy enough! <g>
I talked to Greg about it; he's fine with it. Might be best to let him do it once a week or something, tho.
As for the "major technical challenge" suggested elsewhere in this topic, why can't DP put up a wiki page for pre-releases, similar to its Harvesting wiki page? Project Managers (or whoever) could put links on the page to their pre-release candidates, and when a pre-release was ready for submission to PG, or had been posted, the PM could remove the link.
I'm up for any way we can get more eBooks to more people, sooner rather than later. I'm willing to try them both and see how our readers respond. After all, it's not trivial to find the PrePrints page, though Google does seem to find it ok, which should be ok for most of our readers. Let's try. . . .

Michael

Some of these questions may overlap a bit--bear with me...

Who's going to monitor this pre-release page for when projects on it are posted? The WWers? DP? Greg?

Will it contain only pre-release text files, or all working files associated with a given project (page scans, illustrations, text, HTML, etc.)? If the latter, what's to stop someone from taking those files, getting their own clearance, and submitting them to PG as their own work? Or is DP going to consider that they're, in effect, abandoned projects, and up for grabs? (I can only imagine the reaction that would cause.)

Related to a couple of the above questions, would the WWers be expected to check whether a given submission is one that's also in progress at DP, or would it be a case of first-come, first-posted, and let DP take its lumps?

Rather than dumping who knows how many pre-releases into PrePrints, I'd suggest a separate Prerelease page. (Speaking personally, I regularly check PrePrints for interesting/doable projects, and have drawn a number of projects from it. I doubt I'd be interested in looking through a raft of ex-DP items, searching for non-DP PrePrint items.)

Al

If the latter, what's to stop someone from taking those files, getting their own clearance, and submitting them to PG as their own work?
What's to stop a PM and Content Provider who is sick to death of having their hard work stuck in limbo year after year at DP from taking a SR copy, cleaning it up, and submitting it to PG -- given that they are the holder of the CC in the first place, and the person who did the lion's share of the work cleaning it up for submission to DP in the first place?

Answer: simple integrity, and the desire to play fair with DP even when DP is not playing fair with that PM and Content Provider. What is NOT fair, IMHO, is when works that volunteers have put their blood, sweat, and tears into get stuck forever at DP while, apparently, a commercial entity has taken the SR from DP and is selling it on Amazon. Work that volunteers put into the public domain should go there first, and THEN back to the commercial providers.

But this is what happens when you take years sitting on books instead of allowing them to be finished.

We have always invited people to take our completed books and redo them into their own editions, and hopefully resubmit them for redistribution.

If we do this for books that are done, why not for those undone?

Are we worried more about who gets the credit than about getting books out?

Michael

On Thu, 25 Feb 2010, Michael S. Hart wrote:
Are we worried more about who gets the credit than about getting books out?
I think we're worried about the fact that the only version available is one that you have to BUY that's based on our volunteer labor.

-- Greg Weeks
http://durendal.org:8080/greg/

Greg Weeks <greg@durendal.org> writes:
I think we're worried about the fact that the only version available is one that you have to BUY that's based on our volunteer labor.
If that's at least an option, why not? Nobody forces you to buy it, though.

-- Karl Eichwalder

On DP I've called it the "Let Them Eat Cake" approach. Some people think that if readers can't wait for the best, then anything less, derived from their work, shouldn't be available. It's an option they shouldn't be permitted to have.

It's not uncommon among programmers, either, I've found, apropos of nothing, present company excepted, etc.

don kretz wrote:
It's not uncommon among programmers, either, I've found, apropo of nothing, present company excepted, etc.
The Free Software community says: "Release early, release often."

It's the proprietary software producers that let you wait forever and then release crap.

Now, which of these is DP?

-- Marcello Perathoner
webmaster@gutenberg.org

As soon as a book is released in PG, a number of other sites copy it, make their own formats, and release it. Now, when PG pre-releases a book, this process will be repeated: the other sites will make their own format and release the book. When the final corrected book is posted to PG, very few sites will update their version.

Our experience in correcting the earlier texts bears this out. Even when an updated and corrected version is posted, not many sites update their version. The old error-ridden version continues, and the older version comes up when you google for the book.

For example, Sense and Sensibility by Jane Austen is EText #161. This was corrected and a new illustrated version posted in 2007 (#21839). But when you google, you get only EText #161. Fortunately this has also been updated, in January 2009. Having done the illustrated version solo, I know.

So in effect PG will be releasing a number of error-ridden books. Again, the side effects of this could be:

Many active volunteers in D.P may lose interest. These are the proofers and formatters who do not get their name in the credit line. A drop in the active volunteers of DP is definitely not in the best interests of PG.

IMHO, it is for DP to decide whether the pre-release version should be released, taking into account the sensitivities of the volunteers. Forum discussions in D.P cannot be taken as the representative opinion of D.P volunteers.

D.P does not release crap, unlike some of the software producers (including open source) who release software full of bugs and what not.

Sankar Viswanathan

I don't know how you use Google, but your #21839 appears for me:

Sense and Sensibility by Jane Austen - Project Gutenberg
Jun 15, 2007 ... Download the free ebook: Sense and Sensibility by Jane Austen.
www.gutenberg.org/etext/21839 - Cached - Similar

The Project Gutenberg eBook of Sense & Sensibility, by Jane Austen
End of the Project Gutenberg EBook of Sense and Sensibility, by Jane Austen ...
www.gutenberg.org/files/21839/21839-h/21839-h.htm

The Project Gutenberg EBook of Sense and Sensibility, by Jane ...
Jan 18, 2009 ... Project Gutenberg is a registered trademark, and may not be ...
www.gutenberg.org/files/21839/21839-8.txt

Interestingly enough, our audio book comes up very high as well.

Now, in all fairness, I will be the first to admit that the order in which the various editions appear fluctuates from time to time as Google policy changes affect the results. [See Microsoft suing Google in Europe, etc.] Today, for example, your edition shows up around #30 in all the results of "project gutenberg" "sense and sensibility by jane austen", which I did cut and paste from the exact words in your message.

I have asked in the past for volunteers who know how to make higher-ranked hits for us, but all in all, those hits all refer to us -- though, as said, some of the hits, even the first one for your edition, come from outside, not directly from Project Gutenberg.

All you have to do to make YOUR edition move up the charts is all the same things millions of other sites do, and I have no objection to you trying a manipulation of this kind. If it works out, perhaps we should modify our headers and footers to quite literally pull ourselves up by our bootstraps.

As for those who never, or rarely, update their Project Gutenberg eBooks: you can say it makes us look better by comparison, and thus encourages more people to come directly to our sites. However, keeping doors closed that could be open is not the way to get the most eBooks to the most people.

Michael

It's been true in the past that this point of view has held the strongest position in shaping DP's policy and practices. If PG wants to publish "early and often", I think it's unlikely DP will be the partner of choice, unless there's someone willing to offer a stronger voice than Mr. Garcia and Mr. Viswanathan. There's been no evidence that such a voice exists.

If that's really PG's goal, it probably should find another partner. Unfortunately.

On Fri, 26 Feb 2010, Marcello Perathoner wrote:
The Free Software community says: "Release early, release often."
I was saying this long before they were.

If you recall, Alice in Wonderland, our breakthrough eBook, appeared in 30 revised editions in just a few years, leading our readers to the conclusion that they could come back any time and get revised versions of our eBooks. No one complained out in the real world, but eventually the insider complainers in PG decided there should be a final -- once and future -- Alice in Wonderland.

There were plenty of errors to go around in the early days, but it turned out that the biggest complainers were just an assortment of eBook insiders; the public was a big fan of both Project Gutenberg and of eBooks, and was happy to send us error reports and get the new editions.

This whole idea/ideal of waiting, waiting, waiting for some "perfect" edition we could release has thus caused problems in the extreme that we would never have encountered if this final-edition business had never gotten started.

I don't know how many of you know PG history all that well, but the first editions of all of our early works had labels like "Alice in Wonderland 0.1" to "Alice in Wonderland 0.9" before they were ever "officially" released, simply because we all KNEW there would be errors to correct. I never believed in trying to START with "perfect" eBooks -- I just figured they would perfect themselves in growing up, through the natural process of our readers sending errors.

Now we want to pretend there ARE no errors, even to the point where our bigmouth says "perfect" in referring to this. This pretense is causing us HUGE problems and denying access to thousands of titles we could release as "0.x."

By the way, as for the count being 2,000 in excess, do not forget the 2,008 or so currently in "PrePrints." Counting those, it is a little over 8,000.
It's the proprietary software producers that let you wait forever and then release crap.
Now, which of these is DP?
I would like to see PG & DP be a little less proprietary, a little less about who gets how much credit, and a little more about getting the books out there ASAP and then working them up, in a never-ending Zeno's progress, to perfection.

Please. . . . I hope to be thanking you for your consideration. I'd like to get our CEO, Greg Newby, started testing this out in the near future, and we'll see how it works. Until we actually try it, all this is just conjecture....

Michael S. Hart
Founder
Project Gutenberg

I think we're worried about the fact that the only version available is one that you have to BUY that's based on our volunteer labor.
If that's at least an option, why not? Nobody forces you to buy it, though.
Again, I as an unpaid volunteer don't appreciate having my time and effort converted into a for-profit enterprise before my public domain efforts have reached fruition through DP. The end result is that I get turned off of DP and go "solo" instead. When I go "solo" I admittedly create works that are *somewhat* more buggy than DP claims to make. The difference is that my efforts see the light of day this month rather than three and a half years from now.

When my NFP volunteer efforts are used poorly, I find somewhere else to volunteer my time and efforts. Why should DP care? Well, which "DP" are we talking about? The DP made up of volunteers who get frustrated by the inefficiencies and leave? Or the DP made up of lifers who don't want to see change?

On Mon, 1 Mar 2010 21:41:01 -0800, "James Adcock" <jimad@msn.com> wrote:

Again, I as an unpaid volunteer don't appreciate having my time and effort converted into a for-profit enterprise before my public domain efforts have reached fruition through DP. [snip]
The end result will still be in the public domain and can be scooped up by any entity, commercial or non-commercial. I don't really see the point you are trying to make.

Regards,
Walter

Why should DP care? Well, which "DP" are we talking about? The DP made up of volunteers who get frustrated by the inefficiencies and leave? Or the DP made up of lifers who don't want to see change?
The above two categories form a very small percentage of D.P volunteers. The vast majority (who are silent) are continuing to work in D.P. They are aware of the problems and hope that solutions will be found shortly. They are convinced that the D.P Board will implement changes to effect a better flow of the books.

-- Sankar
Service to Humanity is Service to God

The end result will still be in the public domain and can be scooped up by any entity, commercial or non-commercial. I don't really see the point you are trying to make.
The "end result" to date is that a commercial company has taken my not-for-profit work off DP at SR time and redistributed it under DRM such that it cannot to date be "scooped up" by any other entity, commercial or non-commercial. The "end result" to date is that the donation of my time and effort to a non-profit activity has been privatized for other's profit without any contribution to the non-profit community. This is typically called "conversion" and is typically considered at least morally to be theft of non-profit contributions. If I wanted to work for profit I would do so in the first place -- and would do so for my own profit rather that of bottom feeders who prey on DP. Again, if "DP" [whoever that is] doesn't care about these issues, *I DO*, and so I will put my volunteer efforts elsewhere -- where my volunteer efforts WILL go in fact into NFP, and where my volunteer efforts WILL make a positive impact on the world in a finite amount of time.

On Tue, Mar 2, 2010 at 11:37 AM, Jim Adcock <jimad@msn.com> wrote:
The "end result" to date is that a commercial company has taken my not-for-profit work off DP at SR time and redistributed it under DRM such that it cannot to date be "scooped up" by any other entity, commercial or non-commercial. The "end result" to date is that the donation of my time and effort to a non-profit activity has been privatized for other's profit without any contribution to the non-profit community. This is typically called "conversion" and is typically considered at least morally to be theft of non-profit contributions. If I wanted to work for profit I would do so in the first place -- and would do so for my own profit rather that of bottom feeders who prey on DP. Again, if "DP" [whoever that is] doesn't care about these issues, *I DO*, and so I will put my volunteer efforts elsewhere -- where my volunteer efforts WILL go in fact into NFP, and where my volunteer efforts WILL make a positive impact on the world in a finite amount of time.
I'm not sure if you understand what "Public Domain" means. It is not not-for-profit... it means there is _no_ restriction on further use of the text. Someone can reprint it, use it for derivative works, fold, spindle, mutilate, write slash, whatever, at any point[0]. There is no copyright restriction attached, and *no legal way to prevent redistribution*[1].

It also works the other way... the independent commercial entity that republished the text on Amazon has no way to prevent us from putting the final, polished text up *for free* at PG once it finishes PP/PPV. Also, it can indeed be "scooped up" by anyone else who wishes to at DP before that point.

DP, the organization, is a not-for-profit. The material that the organization works upon is Public Domain in the US.

R C

[0] Technically there is an automatic copyright on the annotations that the proofers insert... they'd have to strip the [**] notes.
[1] Trademarks can turn up in specific cases, but that's another issue entirely.

I'm not sure if you understand what "Public Domain" means.
I certainly understand what it means. I volunteer my not-for-profit efforts to make public domain works. Those works IN PRACTICE enter the public domain when PG makes them available to the public, not before then. When books get stuck on DP queues "forever", then for-profits pick them up from SR and distribute them under DRM, at which point the book still IN PRACTICE fails to enter the public domain.

This makes me unhappy, not principally because a for-profit has picked up the book, but rather because DP continues to fail to recognize that their current queuing system and work rules are busted, such that effectively one third of the effort contributed to DP never in practice reaches the public domain -- which in turn wastes my time and effort when I volunteer there, not to mention, more importantly, the time and effort of thousands of others who volunteer there.

But instead of recognizing that the current system is busted and that people there need to fix it, what happens instead is that DP'ers insult the intelligence of people who try to point out to them that the current system is in fact busted. Again, under the current DP system, for every three books started, two books get released. This means that about 1/3 of the DP volunteers' efforts are effectively being wasted.

On Tue, Mar 2, 2010 at 7:43 AM, Jim Adcock <jimad@msn.com> wrote:
But, instead of recognizing that the current system is busted and that people there need to fix it what happens instead is that DP'ers insult the intelligence of people who try to point out to them that the current system is in fact busted.
Jim, we've known that it's busted for quite some time. You don't need to scream at us and tell us we're idiots and fools if we don't do what YOU order us to do, immediately. The negative reaction you're getting is to your tone and tactics, not your news flash.

The problem is knowing just how to fix the beast while it's careering along -- like fixing your car while it's in motion. Because I'm not a programmer, I can't contribute to the solution, but I have high hopes that someone will code a system that can be shown (by experiment, in practice) to work better. Once there's a working prototype, you'll see movement.

-- Karen Lofstrom aka Zora

The negative reaction you're getting is to your tone and tactics, not your news flash.
Sorry, but *my* negative reactions are based on DP people who:

a) Say that there is no problem having books stuck on queues for an average of 3.5 years now. And/or

b) Offer "solutions" which will not in fact reduce the size of the queues and how long books sit there.

Again:

a) There IS a problem with having books stuck on queues, including the fact that 1/3 of the volunteers' time and energy is currently being wasted.

b) Any proposed "solution" has to in fact act to reduce the size of the queues and how long books sit there. And it needs to do so without chasing away any class of volunteers, including P1s -- since P1s represent the future of DP.

One simple suggestion to start with would be to change the stated "Goals" for P3, F2, and PP to be larger than the goals for P2 and F1. To do otherwise is to have DP suggesting that they want the queues to be even longer than they are now. Right now the stated goals for P2 and F1 are larger than the stated goals for P3 and PP -- which will only make the queuing situation worse, as the sketch below illustrates. The fact that the "Goals" are inverted would seem to imply that the powers that be do not understand the nature of the problem -- in which case, how can they fix it?
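The queueing arithmetic behind that complaint fits in a few lines: if the goal feeding a stage is larger than the goal draining it, the backlog grows without bound. The rates below are invented placeholders, not actual DP goals; only the inequality matters:

def backlog_after(days, inflow_per_day, outflow_per_day, start=0):
    # Project the size of a queue (e.g. the P3 queue fed by P2) when
    # the feeding stage's goal exceeds the draining stage's goal.
    # All rates here are illustrative.
    backlog = start
    for _ in range(days):
        backlog = max(0, backlog + inflow_per_day - outflow_per_day)
    return backlog

print(backlog_after(365, 120, 100))  # 20 units/day deeper -> 7300 after a year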

"James Adcock" <jimad@msn.com> writes:
a) That there is no problem having books stuck on queues for an average of 3.5 years now.
It's a storage "problem" -- nothing more, nothing less. There are books waiting in the Google cache for more than x years. Not to mention all the libraries...

The problem is you and me, who don't want to understand that it is impossible to read all the books in a lifetime.

-- Karl Eichwalder

The problem is you and me, who don't want to understand that it is impossible to read all the books in a lifetime.
By the same argument volunteers should stop working on DP because there are more books at PG than can be read in a lifetime... ...In fact there are more books stuck on the queues at DP than can be read in a lifetime....

A decade or so ago I pulled the whole PG repository via FTP. I have not gotten through it. What a waste of my time???

On the other side, let's just shut everything down, as most consumer computers have some way of displaying scans. So we are just wasting everybody's time?

regards
Keith.
"James Adcock" <jimad@msn.com> writes:
a) That there is no problem having books stuck on queues for an average of 3.5 years now.
It's a storage "problem"--nothing more, nothing less. There are books waiting in the google cache for more than x years. Not to mention all the libraries...
The problem is you and me, who don't want to understand that is impossible to read all the books in livetime.

On the other side, let's just shut everything down, as most consumer computers have some way of displaying scans.

So we are just wasting everybody's time?
Yes and no. The Google "photocopies" of books available at books.google.com aka their PDF downloads which are just page images ARE useful, I can even read many of them successfully on my Kindle DX. There is even some charm in reading books in their original layout -- and some charm in seeing the occasional scanner's thumb. Reading pages and pages that have been scribbled on by 200 years of students is not very charming, IMHO. And the Google page images have the blotchy blurry heavy-font characteristics of bad photocopies. Even some of Google's EPUB files, which are just OCRs of these same books with all the scannos intact, can sometimes be an interesting read. The question is, in my mind, is Google preserving the books, and doing so for the public good or not? I suspect when Google digitizes the book the original is then trashed by the college library -- the whole point being they do not want to have to pay to maintain physical library books in various states of decay. Google then becomes the sole repository for this information -- excepting a smallish number of copies at TIA. Further, is Google dedicated to trying to keep this work public, or on the contrary is Google hoping for changes in the copyright law so that they can fully privatize these digitizations? Compare to what happens when volunteers at DP or PG correct a text and publish it in electronic form. Publically available? Yes. Available from a huge variety of redundant sources? Yes. Suitable to be republished easily on paper by either NFPs or For-Profit publishers? Yes. Reflowable so that it can be read comfortably on a wide variety of devices by people with differently aged eyes including by people with little or no vision? Yes. Yes. Yes. Etc. However, The DP/PG approach is extremely expensive compared to what Google is doing. Consider: Google Books == about 10 million books photo scanned. DP/PG == 30,000 books "fully restored." So Google's approach is about 300X faster than the DP/PG approach. My Conclusion: In the best of all world's there would be some measure of VALUE in choosing which books DP/PG chooses to put effort into fully restoring -- the idea that somehow DP/PG is going to be able to fully restore all the world's books is surely false. When someone at DP chooses to introduce a book that is expensive to do and the end result has relatively little value to society, that means other more important books will not be restored. It is not simply a question of "First Come First Serve" because on DP a worthy book can easily become stuck on the queues behind a less worthy book, such that the more worthy book is not allowed to be worked on by anybody. How does one measure "worthy vs. non-worthy?" Not a trivial matter, I admit. But to my mind one measure is obvious: Books that real people do not in practice want to read we should not bother to restore! I don't care if it's a book on ancient Sanskrit. If 1000 people want to read it, it's worth doing. If only 6 people want to read it, it's not worth doing. As a simple measure at least the total amount of time people spend reading the book has to exceed the amount of time volunteers spend preparing the book, or it's a loss to society. Again, the most popular books on PG are read 100,000 times more often than the least popular books. Now it's hard to find one of these most popular books to tackle today. But it is trivial to find a book to work on that will be 50X more popular than the average book DP finishes. 
Let Google deal with the unpopular books, and let DP/PG work on books that people actually *want* to read.
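A minimal sketch of the break-even test above (Python; the prep and reading times are invented placeholders, not actual DP or PG figures):

    # The test: a restoration "pays" only if total reading time exceeds
    # total preparation time. Both numbers below are assumptions.
    PREP_HOURS = 100       # volunteer-hours to fully restore one book
    HOURS_PER_READ = 8     # time one reader spends with the book

    breakeven_readers = PREP_HOURS / HOURS_PER_READ
    print(breakeven_readers)   # 12.5 -- fewer readers than this is a net loss

By this yardstick a book costing 100 volunteer-hours breaks even at about a dozen readers, which is why "only 6 people want to read it" fails the test.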

On 2010-03-03, at 19:24, Jim Adcock wrote:
However, the DP/PG approach is extremely expensive compared to what Google is doing. Consider: Google Books == about 10 million books photo-scanned. DP/PG == 30,000 books "fully restored." So Google's approach is about 300X faster than the DP/PG approach. My conclusion: In the best of all worlds there would be some measure of VALUE in choosing which books DP/PG puts effort into fully restoring -- the idea that somehow DP/PG is going to be able to fully restore all the world's books is surely false.
I think that the bet made by Google is that, sooner or later, sufficiently smart AI and OCR technology will be developed to allow it to process its scans and do the job of PG automatically. The only question is when it will happen, and some think that the singularity will occur within 20 years. But this is probably not a reason to stop working on PG! :-) -- __Pascal Bourguignon__ http://www.informatimago.com/

I think that the bet made by Google is that, sooner or later, sufficiently smart AI and OCR technology will be developed to allow it to process its scans and do the job of PG automatically.
I would think that anyone who has worked on OCR, or automated grammars, or AI, or in making books for PG can tell you they would lose that bet! (Not that a lot can't be done to get rid of 90% of the errors "automagically!")

Hi All, I'll step in here and reply to a couple of posts at once. On 03.03.2010 at 20:42, Pascal J. Bourguignon wrote:
On 2010-03-03, at 19:24, Jim Adcock wrote:
[snip]
Google produces scan sets. Sure, they put some into a more pleasurable form, but they are not interested in producing books or even in conserving them. The quality of the work is proof of that. My personal opinion is that Google is simply interested in producing revenue, by whatever means! That does not mean that Google does not have any merit. DP wants to produce pleasurable eBooks. Personally, I think DP/PG has more value.
I think that the bet made by Google is that, sooner or later, sufficiently smart AI and OCR technology will be developed to allow it to process its scans and do the job of PG automatically.
I doubt this very much. AI proper has been dead since the failure of the ELIZA project. Yes, the term is still used today to refer to anything a computer does that seems to be intelligent. But it is hardly AI.
In the 80s machine translation was all the rage. The Japanese said they would have an MT system that would translate your telephone conversations in real time by the 90s. Well, here we are some 20 years later, and we get the most horrific translations made online. The standard is that of the introductory class I had in the 80s. Google's service does not even use half of the developments made in MT.
The only question is when it will happen, and some think that the singularity will occur within 20 years.
BB, if it were realistic I would take you up on your bet. In 50 years there will not be a finished system that will do the job of creating proper output at anything above 95% accuracy fully automatically -- that is, without any human interaction whatsoever. Already in the 90s it was said that faster computers and cheaper storage would solve the problems of knowledge engineering. Again, here we are, and it is all vaporware. It was proven back in the 80s that human language is Type 0, and it is known that Type 0 languages cannot be processed completely automatically by a computer. So the emphasis has changed to simulating as much as possible. Yet this will always be far from perfect. Sorry for being more than a little OT here, but it was needed to support the point that anything having to do with language cannot be handled by a computer program by itself. regards Keith.

BB, if it were realistic I would take you up on your bet. In 50 years there will not be a finished system that will do the job of creating proper output at anything above 95% accuracy fully automatically -- that is, without any human interaction whatsoever.
_I_ will take that bet!!! Even though there are no realistic odds that I will be here to collect, I will be only too glad to have the proceeds go to PG, or In Memoriam. The bet is that a Xerox-machine type of scanning and OCR will produce a 95% accurate copy of certain pages selected from an average set of books, magazines, etc. Just go to a library and ask for samples. Fair enough??? Michael

Michael S. Hart wrote:
[snip]
The bet is that a Xerox machine type of scanning and OCR will produce a 95% accurate copy of certain pages selected from an average set of books, magazines, etc. Just go to a library and ask for samples.
Accuracy of OCR already exceeds 99%. Send me the money. -- Marcello Perathoner webmaster@gutenberg.org

On 3/4/2010 11:50 AM, Marcello Perathoner wrote:
Michael S. Hart wrote:
[snip]
The bet is that a Xerox machine type of scanning and OCR will produce a 95% accurate copy of certain pages selected from an average set of books, magazines, etc. Just go to a library and ask for samples.
Accuracy of OCR already exceeds 99%.
Absolutely. According to what I learned in typing class (yes, I really am that old), a standard typewritten sheet of paper averages 72 lines of 66 characters each, resulting in 4752 characters per page. Based solely on a per-character basis, 99% accuracy would allow 47 errors per page. Modern OCR, even that POS that IA uses, gives better accuracy than that.

If you choose to look at words instead of characters, it is generally accepted that the average word length is 6 characters, for an average of 9.5 words per line (I have accounted for spaces, which is why it is not 11 words per line). This results in an average of 679 words per page, which at 99% accuracy would allow for 6 misrecognized /words/ per page. That is still well within the recognition accuracy of modern OCR.

Personally, I find bowerbird's stated goal of 1 error per 10 pages a worthwhile goal. That is actually an accuracy rate (based upon words) of about 99.985%. So maybe the bet ought to be when automated OCR will exceed four 9s of accuracy (roughly one word error per 15 pages). Some of the recent work I have done, from my own scans, already reaches that threshold. (Accuracy will, of course, vary depending on the quality of the scanned image. YMMV and all that jazz.)
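To make that arithmetic easy to check, here is a minimal Python sketch of the error budgets above (the page model -- 72 lines of 66 characters, 6-character words -- comes from the preceding paragraph, not from any OCR standard):

    # Error budgets for OCR accuracy, using the page model described above.
    CHARS_PER_PAGE = 72 * 66                     # 4752 characters per page
    WORDS_PER_LINE = 66 / 7                      # 6-char word + 1 space: ~9.4
    WORDS_PER_PAGE = round(72 * WORDS_PER_LINE)  # ~679 words per page

    def errors_per_page(accuracy, units_per_page):
        # Expected number of misrecognized units per page at a given accuracy.
        return (1.0 - accuracy) * units_per_page

    print(errors_per_page(0.99, CHARS_PER_PAGE))  # ~47.5 char errors per page
    print(errors_per_page(0.99, WORDS_PER_PAGE))  # ~6.8 word errors per page

    # bowerbird's goal of 1 word error per 10 pages, expressed as accuracy:
    print(f"{1.0 - 1.0 / (10 * WORDS_PER_PAGE):.5%}")  # 99.98527%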

Marcello don't you ever READ anything before replying???!!! Still???!!! "In 50 years there will NOT be a system. . .above 95%. . ." I took that bet, betting there WILL be. . .SHOW ME THE MONEY!!! How do you expect anyone to EVER take you seriously when you do this kind of thing over, and over, and over. . .???!!! On Thu, 4 Mar 2010, Marcello Perathoner wrote:
[snip]
Accuracy of OCR already exceeds 99%.
Send me the money.

One big problem: you still do not have a PG or DP text ebook. You do not have any markup whatsoever! Plus, what happens if you give them the Google scan sets?! I have worked with OCR that got me 100% text accuracy, but it took a hell of a lot of training, a.k.a. human interaction. Also, OCR today achieves its accuracy from dictionaries and guessing at the correct spelling, and under many circumstances this kind of heuristic causes quite a few errors. regards Keith. On 04.03.2010 at 16:02, Michael S. Hart wrote:
[snip]
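Keith's point above about dictionary-driven guessing is easy to demonstrate with a toy corrector; the lexicon and tokens here are invented, and real OCR engines use far more sophisticated language models:

    import difflib

    DICTIONARY = ["the", "cat", "sat", "hat", "arm"]  # toy lexicon

    def naive_correct(token):
        # Snap any out-of-lexicon token to its closest dictionary word.
        if token in DICTIONARY:
            return token
        match = difflib.get_close_matches(token, DICTIONARY, n=1, cutoff=0.6)
        return match[0] if match else token

    print(naive_correct("tne"))  # classic scanno, correctly snapped to "the"
    print(naive_correct("arn"))  # a rare name like "Arn" silently becomes "arm"

The second call shows how a dictionary guess can quietly turn a legitimate rare word into a common one -- an error no spell-check pass will ever flag.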

On Wed, Mar 3, 2010 at 1:24 PM, Jim Adcock <jimad@msn.com> wrote:
I suspect when Google digitizes the book the original is then trashed by the college library
That would be silly. When you have endowments the size of Harvard's, you have no need to do that; they've already built book-storage buildings where books can be stored in more space-efficient forms than browsable stacks and retrieved with a couple of days' notice, so you're simply more free to exile them there.
-- the whole point being they do not want to have to pay to maintain physical library books in various states of decay.
The whole point of this thing is that Google thought that digitizing this material would be valuable, and the universities all thought that it would be valuable to have digital copies of their collections, and that it would further their mission to spread knowledge.
Google then becomes the sole repository for this information
No. The universities all have copies of all the scans made from their books.
But to my mind one measure is obvious: Books that real people do not in practice want to read we should not bother to restore!
Then we aren't doing enough porn. If your sole measure of worthiness is the number of hits, then forget about doing the works of Sarah Orne Jewett; let's start digging up all that erotica published in the 20s and 30s under the table and watch the Google hits come flying in.
As a simple measure at least the total amount of time people spend reading the book has to exceed the amount of time volunteers spend preparing the book, or it's a loss to society.
It's not a loss to society to take time that would be used for watching TV and use it to restore books. It's not a loss to society if we make a work accessible to the right scholar, or if we inspire the right person.
But it is trivial to find a book to work on that will be 50X more popular than the average book DP finishes.
First, looking at the puerile crap (no offense intended) that comes up as done by you, I'm not sure you can find it. During the first Slashdotting of DP, someone complained that among the little material we had available was my scan of "From October to Brest-Litovsk," but to this day I think that book -- history written with lightning -- was one of the more important works I did, and probably more read too (someone did it for Librivox). In some sense, the single most popular work PG has, has to be the 1913 Webster's, which has been borrowed as the basis of just about every online free dictionary, and is referred to by people who don't even know that PG exists. And another major point is: what do DPers actually want to work on? Hard material tends to go through slowly, whereas junk fiction tends to go through pretty quickly. That has nothing to do with the popularity or worthiness of the text. We could toss out a bunch of the "less worthy" books in exchange for the OED or porn, but I doubt that would increase DP production overall. -- Kie ekzistas vivo, ekzistas espero.

Sorry, but lots of libraries are doing JUST that!!! Selling the books after digitizing. . . . I bought several volumes of the NY Herald when this was done. I will probably buy more. On Wed, 3 Mar 2010, David Starner wrote:
On Wed, Mar 3, 2010 at 1:24 PM, Jim Adcock <jimad@msn.com> wrote:
I suspect when Google digitizes the book the original is then trashed by the college library
That would be silly. When you have endowments the size of Harvard's, you have no need to do that; they've already built book-storage buildings where books can be stored in more space-efficient forms than browsable stacks and retrieved with a couple of days' notice, so you're simply more free to exile them there.
[snip]

And another major point is, what do DPers actually want to work on? Hard material tends to go through slowly, whereas junk fiction tends to go through pretty quickly. That has nothing to do with the popularity or worthiness of the text. We could toss out a bunch of the "less worthy" books in exchange for the OED or porn, but I doubt that will increase DP production overall.
This is at least partly due to the emphasis DP places on moving people out of their comfort zone. New people at every level are encouraged to choose (naturally enough) easy projects to climb the learning curve; and since virtually everyone is being encouraged to advance, this material comprises a larger portion than it would otherwise. To assist this, easy projects are released from the queues more quickly (again to encourage new skills). I've mentioned that no Shakespeare play has been released into F2 or processed into PG for several years, despite sitting in the F2 queue much of that time.

Hard material tends to go through slowly, whereas junk fiction tends to go through pretty quickly.

Well, I guess I should stop complaining now, because one of my DP texts has made it to PP and I was able to snag it back myself. But I will point out its statistics on the latest round, and people can judge for themselves: this book sat on the F2 queue for 7,200 hours. It then went "live" in F2 status for 3 hours, which is how long it took 14 F2 volunteers to do all the pages. Since about 3 volunteers were working on the book at any given time, the total volunteer-hours spent on F2 was about 10. So the ratio of [time sitting on queue]/[volunteer-hours working on text] is about 700 to one. Is this a well-designed system?

Material can be hard AND junk. I am perfectly happy to work on hard stuff if it will actually get used by anyone. I spent some time proofing a hard book on DP [that was labeled "Easy"] that should have been titled "How to Torture a Horse." Put up the OED and I will help tackle it. I would also be happy to put up "Outline of Science Vol. II," which is hard AND popular -- if DP were willing to get it out the door in, say, a year or less.

I've mentioned that no Shakespeare play has been released into F2 or processed into PG for several years, despite sitting in the F2 queue much of that time.

And I would also be willing to work on the bard. Again, I won't be the one processing him *into* DP unless I have some assurance that he's ever going to come *out* again!

PS: This book WAS classified as "porn" when it first came out -- which may explain WHY the volunteers are interested in tackling it. I did tag it as containing material related to sexuality and infidelity in case anyone didn't want to work on those subjects. Nowadays the "porn" label would be a joke, and the book is considered a classic of modern American literature. In defense of the DP volunteers, the other book I have stuck in DP was tackled even more voraciously -- and that one was never considered "porn."
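The ratio above is easy to verify (Python; only the figures reported in the message are used):

    # Queue-to-work ratio for the book described above.
    QUEUE_HOURS = 7200      # time sitting in the F2 queue (about 300 days)
    VOLUNTEER_HOURS = 10    # ~3 concurrent volunteers over ~3 "live" hours
    print(QUEUE_HOURS / VOLUNTEER_HOURS)  # 720.0 -- "about 700 to one"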

Jim Adcock wrote:
PS: This book WAS classified as "porn" when it first came out -- which may explain WHY the volunteers are interested in tackling it.
Personally, I'd like to see more porn on PG; we still lack most of de Sade. -- Marcello Perathoner webmaster@gutenberg.org

On Thu, 4 Mar 2010, Marcello Perathoner wrote:
Jim Adcock wrote:
PS: This book WAS classified as "porn" when it first came out -- which may explain WHY the volunteers are interested in tackling it.
Personally I'd like to see more porn on PG, we still lack most of De Sade.
I recall seeing something just recently, as I was looking up author names... it's in German too... Here we go: Josefine Mutzenbacher http://www.gutenberg.org/etext/31284 Looks like it came through DP. --Andrew

On 3/3/2010 1:24 PM, Jim Adcock wrote:
The question is, in my mind, is Google preserving the books, and doing so for the public good or not? I suspect when Google digitizes the book the original is then trashed by the college library -- the whole point being they do not want to have to pay to maintain physical library books in various states of decay. Google then becomes the sole repository for this information -- excepting a smallish number of copies at TIA.
This is absolutely not true. First of all, part of every agreement between a library and Google is that the library gets a copy of all the scans that Google makes. Depending on the exact contract, there may or may not be some restrictions on what the library can do with the scans, but they definitely get them.

Further, the libraries do not get rid of the books. In fact, they are very protective of their books, which is why a face-up, human-controlled scanning method is used (thus resulting in the occasional hand or finger in the scan). All books are returned to the libraries with as little wear as possible. For logistical reasons, both Google and the Internet Archive started with books that were in off-site repositories, but those repositories are not being removed. The librarians in charge of the scanning projects all understand that what Google is providing is a search tool, not preservation. The Internet Archive is much closer to doing archival-quality work, but the libraries are still keeping the books. Remember, these librarians were burned by the promise of microfilm and microfiche as more compact storage formats for periodicals and such.

A bunch of major libraries have put together a consortium called the Hathi Trust which has the explicit purpose of making sure that book scans are not lost. It provides off-site, secure storage for what the participant libraries want to put there. This includes the libraries' copies of the Google scans, as well as whatever else they decide to include. The last I was aware, the Hathi Trust did not do much, if anything, to provide public access to those scans, since that is not its purpose. I mention it here only to make folks aware that the libraries are making provision for storage even if places like Google, the Internet Archive, or, indeed, one of their own members, should disappear.

I now return you to your arguments about DP. Juliet Sutherland

If you actually visit the library archives working with Google, you will find that what was promised is not entirely true when it comes to reality. . .at least in the POV of the librarians who will speak to you freely. Of course, I will also be the first to admit that you can get a number of librarians from the same institution who will say all is perfectly well. But it's not perfect. . .not down at the lower-level realities, not where the rubber meets the road. I do note that the ones who say all is well and dandy are those with political and academic aspirations, and those who tell you things are not what they should be are more street-level. We have plenty of both here at the University of Illinois. ;=) On Fri, 5 Mar 2010, Juliet Sutherland wrote:
[snip]

The way I look at it is that it's DP's ball. Yet if the queues are so backed up, then DP has to shift its workforce -- that is, get volunteers trained and motivated so that they can help clear the queues. This is simple economics. No production company can afford to produce parts for a product and not produce the end product. The only way for such a company to survive is to outsource -- which here would be prerelease. Naturally, DP is not interested in making money, yet the analogy holds true for its goals. regards Keith. On 02.03.2010 at 23:41, James Adcock wrote:
The negative reaction you're getting is to your tone and tactics, not your news flash.
Sorry, but *my* negative reactions are based on DP people who say:
a) That there is no problem having books stuck on queues for an average of 3.5 years now.
And/or
b) Offer "solutions" which will not in fact reduce the size of the queues and how long books sit there.
Again:
a) There IS a problem with having books stuck on queues, including the fact that 1/3 of the volunteers' time and energy is currently being wasted.
b) Any proposed "solution" has to in fact act to reduce the size of the queues and how long books sit there. And it needs to do so without chasing away any class of volunteers including P1s -- since P1s represent the future of DP.
One simple suggestion would be to start by changing the stated "Goals" for P3 and F2 and PP to be larger than the Goals for P2 and F1. To do otherwise is to have DP suggesting that they want the queues to be even longer than they are now. Right now the stated goals for P2 and F1 are larger than the stated goals for P3 and PP -- which will only make the queuing situation worse. The fact that the "Goals" are inverted would seem to imply that the powers that be do not understand the nature of the problem -- in which case, how can they fix it?
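The effect described above can be seen with a toy flow model (Python; the daily page rates are invented, and the real DP rounds are of course more complicated):

    # Pages flow P2 -> queue -> P3. If the upstream round's throughput goal
    # exceeds the downstream one, the queue between them can only grow.
    def backlog_after(days, p2_pages_per_day, p3_pages_per_day):
        backlog = 0
        for _ in range(days):
            backlog += p2_pages_per_day                # pages finished by P2
            backlog -= min(backlog, p3_pages_per_day)  # pages consumed by P3
        return backlog

    print(backlog_after(365, 3000, 2000))  # 365000 pages queued after a year

With the inequality reversed (say, 2000 pages in and 3000 out per day), the backlog stays pinned at zero no matter how long the model runs.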

On Tue, Mar 2, 2010 at 12:43 PM, Jim Adcock <jimad@msn.com> wrote:
I'm not sure if you understand what "Public Domain" means.

I certainly understand what it means. I volunteer my not-for-profit efforts to make public domain works. Those works IN PRACTICE enter the public domain when PG makes them available to the public, not before then. When books get stuck on DP queues "forever," then for-profits pick them up from SR and distribute them under DRM, at which point the book still IN PRACTICE fails to enter the public domain.

This makes me unhappy, not principally because a for-profit has picked up the book, but rather because DP continues to fail to recognize that their current queuing system and work rules are busted, such that effectively one third of the effort contributed to DP never in practice reaches the public domain -- which in turn wastes my time and effort when I volunteer there, not to mention, more importantly, the time and effort of the thousands of others who volunteer there. But instead of recognizing that the current system is busted and that people there need to fix it, what happens instead is that DP'ers insult the intelligence of people who try to point out to them that the current system is in fact busted.

Again, under the current DP system, for every three books started two books get released. This means that about 1/3 of the DP volunteers' efforts are effectively being wasted.
Copyrighted works have to be in the public domain before anyone at DP touches them. They are still in the public domain while at DP, and they are in the public domain when they leave DP for PG. We can try[1] to restrict access to intermediate stages by technical means, but we do NOT have any legal means to prevent redistribution, short of trying something with contract law (a EULA or such).[2]

You also seem to believe there is a black hole at DP where 1 out of 3 books falls in, never to emerge. This is a patent fallacy. Some books DO get shortstopped in the middle of the process (for missing pages and other issues), but it is nowhere near 1 in 3, and there is significant effort (the project hospital) to push these back into the active process. The closest thing to a black hole is PP: Available, where books can indeed sit indefinitely... but most don't.

I'm not going to argue this any further with you, though. People have long been aware of the problem, and it is clear that nothing I say will influence you.

R C

[1] It would be a bad idea IMO, but it has been tried in the past.

[2] Which would be both impractical and against the principles of trying to get public domain works accessible, again IMO.

Robert Cicconetti wrote:
Copyrighted works have to be in the public domain before anyone at DP touches them. They are still in the public domain while at DP, and they are in the public domain when they leave DP for PG. We can try[1] to restrict access to intermediate stages by technical means, but we do NOT have any legal means to prevent redistribution, short of trying something with contract law (a EULA or such).[2]
What??? Are you saying everybody can steal everybody else's files if they contain only PD material? If you *publish* PD material, everybody can take it and re-use it as they see fit. To publish something means to make it available to everybody. If you keep PD material on a workgroup server which is not accessible to the public at large and somebody grabs this material without your permission, then the material is *stolen* and you can prosecute them. (Provided you can prove that it was indeed your file, which should not be difficult because the scanno pattern is practically a watermark.) -- Marcello Perathoner webmaster@gutenberg.org

On Tue, Mar 2, 2010 at 2:16 PM, Marcello Perathoner <marcello@perathoner.de> wrote:
[snip]
We're not talking about computer trespassing; the discussion is in regard to publicly available public domain material, not material locked up on someone's personal computer or server. PG has procedures for establishing whether a random etext found online is a public domain work, and for allowing people to republish it at PG. http://www.gutenberg.org/wiki/Gutenberg:Copyright_Confirmation_How-To Random scannos do not establish a new copyrightable work, nor does sweat of the brow. (Under current US law, etc. etc.) R C

Robert Cicconetti wrote:
On Tue, Mar 2, 2010 at 2:16 PM, Marcello Perathoner <marcello@perathoner.de> wrote:
[snip]
We're not talking about computer trespassing; the discussion is in regards to publicly available public domain material, not locked up on someone's personal computer or server.
We are talking about files that are sitting in some queue on a DP server. The DP server is not publicly accessible: it asks for a password. Taking a file out of a password-protected site and making it public without the site owner's permission is illegal. It is irrelevant whether the file contains PD material or not. Try entering an art collector's home and explaining to him that you have a *right* to enter and photograph his Monet because it happens to be in the public domain... -- Marcello Perathoner webmaster@gutenberg.org

On Tue, 2 Mar 2010, Marcello Perathoner wrote:
We are talking about files that are sitting in some queue on a DP server. The DP server is not publicly accessible: It asks for a password. Taking a file out of a password-protected site and making it public without the site owner's permission is illegal. It is irrelevant if the file contains PD material or not.
I suspect that wouldn't fly in the US. There's no restriction on getting an account, so it's likely there was no trespass. Maybe a TOS violation, but I don't think there's anything preventing this in the DP TOS, and I don't think there should be in general. Even if it does sometimes irritate me. -- Greg Weeks http://durendal.org:8080/greg/

Greg Weeks wrote:
On Tue, 2 Mar 2010, Marcello Perathoner wrote:
We are talking about files that are sitting in some queue on a DP server. The DP server is not publicly accessible: It asks for a password. Taking a file out of a password-protected site and making it public without the site owner's permission is illegal. It is irrelevant if the file contains PD material or not.
I suspect that wouldn't fly in the US. There's no restriction on getting an account, so it's likely there was no trespass. Maybe a TOS violation, but I don't think there's anything preventing this in the DP TOS, and I don't think there should be in general. Even if it does sometimes irritate me.
That would very well fly. I don't believe the DP TOS allows you to take a file out and publish it on your own. And if it does allow that, I don't understand all the fuss they are making about a PG preprint distribution. Oh, and all those signs that say you can't take any pictures in US museums -- don't they fly? -- Marcello Perathoner webmaster@gutenberg.org

On Tue, 2 Mar 2010, Marcello Perathoner wrote:
[snip]
That would very well fly. I don't believe the DP TOS allow you to take a file out and publish it on your own. And if they allow that, I don't understand all the fuss they are making against a PG preprint distribution.
It's generally been admitted that they can't stop it. The question is whether it should be officially sanctioned or not.
Oh, and all those signs that say you can't take any pictures in US. museums, don't they fly?
Only to the extent that they can ask you to leave, and if you don't comply you are trespassing. They cannot make you delete any pictures you've taken. And they can't stop you from doing anything you want with the picture if the art doesn't currently have a copyright. -- Greg Weeks http://durendal.org:8080/greg/

On Tue, Mar 2, 2010 at 3:29 PM, Marcello Perathoner <marcello@perathoner.de> wrote:
That would very well fly. I don't believe the DP TOS allow you to take a file out and publish it on your own. And if they allow that, I don't understand all the fuss they are making against a PG preprint distribution.
The difference is between something that is tolerated and an officially sanctioned central repository. Also, I think the arguments for posting text and HTML separately got confused with the arguments about posting earlier in the process. phpBB's threading is... suboptimal. Personally, I'm in the pre-publish camp (after a project passes each round, by preference; there's little point in splitting TXT and HTML posting at PP), as well as for making P1->P1 opt-out, P3 opt-in, parallel F1 opt-out[1], and F2 opt-in. R C [1] Means a little more work for the PM to do the merge, but worth it IMO for simpler works. Would need some relatively minor tool or dev support.

On Tue, Mar 02, 2010 at 08:16:13PM +0100, Marcello Perathoner wrote:
Robert Cicconetti wrote:
Copyrighted works have to be in the public domain before anyone at DP touches them. They are still in the public domain while at DP, and they are in the public domain when they leave DP for PG. We can try[1] to restrict access to intermediate stages by technical means, but we do NOT have any legal means to prevent redistribution, short of trying something with contract law (a EULA or such).[2]
What???
Are you saying everybody can steal everybody's else's files if they contain only PD material?
If you *publish* PD material, everybody can take it and re-use it as they see fit. To publish something means to make it available to everybody.
If you keep PD material on a workgroup server which is not accessible to the public at large and somebody grabs this material without your permission, then the material is *stolen* and you can prosecute them. (Provided you can prove that it was indeed your file, which should not be difficult because the scanno pattern is practically a watermark.)
These don't seem like strongly conflicting statements. Our "no sweat of the brow how-to" gives a similar view.

IF someone were to gain illicit access to files at DP or elsewhere, regardless of whether they were public domain, various legal remedies could be applied. (Quite a few, and most countries have their own set of remedies, ranging from contracts, to EULAs, to things like computer fraud & abuse or misappropriation of resources.)

But as Robert mentioned, that doesn't change that the public domain content is still public domain...no matter how much value has been added through scanning, OCR, proofreading, etc. What happens if such content mysteriously, untraceably extracts itself from DP and becomes available elsewhere? Well, it's still public domain.

(Bonus reading assignment: Steven Levy's "Crypto," which describes how the PGP software, which was ineligible for export from the US, found its way into other countries -- where it was perfectly legal to use.)

-- Greg

PS: Over the years, I've been involved in various efforts to bring legal remedies to online incidents. It is very hard to do, especially when there is little or no money involved. Doubly especially if any of the actors are in different countries. Robert's emphasis on technical measures, versus more legalistic ones, is more likely to give satisfaction.

And what's the message that we send when we use someone else's work (the book) that someone else scans, and someone else collects, posts, and manages (TIA), and a bunch of other people proof and/or format -- and then keep that accumulated and integrated value, which has been generously and freely provided for us to use, locked away exclusively for several years for one Post-Processor to work on, when they get around to it? On Tue, Mar 2, 2010 at 2:16 PM, Greg Newby <gbnewby@pglaf.org> wrote:
[snip]

Greg Newby wrote:
But as Robert mentioned, that doesn't change that the public domain content is still public domain...no matter how much value has been added through scanning, OCR, proofreading, etc. What happens if such content mysterioulsy, untraceably extracts itself from DP and becomes available elsewhere? Well, it's still public domain.
But you would sue them for trespass, not for copyright infringement.
PS: Over the years, I've been involved in various efforts to bring legal remedies to online incidents. It is very hard to do, especially when there is little or no money involved. Doubly-especially if any of the actors are in different countries. Robert's emphasis on technical measures, versus more legalistic ones, is more likely to give satisfaction.
Amazon would be a US company, though. And suing Amazon would bring some interesting facts to the public's attention as to the provenance of some material they DRM. -- Marcello Perathoner webmaster@gutenberg.org

On Wed, Mar 03, 2010 at 08:04:32AM +0100, Marcello Perathoner wrote:
Greg Newby wrote:
But as Robert mentioned, that doesn't change that the public domain content is still public domain...no matter how much value has been added through scanning, OCR, proofreading, etc. What happens if such content mysterioulsy, untraceably extracts itself from DP and becomes available elsewhere? Well, it's still public domain.
But you would sue them for trespass, not for copyright infringement.
Right. That was the point I was making. But finding a lawyer to take the case is tough. Getting the case before a judge is tougher. Pursuing it yourself (i.e., in small claims court) is possible for people with time on their hands, but it is limited in various ways.
PS: Over the years, I've been involved in various efforts to bring legal remedies to online incidents. It is very hard to do, especially when there is little or no money involved. Doubly-especially if any of the actors are in different countries. Robert's emphasis on technical measures, versus more legalistic ones, is more likely to give satisfaction.
Amazon would be a US company, though. And suing Amazon would bring some interesting facts to the public's attention as to the provenance of some material they DRM.
Amazon is an interesting and somewhat unique example (Google, Apple and Microsoft are also interesting, and unique in their own ways). You are right that PG or DP could sue Amazon. Some days, I think we should (they sell a lot of Project Gutenberg titles -- with the "small print" intact, in various illegitimate ways). What we're talking about, though, is intentional trespass on DP. I would be surprised if Amazon or the other big companies were interested in that. -- Greg

On 03.03.2010 at 08:04, Marcello Perathoner wrote:
Greg Newby wrote:
But as Robert mentioned, that doesn't change that the public domain content is still public domain...no matter how much value has been added through scanning, OCR, proofreading, etc. What happens if such content mysterioulsy, untraceably extracts itself from DP and becomes available elsewhere? Well, it's still public domain.
But you would sue them for trespass, not for copyright infringement.
So how do you prove they did it? You have to prove that they did indeed trespass. Not an easy job to do!!!
PS: Over the years, I've been involved in various efforts to bring legal remedies to online incidents. It is very hard to do, especially when there is little or no money involved. Doubly-especially if any of the actors are in different countries. Robert's emphasis on technical measures, versus more legalistic ones, is more likely to give satisfaction.
Amazon would be a US company, though. And suing Amazon would bring some interesting facts to the public's attention as to the provenance of some material they DRM.
DRM is not there to protect copyright, but to protect their investment in the work they have done. Besides, it is not that hard to remove DRM nowadays.
regards Keith.

On 02.03.2010 at 23:16, Greg Newby wrote:
On Tue, Mar 02, 2010 at 08:16:13PM +0100, Marcello Perathoner wrote:
These don't seem like strongly conflicting statements. Our "no sweat of the brow how-to" gives a similar view.
IF someone were to gain illicit access to files at DP or elsewhere, regardless of whether they were public domain, various legal remedies could be applied. (Quite a few, and most countries have their own set of remedies ranging from contracts, to EULAs, to things like computer fraud & abuse or misappropriation of resources.)
But as Robert mentioned, that doesn't change that the public domain content is still public domain...no matter how much value has been added through scanning, OCR, proofreading, etc. What happens if such content mysteriously, untraceably extracts itself from DP and becomes available elsewhere? Well, it's still public domain.
As I have mentioned in another post, "in the public domain" and "copyrighted" are two different animals. I can put the source code of a program in the public domain and still maintain a copyright. The same goes for texts.
(Bonus reading assignment: Steven Levy's "Crypto," which describes how the PGP software, which was ineligible for export from the US, found its way into other countries -- where it was perfectly legal to use.)
-- Greg
PS: Over the years, I've been involved in various efforts to bring legal remedies to online incidents. It is very hard to do, especially when there is little or no money involved. Doubly especially if any of the actors are in different countries. Robert's emphasis on technical measures, versus more legalistic ones, is more likely to give satisfaction.
That's what DRM is. Now, how can it be applied to texts? It can only be done in the file itself. The only way to achieve this is with a special file format that can only be read by our own tools, and those tools' source should not be publicly available. With most readers available one can still extract the text, thereby defeating its protection. This has been done with music, effectively defeating DRM, and that is why iTunes music is now DRM-free.
The saying still goes: if there is a will, there is a way. regards Keith.

You also seem to believe there is a black hole at DP where 1 out of 3 books fall into, never to emerge. This is a patent fallacy.
The fallacy is in assuming that the only way DP can waste volunteer efforts is to never ship some particular book. On the contrary, large and increasing queue sizes can waste volunteer efforts just as effectively as never shipping some particular book.

Again, consider the Russian Roulette test: DP managers randomly shoot 1/3 of the projects at DP (prior to PP). How do these murders affect the shipping rate out of DP? Answer: they don't change the shipping rate out of DP. Conclusion: if you can destroy 1/3 of the projects at DP without affecting the productivity rate out of DP, then 1/3 of the productivity at DP is being wasted. How is that productivity being wasted? By sticking it on large and increasing queues.

Consider a factory that only ships 2/3rds of what it ever starts to make. Does the unfinished inventory represent value or not? Well, the factory only *realizes* value by shipping product. The shipped product has value, and eventually every piece of product gets shipped, but as long as the factory only ships 2/3rds of everything it ever makes, the fact remains that the cost of manufacturing is 50% higher than it need be -- i.e., the factory is only running at 2/3rds of its potential productivity. That unfinished inventory *might* be considered to have value, but only if new owners buy out the old owners and change the manufacturing process so that unshipped inventory no longer plugs up the factory. Or, if buyers get tired of paying 50% more for products than they should and stop buying, then the factory has an opportunity to work off that unfinished inventory, realizing its value -- assuming it can lure back buyers at the new, lower price that doesn't include the wasted 50% markup for product started but not yet shipped.

In the DP case, what this analogy means is that DP gets a chance to work off the inventory if and when P1s get tired of DP wasting their time and energy and stop putting new work into the head of the DP queue. But DP needs P1s, since they represent the future of DP.

Now how can it be that a factory only ships 2/3rds of what it makes, but at the same time eventually ships every item? Consider, for simplicity, that the factory makes rolls of toilet paper and ships those rolls out to customers on a "First In, First Out" (FIFO) basis. Does every roll of toilet paper eventually get shipped? Yes. But the problem is that the queues are constantly getting larger, and as they do so they consume 1/3rd of the factory's resources.

Consider if we changed to a "Last In, First Out" (LIFO) queuing system. Does that change the nature of the problem? NO -- a roll of toilet paper is a roll of toilet paper. But now, under LIFO, it becomes obvious that some rolls of paper never do get shipped -- at any given time, the older 1/3rd of the toilet paper rolls never get shipped -- 1/3 of all toilet paper rolls ever made, and the situation keeps getting worse. But the choice of FIFO vs. LIFO queuing in no way changes the nature of the problem -- a toilet paper roll is a toilet paper roll.

Thus, contrary to the previously stated hypothesis, it is NOT necessary to have a "black hole" in order to waste time and effort. All that is necessary is to have a large and increasing queuing system -- whether that queuing system is LIFO or FIFO. Or, stated another way, large queuing systems ARE the black hole.
The mere fact that any given book eventually makes it out of the queue is not sufficient to keep the large queuing systems from being a black hole -- as long as the black hole continues to suck in more than it spits out.
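The FIFO/LIFO argument above can be checked with a toy simulation (Python; the start and ship rates are invented, chosen only to match the two-out-of-three figure):

    from collections import deque

    # A factory where items enter faster than they leave. The queue
    # discipline changes WHICH items wait forever, not HOW MANY.
    def run(years, started_per_year=3, shipped_per_year=2, lifo=False):
        backlog, shipped = deque(), 0
        for _ in range(years):
            backlog.extend(range(started_per_year))
            for _ in range(min(shipped_per_year, len(backlog))):
                backlog.pop() if lifo else backlog.popleft()
                shipped += 1
        return shipped, len(backlog)

    print(run(1000, lifo=False))  # (2000, 1000): 1/3 of all starts still queued
    print(run(1000, lifo=True))   # (2000, 1000): same totals under LIFO

Either discipline ships two items for every three started; the backlog of 1000 is the "black hole," and it grows by one for every three new starts.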

The queues also seem to have the effect of promoting the release of short, easier projects at the expense of longer, more challenging ones. Consequently some of the more significant works are delayed. In June of 2005, the nine volumes of The Works of William Shakespeare - Cambridge Edition were submitted. This was before the queues era, and the records aren't clear, but the first volume (processed as 6 separate projects, 1 play per project) was completed and became available by the end of 2006. Volumes 2 to 8 are sitting in the F2 queue, waiting to be released so they can be formatted as the last step before post-processing and eventual submission to PG. The first of them has yet to make its way completely through since the introduction of queueing. (I can't tell where Volume 9 is -- it may not have been submitted yet.)

On Tue, 2 Mar 2010, Robert Cicconetti wrote:
redistribution*[1]. It also works the other way... the independent commercial entity that republished the text on Amazon has no way to prevent us from putting the final, polished text up *for free* at PG once it finishes PP/PPV. Also, it can indeed be "scooped up" by anyone else who wishes to at DP before that point.
Well, no, it can't. Mostly they put DRM on it, so it's a felony in the US to do anything with it. Now, if someone like manybooks gets it, I don't care. -- Greg Weeks http://durendal.org:8080/greg/

On Tue, Mar 2, 2010 at 3:01 PM, Greg Weeks <greg@durendal.org> wrote:
On Tue, 2 Mar 2010, Robert Cicconetti wrote:
redistribution*[1]. It also works the other way... the independent commercial entity that republished the text on Amazon has no way to prevent us from putting the final, polished text up *for free* at PG once it finishes PP/PPV. Also, it can indeed be "scooped up" by anyone else who wishes to at DP before that point.
Well no it can't. Mostly they put DRM on it, so it's a felony in the US to do anything with it. Now if someone like manybooks gets it I don't care.
"Also, it can indeed be "scooped up" by anyone else who wishes to at DP before that point." Note I said it is accessible at DP, not suggesting that one break DRM. -Bob

On Fri, 26 Feb 2010, Karl Eichwalder wrote:
Greg Weeks <greg@durendal.org> writes:
I think we're worried about the fact that the only version available is one that you have to BUY that's based on our volunteer labor.
If that's at least an option, why not? Nobody forces you to buy it, though.
Because it's the ONLY way it's available. I should have to PAY someone else to see the work I did? It's incredibly irritating. -- Greg Weeks http://durendal.org:8080/greg/

On Thu, 25 Feb 2010, Greg Weeks wrote:
On Thu, 25 Feb 2010, Michael S. Hart wrote:
Are we worried more about who gets the credit and getting books out?
I think we're worried about the fact that the only version available is one that you have to BUY that's based on our volunteer labor.
We've always let people sell our eBooks. . .period. We just don't let them use our name. I think it's more important to get the books out, however they get out, than anything else. . . . mh

DISCLAIMER: The following _data_ comes directly from DP site statistics. All opinions following are my own.

Of the "8000" works "trapped at DP" (rounded to the nearest hundred):

4100 from TIA.
700 from Gallica/BNF.
1000 from Google.
400 from the next 5 most represented online sources.
(6200 in total.)

Those 6200+ works already are available to the public, at minimum in scanned-pages form, and most of them with OCR available. The argument that these works are "trapped" is a red herring stemming from frustration over how long it now takes the DP process to produce a "finished" version of the text. On Thu, 25 Feb 2010, Michael Hart wrote:
We have always invited people to take our completed books and redo them into their own editions, and hopefully resubmit them to redistribute.
If we do this for books that are done, why not for those undone?
PG is of course welcome to continue their status quo practice with respect to those completed texts. However, I know of nothing that entitles PG to take advantage of the efforts of the volunteers of a separate organization before the results of those efforts are freely and willingly offered to them.
Are we worried more about who gets the credit than about getting the books out?
Considering that the proposed scheme essentially serves to inflate PG's number of texts "available," with little significant benefit to the public, and with a real risk of significant detriment to DP as an organization and to its individual volunteers, can you honestly expect a reasonable person to take your question seriously?

There are (as is often pointed out on gutvol-d) hundreds of thousands of works in the various book scanning repositories, all "undone" as you would have people believe. If PG were truly interested in making large numbers of "undone" books available in "pre-print" then perhaps they should take advantage of their organizational clout and forge partnerships to have direct access to that material in all forms. But that would require effort.

What seems to have happened instead is that PG has decided that the DP in-process text, even though unfinished, is desirable low-hanging fruit, and *that* requires only minimal effort. All that's required is to convince the 'right' people at DP to either A) expend limited resources towards that end instead of where they're needed, or B) stand aside and allow PG to take unreasonable advantage of what have so far been amiable terms of relationship.

Michael, I honestly respect your vision, but your ethic is sorely lacking at the moment.

David (donovan)

What these numbers and comments do NOT reflect is that, even after only a portion of processing, if we make these available they will be in additional formats and/or of improved quality. . .not to leave out that some people may find them here rather than not at all. It's not as if this is some kind of secret process we must hide-- mh

On Fri, 26 Feb 2010, D Garcia wrote:
[snip]

Those 6200+ works are already available to the public, at minimum in scanned-pages form, and most of them with OCR available. The argument that these works are "trapped" is a red herring stemming from frustration over how long it now takes the DP process to produce a "finished" version of the text.
Sorry, but this is NOT a "red herring". Looking at DP's own statistics on this subject, the release rate is about 2/3rds the project start rate -- and has been for many years.

Why does this matter, if "eventually all projects will get released"? Because by the time "eventually" happens, enough new books will be stuck on queues that it will continue to be true that the release rate is about 2/3rds the project start rate. This means DP is running in a "self-similar" mode where effectively 1/3 of all projects that get started DON'T get released. Which means that 1/3 of all volunteer effort is being wasted.

One might say, "OK, let's just slow down the project start rate." If you do that, then P1s do not have interesting projects to work on, and they get frustrated and go do something else with their time. But DP NEEDS to have the P1s, because DP grows those -- eventually -- into the P3s and the F2s and the PPs necessary to get the queues unstuck. But the queues can't get unstuck, because keeping the start rate high enough to attract the P1s in turn clogs the queues.

So again, what is the solution? 1) Increase the number of P3s, F2s, and PPs by reducing the qualifications. Or 2) improve the tools available to P3s, F2s, and PPs to make them more productive. DP can't fix the problem without changing.

If you don't understand this, please take a closer look at the plot that DP makes available at: http://www.pgdp.net/c/stats/stats_central.php where you can see that one third of projects created DO NOT get released because they are stuck on queues. As more books get released, more books also get stuck on queues, and the ratio remains the same: 1/3 of books DO NOT get released. Which means that 1/3 of volunteer effort is being wasted by a flawed process.
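A minimal sketch in Python, using purely hypothetical monthly figures (the real numbers are on the stats page above), shows why a release rate pinned at 2/3rds of the start rate leaves a constant one-third of all started projects unreleased, no matter how long you run it:

    # Minimal sketch, not DP's actual statistics code; the figures are
    # hypothetical and only illustrate the steady-state argument above.
    STARTS_PER_MONTH = 300        # assumed project start rate
    RELEASE_RATIO = 2 / 3         # releases run at ~2/3 of starts

    started = released = 0.0
    for month in range(1, 121):   # simulate ten years
        started += STARTS_PER_MONTH
        released += STARTS_PER_MONTH * RELEASE_RATIO
        if month % 24 == 0:
            backlog = started - released
            print(f"year {month // 12:2d}: started={started:7.0f} "
                  f"released={released:7.0f} "
                  f"stuck fraction={backlog / started:.2f}")

The backlog grows without bound while the stuck fraction stays pinned at 0.33: the "self-similar" mode described above.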

It's worse than that. We all know there is a large invisible queue of projects that aren't being posted at all, because of the daunting prospect of possibly never seeing your project complete in your own lifetime. And we keep adding tricky new loops and spins for the benefit of one or another deserving category of workers or project types, making any forecast of the schedule for *your* project highly speculative.

Apologies for not chiming in as much as I'd like on this debate. There have been many excellent comments. This whole discussion is "pre-release," not just about pre-releases. I don't think we have enough information yet to decide how, whether, or for which items it would make sense to have pre-release DP items more widely available. There are various issues involved (technical, moral, conceptual, practical...). Some sort of proof-of-concept is what I'd like to work toward, to try to see what we're really talking about, and whether many (often conflicting) goals could be met. More:

On Thu, Feb 25, 2010 at 10:17:19AM -0800, Al Haines (shaw) wrote:
Some of these questions may overlap a bit--bear with me...
Who's going to monitor this pre-release page for when projects on it are posted? The WWers? DP? Greg?
Needs to be cron (the Linux/Unix automated task scheduler). If it's not automated, it's not going to achieve anyone's goals. I realize this applies to *removal*, not just harvesting.
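Purely as a hypothetical sketch of what that automation might look like (the paths, the posted-list format, and the script itself are assumptions, not any existing PG/DP tooling), the removal half could be a small Python script that cron runs nightly:

    # Hypothetical sketch only; not actual PG/DP tooling. Example
    # crontab entry to run it at 03:15 every night:
    #   15 3 * * * /usr/bin/python3 /usr/local/bin/prerelease_sync.py
    from pathlib import Path

    PRERELEASE_DIR = Path("/var/prerelease")  # assumed pre-release area
    # Assumed bookkeeping file: one posted project ID per line.
    POSTED_IDS = Path("/var/lib/prerelease/posted.txt")

    def sync() -> None:
        posted = set(POSTED_IDS.read_text().split())
        for item in PRERELEASE_DIR.glob("*.txt"):
            if item.stem in posted:  # the eBook has been posted to PG,
                item.unlink()        # so drop its pre-release copy
                print(f"removed {item.name}")

    if __name__ == "__main__":
        sync()

The same scheduled job (or a companion one) could handle harvesting new pre-release files in, so neither direction depends on anyone remembering to do it by hand.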
Will it contain only pre-release text files, or all working files associated with a given project (page scans, illustrations, text, HTML, etc, etc)?
Unknown, which is why I'd like to see an experiment or two before figuring out how, whether, when, etc. I can also envision some sort of "by permission" from the PM in charge of a given project. There are several phases in the DP workflow. I was (always) just thinking about the items that are stuck (i.e., delayed enough to, statistically, be thought of as stuck), but have significant value added and a reasonable level of completion. Somewhat related: page scans have been welcome as part of eBooks for years, but are seldom delivered by DP with a new eBook. (There are a few people who add them separately, later.) Maybe efforts towards getting pre-release items could also be helpful with adding page scans.
If the latter, what's to stop someone from taking those files, getting their own clearance, and submitting them to PG as their own work? Or is DP going to consider that they're, in effect, abandoned projects, and up for grabs? (I can only imagine the reaction that would cause.)
I'm not sure how likely that is, but I would discourage it and attempt to make sure further efforts on an item go back to DP & credit DP. We do reasonably well at spotting duplicate copyright clearances, and could have some README-type info about the "proper" way to take pre-release items and get them completed. My experience is that volunteers tend to honor such requests. After all, there are plenty of items to work on, and the DP front door is open to those with interests in particular items.
Related to a couple of the above questions, would the WWers be expected to check to see if a given submission is one that's also in progress at DP, or would it be a case of first-come, first-posted, and let DP take its lumps?
No, such checks need to be automated. It won't be perfect, but it's doable, well enough to raise a flag at submission time that an item might be a harvested DP item. In our harvesting how-to at www.gutenberg.org, we talk about asking permission, and about honoring requests to not add items to the collection -- even when they are clearly public domain. I don't favor allowing back-dooring of DP in-progress items by non-DP sources.
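A minimal sketch of the kind of flag-raising check this could be, assuming a hypothetical snapshot of DP's in-progress projects (the snapshot, its format, and all names here are illustrative, not an existing PG tool):

    import re

    def normalize(s: str) -> str:
        """Lowercase, strip punctuation, collapse whitespace."""
        s = re.sub(r"[^a-z0-9 ]", " ", s.lower())
        return " ".join(s.split())

    # Hypothetical snapshot of DP in-progress projects, stored as
    # (normalized title, normalized author) pairs.
    DP_IN_PROGRESS = {
        (normalize("The Works of William Shakespeare, Vol. 2"),
         normalize("William Shakespeare")),
    }

    def flag_possible_dp_item(title: str, author: str) -> bool:
        """True if a submission matches an in-progress DP project."""
        return (normalize(title), normalize(author)) in DP_IN_PROGRESS

    if __name__ == "__main__":
        # Case and punctuation differences do not defeat the check:
        print(flag_possible_dp_item(
            "The Works of William Shakespeare Vol 2", "WILLIAM SHAKESPEARE"))
        # prints True, which would flag the item for human review

Exact normalized matching is just the simplest possible flag; a fuzzier comparison (edit distance, token overlap) could catch near-misses, consistent with "it won't be perfect" above.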
Rather than dumping who knows how many pre-releases into Preprints, I'd suggest a separate Prerelease page. (Speaking personally, I regularly check Preprints for interesting/doable projects, and have drawn a number of projects from it. I doubt I'd be interested in looking through a raft of ex-DP items, searching for non-DP Preprint items.)
Your solo efforts are amazing and appreciated, but unusual. I don't think the fact that a number (even thousands) of items appear in Preprints will result in a lot of separate solo submissions. As others have mentioned, there are any number of sources of items available that motivated individuals could select from. I think that most such people will honor requests to keep pre-release items with DP (including, of course, a link to sign up and get to work on the DP workflow). -- Greg

Since I seem to be conveying messages, Someone With Authority has finally spoken, at DP, and the word is pretty clear. This is from Louise Davies, the General Manager:

There are many arguments for and against the idea of making not-quite-finished texts available sooner than they would be otherwise. I will not list all the pros and cons here, as it would only be repeating most of the points already made. Here are my thoughts on it.

1. Once the preprints have been posted, anyone--even non-DP members--can pick those up, apply for a copyright clearance and post them themselves. Those clearances are not, and never have been, a reservation for the clearance holder. To the best of my knowledge, PG has not offered to make them exclusive, either. Whoever posts first, wins. We have deleted many a project because someone else has beaten us to the posting. That is one of the worst ways that our resources can be wasted and our morale shot down. I deleted a three-volume set of projects last week, because DP-EU posted two of them to PG first and the third one is in R2. (And yes, the two persons listed in the DP-EU credits are also members here, so I did not see the point in comparing quality. I only verified that they were the same edition, and they were even from the same source as well.) It cuts me deeply every time I have to do that. So, do we really want to put our texts out there in preprint-land, which would encourage more of this?

2. Transferring text files en masse to an off-site preprints area would be counter to the current site policy. Policy changes are the dominion of the DP Board. These discussions, both here and on gutvol-d, have been brought to their attention. So unless we hear differently, I would say it is business as usual. (And please, please, please, do not construe the opinion of an individual who also happens to be a Board member as being that of the entire Board. Unless there is an official stamped and notarized announcement, it remains an individual opinion.)

3. While it is possible for anyone with a little ingenuity to harvest our text files in bulk, this would also not be a good thing to encourage. It would add a certain (though possibly negligible) load on our servers. It would upset a good number of PMs and PPers ('violated' is the word that comes to mind). And, it would have the potential of compromising our members' privacy, which we maintain so carefully through the application of our Privacy Policy <http://www.pgdp.net/c/faq/privacy.php>.

4. If a PM decides to make their text files available in the preprints area, and later--while a PPer is polishing it up--someone else grabs the preprint and posts it to PG, I think the PPer would be a mite upset. At the very least, the PM might place a warning for the PPer that it has been uploaded to preprints and it could possibly be posted by someone else.

On Wed, 24 Feb 2010, Al Haines (shaw) wrote:
Hmm... I figured Greg was busy enough! <g>
As for the "major technical challenge" suggested elsewhere in this topic, why can't DP put up a wiki page for pre-releases, similar to its Harvesting wiki page? Project Managers (or whoever) could put links on the page to their pre-release candidates, and when a pre-release was ready for submission to PG, or had been posted, the PM could remove the link.
I'm not going to hand-edit a wiki for every project. I suspect no one else will either, and the wiki would end up totally out of date. The PG bookshelf wiki pages are basically in that state now. -- Greg Weeks http://durendal.org:8080/greg/
participants (21)
- Al Haines (shaw)
- Andrew Sly
- cmiske@ashzfall.com
- D Garcia
- David Starner
- don kretz
- Greg Newby
- Greg Weeks
- James Adcock
- Jim Adcock
- Juliet Sutherland
- Karen Lofstrom
- Karl Eichwalder
- Keith J. Schultz
- Lee Passey
- Marcello Perathoner
- Michael S. Hart
- Pascal J. Bourguignon
- Robert Cicconetti
- Sankar Viswanathan
- Walter van Holst