Re: [gutvol-d] Crowdsourcing

how do you prevent noise from overwhelming signal? users won't wade through 20 different versions hoping one of them _might_ be better than the "official" version. even if they find one, the word-of-mouth will be diluted. heck, even your own re-done versions suffer a significant disadvantage competing with the older versions, and they're at least allowed to compete on the same field. another site is gonna quickly be dismissed as a ghetto. even if the quality is there, the critical mass will not be...

roger built a better proofreading machine than d.p., but couldn't attract people to staff it... what reason is there to believe there's anyone to run _multiple_ new systems?

i relish the idea of a tournament among new workflows, but if a winner never gets declared, what is the purpose? that doesn't have to mean the losing systems get discarded... but it would mean the best one gets some recognition. without some kind of decision on quality, you will just be burning volunteer time and energy; you've wasted enough.

i'm all for live-and-let-live, but there's only so much oxygen in the room. count me in... but i'll serve my pudding on my own site...

-bowerbird

On Fri, Jan 27, 2012 at 12:02:47PM +0100, Bastien wrote:
Bowerbird@aol.com writes:
how do you prevent noise from overwhelming signal?
Self-discipline.
I was thinking of community ratings. Internet Archive has a "batting average." Facebook et al. have "like" or "+1". Slashdot et al. have ratings for postings, and for submitters. PG has a "top 100" for titles, which as we know is not always satisfying.

We will need some automated procedures (and perhaps even a person in the loop, as editor) to make sure something that is really broken doesn't get into the mix. But as long as that happens, the rest is a matter of personal preference. Naturally, there can be a few different ways people will arrive at a particular file for a particular book. As long as it's easy to get to one that has been lovingly prepared, I don't think it will matter much to most readers which specific version/edition/format/etc. they get.

Note that, as always, an important aspect will be the ability to automatically make derived formats. I think social and technical pressure will make it desirable to choose a capable master format. For example, if someone lovingly hand-crafts a MOBI, it will rapidly be deprecated if fixes are applied to the underlying text or images but those fixes are not reflected in the MOBI. Conversely, if they lovingly craft an RST that renders wonderfully in all derived formats, then corrections to the RST will automatically propagate.

-- Greg
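As a side note on the ratings idea above: naive average ratings let a book with one positive vote outrank a book with 900 positives out of 1000. A common fix (this is only an illustrative sketch, not anything PG or Internet Archive actually uses) is to rank by the lower bound of the Wilson score interval, which discounts small sample sizes:

```python
import math

def wilson_lower_bound(positive: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a positive-vote ratio.

    Ranks items with many votes above items with the same ratio but few
    votes, so a 1-for-1 item cannot outrank a 900-for-1000 item.
    z = 1.96 corresponds to 95% confidence.
    """
    if total == 0:
        return 0.0
    phat = positive / total
    denom = 1 + z * z / total
    centre = phat + z * z / (2 * total)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * total)) / total)
    return (centre - margin) / denom

# A 90%-positive book with 100 ratings outranks one with only 10 ratings.
print(wilson_lower_bound(90, 100) > wilson_lower_bound(9, 10))  # True
```

Any "top 100"-style list built on raw counts or raw averages tends to reward whatever is already popular; a confidence-adjusted score is one way to surface lovingly prepared but less-visited titles.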

Conversely, if they lovingly craft an RST that renders wonderfully in all derived formats, then corrections to the RST will automatically propagate.
Fundamentally, if RST (picked on only as the example at hand) renders wonderfully in all derived formats, it is because RST has restricted the set of operations it supports to the subset of operations found in all target machines. That restriction is why restricted formats end up looking so bland. One can find blandified versions of PG works for free from many, many secondary vendors, including Amazon. It always saddens me to see these inferior versions being propagated versus the richer and more accurate versions which were originally submitted by volunteers to PG.

One reason that HTML is such a pain for small machines is that many HTML submitters *want* to use the richer set of features only suitable for larger machines.
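The master-format trade-off being debated above can be reduced to a toy sketch (this is not real RST or docutils, just an illustration of the propagation idea): one hand-kept master structure, with every reader-facing format regenerated from it, so a correction made once in the master reaches all derived formats, at the cost of the master only supporting features every renderer can handle.

```python
# Toy master format: a list of (block_type, text) pairs, deliberately
# restricted to block types that every renderer below can handle.
MASTER = [
    ("title", "A Tale of Two Cities"),
    ("para", "It was the best of times, it was the worst of times."),
]

def to_html(blocks):
    """Render the master structure as minimal HTML."""
    tags = {"title": "h1", "para": "p"}
    return "\n".join(f"<{tags[k]}>{v}</{tags[k]}>" for k, v in blocks)

def to_plain(blocks):
    """Render the master structure as plain text."""
    return "\n\n".join(text for _, text in blocks)

# A fix applied once to the master ...
fixed = [(kind, text.replace("best", "BEST")) for kind, text in MASTER]

# ... appears in every derived format without hand-editing any of them.
print(to_html(fixed))
print(to_plain(fixed))
```

The bland-looking result is exactly Mr. Adcock's point: the master can only express the intersection of what all the renderers support, so a hand-crafted HTML edition will always be able to use features this pipeline cannot.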

On Fri, January 27, 2012 3:05 am, Bowerbird@aol.com wrote:
how do you prevent noise from overwhelming signal?
Rules.

We nerds tend to view the world in binary: one or zero, good or bad, my way or the highway, this extreme or that extreme, the Cathedral or the Bazaar. Reality tends to be much more nuanced.

My experience and observations have convinced me that every successful open source project has been led by a strong leader: Eric Young with OpenSSL, Julian Smart with wxWidgets, Andrew Tridgell with Samba and rsync, Jean-loup Gailly and Mark Adler with zlib, and of course Linus Torvalds with Linux. While Torvalds has personally written only about 2% of the Linux kernel, he remains the ultimate authority on what new code is incorporated into the standard Linux kernel. I believe that this level of involvement and control by the founding visionary is the principal reason that Linux is solid, stable, and hasn't simply dissipated.

My single greatest criticism of Project Gutenberg is, and always has been, its chaotic and anarchistic nature. I don't believe people can do good work in a totally free-form environment; in fact, I don't believe anything can be accomplished at all. As a result, rules have evolved at PG, but the public assertion that PG is totally unfettered has caused most of these rules to remain covert. Mr. Haines has developed his set of rules, Mr. Perathoner has developed his set of rules, and Mr. Hart developed his own set of rules. Sometimes these rules are inconsistent, just as the published rules can be inconsistent. As Mr. Adcock has pointed out, trying to get a document into the PG database while complying with these unwritten rules can be maddening.

I would not be surprised if the primary reason that DP has become virtually the only source for PG documents is that volunteers know that by working through DP they at least have a framework to work in. The rules may foster productivity or they may hinder it, depending on your viewpoint, but at least everyone knows what the rules are.
As a software developer, my greatest challenge is not in developing programs, but in getting my customers to tell me exactly what problem needs to be solved, and what constraints there are on the solution (I'm not interested in /how/ they think the problem should be solved; that's /my/ job). So, Mr. Newby, I don't think that lack of good tools is the issue; what is missing is vision. Here's my proposal, in broad terms:

1. Project Gutenberg will provide support for a new project; let's call it Project WOPR (Hello, Joshua ...). There will be no restrictions from PG on the scope or methods of the project (except, of course, those necessary to comply with United States law).

2. Joshua Hutchinson will be designated benevolent dictator of the project. He need not make more than 2% of the rules, or provide more than 2% of the effort, but he gets to ultimately decide which rules are acceptable, and whether any particular individual's efforts satisfy the rules. The rest of us can whine, moan, complain, cajole or boycott, but only Mr. Hutchinson decides. He can't tell me what I must do, but he may tell me what he will accept. BUT ...

3. All the rules will be published, and no volunteer effort will be rejected based on failure to comply with an unpublished rule.

I have lots of ideas about what the rules should be, some of which may be unacceptable. But without a commitment to "the rule of law," and an established mechanism for rules to be created, my ideas are just dust in the wind.

I would not be surprised if the primary reason that DP has become virtually the only source for PG documents is because volunteers know that by working through DP they at least have a framework to work in.
Well, I would think another reason would be that DP tries to foster positive feedback, whereas PG feedback is generally negative. Now, mind you, the DP feedback may be mainly in the "girl scout badge" category, but at least it's positive feedback .... DP *has* built a strong community of like-minded people, which is usually, but not always, a positive.
participants (5)
- Bastien
- Bowerbird@aol.com
- Greg Newby
- James Adcock
- Lee Passey