any arguments against "free-range" proofing?

the d.p. proofing system locks each page to a single proofer. (there's one and only one p1 proofer, p2 proofer, and so on.)
so does rfrank's roundless system; once a page has been assigned to a proofer, it's semi-difficult to even look at it.
and if someone else has reproofed it _after_ that person, then the old version is stored somewhere i can't figure out, so tracking the diffs simply cannot be done by an outsider.
(the d.p. system at least allows you to do that tracking, and even has a routine that will show you round-to-round diffs.)
it is by analyzing these round-to-round diffs very closely that you can get a sense for how a page progresses from the initial o.c.r. to its final -- hopefully perfect -- stage...
***
the question i have today is whether there is a good reason why a page needs to be assigned-and-locked to one person.
is there any reason why you shouldn't allow any proofer to go and proof any page in a book? yes, it would mean that some pages might be proofed several times, but so what? that's not necessarily a _bad_ thing, is it?
i'm writing code now to build my own proofing system, and i'm curious about this particular aspect.
i think it would be important to inform a proofer how many previous people have proofed each specific page, so as to let that proofer choose whether to do an additional proof, but if they _want_ to do it, is there any reason to disallow it?
***
partly this ties into _incentives_...
most people like _finding_and_fixing_ errors, so there'll be a good incentive for people to work in the "first" proofing...
but even in that first proofing, there are a lot of pages that are _already_ perfect, so there are no errors to find or fix...
and in the second and third proofings, the number of errors that are left will be small, even collected over a whole book.
so i feel it's very important to reward people for _certifying_ a page -- i.e., confirming that the page is indeed error-free.
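One way to support both the round-to-round diffs and the "how many people have proofed this page" count is to keep every saved version of a page instead of overwriting it. Below is a minimal Python sketch under that assumption; the `Page` class and its method names are hypothetical, not taken from the d.p. site or rfrank's roundless system, and the diffs come from the standard-library `difflib` module.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class Page:
    """One page of a book plus every saved version of its text (hypothetical model)."""
    page_id: str
    # (proofer, text) pairs; versions[0] is the raw OCR text.
    versions: list[tuple[str, str]] = field(default_factory=list)

    def submit(self, proofer: str, text: str) -> None:
        # No lock: any proofer may append a new version at any time.
        self.versions.append((proofer, text))

    def proof_count(self) -> int:
        # Proofings after the initial OCR; this is what the next proofer sees.
        return max(len(self.versions) - 1, 0)

    def round_diff(self, n: int) -> list[str]:
        """Unified diff between version n-1 and version n (n >= 1)."""
        _, before = self.versions[n - 1]
        who, after = self.versions[n]
        return list(difflib.unified_diff(
            before.splitlines(), after.splitlines(),
            fromfile=f"round {n - 1}", tofile=f"round {n} ({who})",
            lineterm=""))


page = Page("p001")
page.submit("ocr", "Tbe quick brown fox")
page.submit("alice", "The quick brown fox")
print(page.proof_count())  # 1
print("\n".join(page.round_diff(1)))
```

Because nothing is locked, `submit` simply appends, and the old versions stay available to anyone who wants to study the diffs.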
if i were to put this in terms of a "point" system, it'd be this:
5 points for fixing all of the remaining errors on a page.
4 points for doing the first "certification" of a clean page.
3 points for doing the second "certification" of a page.
2 points for doing the third "certification" of a page.
1 point for fixing _some_ (but not all) errors on a page.
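The point values above could be kept with a small scorekeeper like this minimal sketch. It assumes (none of this is in the post) that a fourth or later certification earns nothing, that the caller says whether a fix caught all remaining errors, and that the post's rule "the points turn negative" means a certifier's +N becomes -N as soon as anyone fixes an error on that page. `Scoreboard` and its methods are hypothetical names.

```python
from collections import defaultdict

FIX_ALL = 5              # fixed all remaining errors on a page
FIX_SOME = 1             # fixed some, but not all, errors
CERT_POINTS = [4, 3, 2]  # 1st, 2nd and 3rd certification of a clean page


class Scoreboard:
    """Hypothetical scorekeeper for the point scheme in the post."""

    def __init__(self) -> None:
        self.scores: dict[str, int] = defaultdict(int)
        # page_id -> [(proofer, points)] for certifications still standing
        self.certs: dict[str, list[tuple[str, int]]] = defaultdict(list)

    def certify(self, proofer: str, page_id: str) -> int:
        n = len(self.certs[page_id])
        # Assumption: a 4th or later certification earns nothing.
        pts = CERT_POINTS[n] if n < len(CERT_POINTS) else 0
        self.certs[page_id].append((proofer, pts))
        self.scores[proofer] += pts
        return pts

    def fix(self, proofer: str, page_id: str, fixed_all: bool) -> None:
        # Any fix proves the standing certifications wrong.
        # Assumption: "turn negative" means +pts becomes -pts.
        for certifier, pts in self.certs.pop(page_id, []):
            self.scores[certifier] -= 2 * pts
        self.scores[proofer] += FIX_ALL if fixed_all else FIX_SOME


board = Scoreboard()
board.certify("alice", "p7")  # 4 points
board.certify("bob", "p7")    # 3 points
board.fix("carol", "p7", fixed_all=True)
print(dict(board.scores))  # {'alice': -4, 'bob': -3, 'carol': 5}
```

After a fix the page's certification count starts over, so the next person to certify it earns the first-certification points again.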
if you certify a page clean, and someone later finds an error, the points turn _negative_. so make sure of your certification!
if you gather enough points, you win _a_million_dollars_! ;+)
***
there are a few things you need to stipulate for such a system:
1. there is one -- and only one -- "correct" way to do a page.
2. which means there are no ambiguous guidelines in place.
3. and whitespace is significant.
4. which means there are _no_ "insignificant" diffs.
5. all diffs are reviewed, and can be challenged for correctness.
6. so when a page comes out of proofing, that page is _done_.
7. which means "postprocessing" is a largely automatic thing.
***
you can discuss any aspect of this post, but what i'm seeking are any arguments people can think of _against_ free-range proofing.
-bowerbird
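Stipulations 3 and 4 (whitespace is significant, so no diff is "insignificant") mean that comparing two versions of a page cannot normalize whitespace away. A minimal sketch, assuming a character-level comparison with Python's `difflib.SequenceMatcher`; `all_changes` is a hypothetical helper that returns every change so that each one can be queued for review, as stipulation 5 asks.

```python
import difflib


def all_changes(before: str, after: str) -> list[tuple[str, str, str]]:
    """Every character-level change between two versions of a page.

    Whitespace is compared like any other character, so a doubled space
    or a dropped line break shows up as a change; nothing is filtered out
    as "insignificant".
    """
    matcher = difflib.SequenceMatcher(None, before, after, autojunk=False)
    return [
        (op, before[i1:i2], after[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


print(all_changes("a  b", "a b"))  # [('delete', ' ', '')]
```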

Hi BB,
Truly speaking, I do not see anything against such a system. The only problems are the administrative tasks involved:
1) you have to track all this,
2) keep everything stored somewhere, and
3) keep everything in sync.
The other question that comes to mind is that you will need one or more authorities to finally certify that a page satisfies your criteria for being done.
Some may call it an administrative nightmare, but it should be workable.
regards
Keith.
On 11.03.2010 at 00:52, Bowerbird@aol.com wrote:
[snip, snip]
-bowerbird
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/mailman/listinfo/gutvol-d
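Keith's third item, keeping everything in sync, becomes the hard part once pages are no longer locked: two proofers may open the same page and both save. One common answer, swapped in here as an illustration rather than something proposed in the thread, is optimistic concurrency: each save names the version it was based on and is rejected if someone else saved in the meantime. `PageStore` and `StaleVersionError` are hypothetical names, and the store lives only in memory.

```python
class StaleVersionError(Exception):
    """Raised when a save is based on an out-of-date copy of the page."""


class PageStore:
    """Hypothetical in-memory page store; every version is kept."""

    def __init__(self) -> None:
        self.history: dict[str, list[str]] = {}

    def add_page(self, page_id: str, ocr_text: str) -> None:
        self.history[page_id] = [ocr_text]

    def load(self, page_id: str) -> tuple[int, str]:
        versions = self.history[page_id]
        return len(versions) - 1, versions[-1]

    def save(self, page_id: str, based_on: int, text: str) -> int:
        versions = self.history[page_id]
        # Optimistic concurrency: reject if someone else saved after we loaded.
        if based_on != len(versions) - 1:
            raise StaleVersionError(
                f"{page_id} is at version {len(versions) - 1}, not {based_on}")
        versions.append(text)  # append, never overwrite, so diffs stay available
        return len(versions) - 1


store = PageStore()
store.add_page("p1", "ocr text")
version, text = store.load("p1")
store.save("p1", version, "first fix")       # succeeds, returns 1
try:
    store.save("p1", version, "second fix")  # based on stale version 0
except StaleVersionError as err:
    print("reload and retry:", err)
```

On a real multi-user server the version check and the append would have to happen atomically, for example inside a database transaction.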

First, it depends on what you mean by "locked to a particular person." In typical database work, having two people edit the same record (in this case, the same page) at the same time is generally considered a bad thing. Assuming you are not suggesting doing away with the usual database convention of only one person editing a record (the same page) at a given time, the remaining problem is "fix thrashing," which we already see happening some in DP land. I.e., P1 introduces a fix, then P2 says no, *I* think it should be fixed this way, and then P3 says no, *I* think it should be fixed THIS way. At least in DP land P1, P2, and P3 are different people, so the "fix" may not converge, but at least it's not thrashing, meaning that there are only three rounds of time-wasting going on. With roundlessness you could potentially run into "proofer wars." Well, actually in DP land you can run into proofer wars too (trust me); it's just that the proofers have to run to a "higher authority" to engage in fix thrashing. The DP system doesn't seem to me to directly allow proofer wars to happen.
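If every version of a page is kept, the "fix thrashing" described above can be spotted mechanically: a save that restores a text the page already had, other than the one immediately before it, is a revert. A minimal sketch under that assumption; the function names and the threshold of one revert before a page goes to a "higher authority" are hypothetical choices, and texts are compared exactly, whitespace included.

```python
def revert_count(versions: list[str]) -> int:
    """Count saves that restore an earlier text of the page.

    A save identical to the version just before it (an unchanged page,
    i.e. a certification) is never counted as a revert.
    """
    count = 0
    for i in range(2, len(versions)):
        if versions[i] != versions[i - 1] and versions[i] in versions[: i - 1]:
            count += 1
    return count


def needs_referee(versions: list[str], max_reverts: int = 1) -> bool:
    # Hypothetical policy: more than `max_reverts` reverts sends the page
    # to a "higher authority" instead of letting proofers flip it again.
    return revert_count(versions) > max_reverts


history = ["color", "colour", "color", "colour"]  # OCR, then P1, P2, P3
print(revert_count(history), needs_referee(history))  # 2 True
```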

I do have some thoughts about "free-range" proofing. The size of the corpus that is being proofed is important. The Australian Newspaper project (http://newspapers.nla.gov.au/ndp/del/home) allows a volunteer to proof any article from ~100 yrs of lots of newspapers. They built it so that their readers could improve articles when they found errors. It works very well for that purpose, but it also has several problems. One is that they don't provide any information about whether or not someone has already proofed this article. The proofing interface is totally optional, so if a reader doesn't see any errors, then they don't invoke the interface. From that point of view, it works beautifully. But they didn't make provision for someone who just wants to proof an article, any article. There is no way to say "give me another article". I find it very hard to choose at random when the number of possibilities is so large. Also, since there's no information as to whether or not anyone has looked at (proofed) this article yet, there's no way to know if one is duplicating work already done. Another problem with their system is one of completeness. For example, if they want to know whether an entire issue of a newspaper (1 day) is completely corrected (or at least that someone has edited every article) they can't do it. Part of this can be solved by them keeping track of this information. But, by the nature of their system, with efforts scattered all over the place, it is very unlikely that any one issue will be completely done. For their purposes, that doesn't matter. But when working on things that are meant to be read from beginning to end, it *does* matter. All of this ties in to a sense of progress. If the unit of proofing produces a complete entity (as with an article in a newspaper) then one can count progress by counting how many articles have been done. But if the unit of proofing is not the complete entity (as with a page of a book), then matters change. 
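Both gaps described here, no "give me another article" option and no way to tell whether an issue is complete, become simple if the system keeps a per-page proof count, as bowerbird proposes. A minimal sketch, assuming such counts are available in a dict keyed by page id; `next_page`, `progress`, and `book_complete` are hypothetical helpers, and breaking ties by page id is an arbitrary choice.

```python
from typing import Optional


def next_page(proof_counts: dict[str, int],
              skip: frozenset = frozenset()) -> Optional[str]:
    """'Give me another page': the least-proofed page not in `skip`."""
    candidates = [p for p in proof_counts if p not in skip]
    if not candidates:
        return None
    # Fewest proofs first; ties broken by page id (an arbitrary choice).
    return min(candidates, key=lambda p: (proof_counts[p], p))


def progress(proof_counts: dict[str, int], required: int = 1) -> float:
    """Fraction of pages proofed at least `required` times."""
    if not proof_counts:
        return 0.0
    done = sum(1 for n in proof_counts.values() if n >= required)
    return done / len(proof_counts)


def book_complete(proof_counts: dict[str, int], required: int = 1) -> bool:
    return bool(proof_counts) and progress(proof_counts, required) == 1.0


counts = {"p001": 2, "p002": 0, "p003": 1}
print(next_page(counts))                            # p002
print(next_page(counts, skip=frozenset({"p002"})))  # p003
print(progress(counts), book_complete(counts))
```

Handing out the least-proofed page first is one way to push effort toward finishing a book rather than scattering it.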
The whole idea of distributing the work of proofreading is that no one has to feel like they must do an entire book by themselves. With the current systems, a volunteer knows that even if they can't do the entire book themselves, someone else will help out and it will get done. In a free-range system, there is no such assurance that anyone else will want to help finish that book. I guess what I'm saying is that people who proof for the sake of proofing like to see progress. They want a sense of accomplishment, knowing that they contributed. The only way I can see to achieve that in a free-range environment is by limiting the number of books that are currently available, that is, by concentrating the work somehow so that eventually a book is completely "done" (or as good as it's going to get for now). I think that there is a need for both kinds of systems. The free-range system is good for material that is short. It's also good for allowing casual readers to fix something that's wrong. I don't think it works very well as a system for producing entire corrected books.
Another issue with a free-range system has to do with abuse. If no one is likely to look again at whatever page I've just done, there is nothing to keep me from changing what it says. Think of it as a kind of graffiti. The Australian Newspaper project hasn't had trouble with that, but I believe that is because they haven't been going long enough and haven't attracted a wide enough audience yet. I predict that they will have trouble with it eventually. Most people are well-meaning, but there are always a few who have to write "John was here" on a wall, or in an online book. And there will inevitably be a few fanatics who just have to substitute their view of the world, either by carefully changing a few words, or by simply putting an entire tract in place of the text that used to be there.
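One cheap guard against the graffiti and "entire tract" cases is to compare each save with the version before it: ordinary proofing changes a few characters, while this kind of vandalism replaces most of the page. A minimal sketch using `difflib.SequenceMatcher.ratio()`; the function name and the 0.6 similarity cutoff are assumptions, and a flagged save would go to a person for review rather than being rejected outright. It will not catch the subtler case of carefully changing a few words; that still needs a second reader.

```python
import difflib


def looks_like_vandalism(before: str, after: str,
                         min_similarity: float = 0.6) -> bool:
    """Flag a save that rewrites too much of a page to be a proofing fix.

    ratio() is 1.0 for identical texts and near 0.0 for unrelated ones.
    The 0.6 cutoff is an arbitrary, hypothetical choice.
    """
    ratio = difflib.SequenceMatcher(None, before, after, autojunk=False).ratio()
    return ratio < min_similarity


ocr = "It was a dark and storrny night; the rain fell in torrents."
fixed = ocr.replace("storrny", "stormy")
print(looks_like_vandalism(ocr, fixed))              # False
print(looks_like_vandalism(fixed, "John was here"))  # True
```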
One advantage of many people looking at a single page (or at least 2) is that it becomes hard to get away with that kind of thing. As long as the proofing effort is relatively small and not very high-profile, a free-range system would probably not have trouble with vandalism. But if the effort were associated with a high-profile organization (Google, say), it would suddenly become much more interesting to folks who like to disrupt.
In summary, I think there are three issues that a free-range proofing system must address: choice, completeness, and vandalism. I'm not saying that a free-range system wouldn't work. It obviously can. But I do think that how well it works depends on what its purpose is.
JulietS
On 3/10/2010 6:52 PM, Bowerbird@aol.com wrote:
the d.p. proofing system locks each page to a single proofer. (there's one and only one p1 proofer, p2 proofer, and so on.)
so does rfrank's roundless system; once a page has been assigned to a proofer, it's semi-difficult to even look at it.
and if someone else has reproofed it _after_ that person, then the old version is stored somewhere i can't figure out, so tracking the diffs simply cannot be done by an outsider.
(the d.p. system at least allows you to do that tracking, and even has a routine that will show you round-to-round diffs.)
it is by analyzing these round-to-round diffs very closely that you can get a sense for how a page progresses from the initial o.c.r. to its final -- hopefully perfect -- stage...
***
the question i have today is whether there is a good reason why a page needs to be assigned-and-locked to one person.
is there any reason why you shouldn't allow any proofer to go and proof any page in a book? yes, it would mean that some pages might be proofed several times, but so what? that's not necessarily a _bad_ thing, is it?
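Juliet's observation that having at least two people look at each page makes vandalism hard to get away with suggests treating every change as provisional until someone other than its author confirms it. A minimal sketch, assuming one outside confirmation is enough by default; `PendingEdit` and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PendingEdit:
    """A proposed change that stays provisional until someone else confirms it."""
    author: str
    text: str
    confirmations: set[str] = field(default_factory=set)

    def confirm(self, proofer: str) -> None:
        # The author's own confirmation does not count.
        if proofer != self.author:
            self.confirmations.add(proofer)

    def accepted(self, needed: int = 1) -> bool:
        # Hypothetical rule: `needed` other proofers must agree.
        return len(self.confirmations) >= needed


edit = PendingEdit(author="alice", text="It was a dark and stormy night;")
edit.confirm("alice")
print(edit.accepted())  # False
edit.confirm("bob")
print(edit.accepted())  # True
```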

With the current systems, a volunteer knows that even if they can't do the entire book themselves, someone else will help out and it will get done.
This statement is not true, and to the extent that it is true, it is also a statement of a problem: DP has many examples of books that volunteer(s) start but which don't get finished. Hence the queuing system and the increasing wait times. However, your thesis is also a statement of a problem: when volunteers start something they assume that *someone else* needs to finish it! In turn these other volunteers may feel an obligation to finish something that someone else has started when a better answer may be to NOT finish it! Certainly in the case of very difficult and time-consuming books that no one wants to read, the right answer may be to NOT finish it. One can easily show other cases that are much more interesting: difficult books that people WOULD want to read if they were finished, and yet the right answer might STILL be that it is better NOT to finish them! [see for example: Bibliotheca Britannica] When I volunteer at DP I often end up asking myself a simple question: do *I* think that the person who started this project would do it if they had to do it all themselves? If the answer is "NO," then I decide that my efforts are being "freeloaded" upon and I go work on something else! Conversely, one of my proposals for changes at DP is a simple one: if person A starts a book and other volunteers do not want to finish it, then at least let person A finish it rather than leaving it stuck in a queue "forever"! One simple measure of the "worthiness" of a project is that at least one person in the world wants to finish it. Unfortunately, DP fails even that test: the current system doesn't even allow a person who *wants* to finish a book the right to do so! At least put in a "time out" system or something, where if something gets stuck for a year or more, DP admits it is not going to get it done in a timely manner and puts it back up for grabs!
I guess what I'm saying is that people who proof for the sake of proofing like to see progress.
To me personally, "seeing progress" means seeing something I have worked on posted to PG for others to read. Agreed, that means the book needs to get "done." Each spot on a queue where a book can get stuck is yet another chance for a book to become not-done.
Another issue with a free-range system has to do with abuse. If no one is likely to look again at whatever page I've just done, there is nothing to keep me from changing what it says. Think of it as a kind of graffiti.
I have had problems with this on Wikipedia, where one posts science-based answers to science-based questions and then people whose religion or politics conflicts with the science hack the postings. Certainly when someone is proofing something that they find offensive, the temptation is always to "edit."
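Jim's "time out" proposal could be automated by recording when each project last saw any activity and periodically listing the ones idle for a year or more, so they can be put back up for grabs. A minimal sketch; the project names, the `last_activity` mapping, and treating "a year" as 365 days are all assumptions.

```python
from datetime import datetime, timedelta

STUCK_AFTER = timedelta(days=365)  # "stuck for a year or more" (assumed 365 days)


def stuck_projects(last_activity: dict[str, datetime], now: datetime) -> list[str]:
    """Projects idle for STUCK_AFTER or longer, longest-idle first."""
    stale = [p for p, t in last_activity.items() if now - t >= STUCK_AFTER]
    return sorted(stale, key=lambda p: last_activity[p])


last_activity = {
    "project-a": datetime(2008, 1, 1),
    "project-b": datetime(2010, 1, 1),
    "project-c": datetime(2009, 3, 18),
}
print(stuck_projects(last_activity, now=datetime(2010, 3, 18)))
# ['project-a', 'project-c']
```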

On Wed, Mar 17, 2010 at 8:56 PM, James Adcock <jimad@msn.com> wrote:
Certainly in the case of very difficult and time-consuming books that no one wants to read,
Unless I've missed something, you've never provided an example of such. You've certainly never shown that they exist in significant numbers at DP.
-- Where there is life, there is hope.

Unless I've missed something, you've never provided an example of such. You've certainly never shown that they exist in significant numbers at DP.
Unless I've missed something, PG doesn't publish download numbers for anything other than the most popular books. However, TIA does publish download numbers, which one can use as a proxy:
2,583,382 Downloads of the Most Popular PG Book
8 Downloads of the Least Popular PG Book
Bang-for-the-Effort Ratio of Over 300,000 to 1.
You can query this yourself using the TIA "Advanced Search" option on "collection:gutenberg", with fields to return = downloads + title, output as an HTML table, and Sort Results by either downloads desc or downloads asc. But be forewarned: it does not appear to me that the pattern of downloads from TIA is identical to the pattern of downloads directly from PG -- TIA users are more sophisticated (aka nerdier) than direct PG users.
Personally I would rather work on a book that is towards the 2,500,000 download end of the spectrum than on the 10 downloads end of the spectrum! Again, there are literally about 1,000 more books out there that can be saved than we have the time and effort to save. The question then becomes, which books do we save? If one is doing the entire job oneself, then the answer is easy: that book which you are willing to work on. If one is picking a book and imposing the work on other volunteers, then the question becomes who should have the right to make that decision, and how? "First come first serve" I suggest is a horrible way to make this choice, because it encourages the most greedy and inconsiderate submitters to get there first rather than to take a thoughtful approach to picking which books to save and then doing a really really good job of digitizing and OCR'ing them.
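The ratio quoted above follows directly from the two TIA download counts; as a quick check in Python:

```python
most_popular = 2_583_382  # TIA downloads of the most popular PG book (from the post)
least_popular = 8         # TIA downloads of the least popular PG book (from the post)

ratio = most_popular / least_popular
print(f"{ratio:,.0f} to 1")  # 322,923 to 1
```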

On Thu, Mar 18, 2010 at 3:30 PM, Jim Adcock <jimad@msn.com> wrote:
Personally I would rather work on a book that is towards the 2,500,000 download end of the spectrum than on the 10 downloads end of the spectrum!
Not something I really see from what you've uploaded to PG, but okay. I'm not sure I agree, though; getting something unique online, or something higher-quality than can be found elsewhere, is more important to me than something there's a dozen copies of on the web.
"First come first serve" I suggest is a horrible way to make this choice because it encourages the most greedy and inconsiderate submitters to get there first rather than to take a thoughtful approach to picking which books to save and then doing a really really good job of digitizing and OCR'ing them.
I'm sure we could have told all the Slashdotters to hold on while we were preparing material for them. We might have actually done 40 or 50 books by now that way. I'm sure it also would have helped to criticize our submitters as "greedy and inconsiderate". I'm sure most people who scanned books for DP never thought about the value of the book they were scanning.
participants (6)
- Bowerbird@aol.com
- David Starner
- James Adcock
- Jim Adcock
- Juliet Sutherland
- Keith J. Schultz