
ok, rfrank has made a report over in the d.p. forums on the latest set of results from his "roundless" experiment. so let's see what he says, and what i think in reaction... *** rfrank said:
So far in testing the roundless system as it stands, I've left it to the proofer to say when they thought a page was done. Turns out, that is reliable for only a very few proofers. Those who wish to say "I told you so" can chime in now, rightfully.
ok.
the post-processing clearable errors were caused mostly by four proofers, and each of those made several different kinds of mistakes. These mistakes were found almost entirely on pages that were one-and-done, that is, proofed once. So what is to be done?
inform those proofers they are making mistakes, and how, and that they are not doing nearly as well as they think they are. and to put this into perspective, there are about a dozen committed proofreaders taking part in this experiment, with another 5 dozen people contributing fewer pages... so four "bad" proofers constitutes about 1/3 of the lot... in other words, even though "4 proofers" sounds _rare_, the actuality is the percentage of "bad" proofers is high. this fact should _not_ be surprising. when you fail to give people any feedback on their performance, many will think they're doing a fine job, even if they're doing a terrible job. (this is a big problem over at the d.p. mothership, but we probably shouldn't be getting into that can of worms now.) after all, if they didn't think they were doing fine, they would change what they were doing, so they _could_ be doing fine. so you absolutely need to give them good and fast feedback.
One solution is to have every page looked at by at least two proofers.
that seems straightforward, but it has some gotchas.
That seems straightforward but it has some gotchas.
right. :+)
If every proofer knows that every page is going to be looked at by someone else, will they proof that page differently than if they intended it to be one-and-done?
it's likely. so you'd need to assume it, and work from there.
I think they might. Knowing the underlying mechanism can undermine the process.
well, you must assume people "know the underlying mechanism", because you want to be open and transparent about it with them. there's really no other option when you're working with volunteers.
Also, what if the second proofer is one of the four mentioned earlier?
or what if they both were?
There is a good chance that many of the errors would slip through.
right.
It's easy for me to change the site code to force two looks at every page, and I'll probably do that, perhaps even with a project in progress.
doesn't matter. even after two forced looks, some errors will remain.
A down side to the "every page looked at by at least two proofers" approach is specific to fadedpage: that there are only a dozen or so active proofers of the 60 or so registered users. The double-look algorithm adds about 35% to the number of page looks on a project.
doesn't matter. there's no need for any haste on the books coming out...
A better solution than just a double-look is to actually instantiate Confidence in Proofer (CiP).
i was afraid you were gonna say that. and it's absolutely the wrong approach.
For these four proofers, the system could schedule a second look at their pages even if they check the "this is done" box when done proofing.
It would give them plenty of diffs to look at, and they would be expected to look at those diffs that show some correction was made.
well, it'd be better just to inform them and educate them in the first place, rather than impose an "expectation" on them that informs them (indirectly) and forces them to educate themselves (again, in a very indirect fashion)...
If diffs were not checked, then their access to new pages would be reduced. The kind of proofer who checks diffs, learns, and continues to contribute is exactly what is needed.
well, yeah, maybe... but you're assuming a real luxury of an overabundance of volunteers, and a willingness to throw a good number of them away as "not exactly what is needed". it's better to figure out how to find a use for _all_ volunteers.
I believe for a roundless system to work, there has to be a reliable mechanism for stopping a page as done.
d'oh. there has been complete agreement that that is the issue from day 1.
I also believe that to have a reliable way to make that determination, some form of Confidence in Proofer needs to be in place.
some people have held that belief, yes. i think it's unobtainable, and wrongheaded, and basically a dead end. even if you get a rudimentary version, it won't turn out to be useful...
Therefore, CiP, which is important, and page tweets, which are useful and fun, are currently my main coding efforts at fadedpage.
yeah, well, you'll be coming back sometime down the line and saying "those who wish to say 'i told you so' can chime in now, rightfully"... the thread has more, on confidence-in-proofer, but i'm not gonna waste any more of my time dealing with that flawed concept... -bowerbird

If every proofer knows that every page is going to be looked at by someone else, will they proof that page differently than if they intended it to be one-and-done?
Under the *current* DP system everyone knows that everything being done is also going to be worked on by about six other people. The hard part then is getting anyone to feel "ownership" about anything -- particularly about getting something *done*. Automatic scoring of proofing efforts and automatic reporting back of scannos that slip by that other people find -- without making a "big deal value judgement" about those that slip by might make a positive contribution. Getting more people who care to read the finished or almost finished product and providing an easy and convenient way to give feedback on bugs found, or god forbid to be able to actually fix those bugs directly might also make a contribution.

On Tue, Feb 9, 2010 at 9:39 AM, Jim Adcock <jimad@msn.com> wrote:
Under the *current* DP system everyone knows that everything being done is also going to be worked on by about six other people. The hard part then is getting anyone to feel "ownership" about anything -- particularly about getting something *done*.
Jim, this is unfair to DP and to those of us who work there. I'm a high-count proofer in P3. I do care about finishing off books ... indeed, I'm a member of P3 Archers, a team that works to "shoot down" books that are almost-but-not-quite finished (we completed 27 projects last week). I did my share of slogging on the Baburnama, a nightmare project with lots of diacritic-spattered Turki, as well as other mouldie oldies. I also care about the quality of my work. I can't be sure that a formatter or a PPer is going to catch an error if I miss it in P3. I spellcheck and if I'm not sure of a word, look it up in OneLook online dictionary. I'm not sure that the current system at DP is the best possible, but I also know that various groups are experimenting with other workflows. It's a Rube Goldberg contraption in some ways, but it does keep putting out the books: more than 17,000 at last count. -- Karen Lofstrom

Under the *current* DP system everyone knows that everything being done is also going to be worked on by about six other people. The hard part then is getting anyone to feel "ownership" about anything -- particularly about getting something *done*.
Jim, this is unfair to DP and to those of us who work there. I'm a high-count proofer in P3. I do care about finishing off books ...
I have two books, highly requested, in DP that I spent about 40 hours each getting them into DP and where they have been moldering for almost a year now. They are "stuck" and there is no way to get them unstuck and the txt has been "ready to go" from almost the beginning. Again, the txt part, including P1, P2, P3 is the easy part of the problem, and is working relatively well compared to the rest of the DP process. This compares, for example, that I can personally crank out a book -- perhaps not quite as good as DP -- taking about the same 40 hours *total*, and can get it done including HTML in less than a month elapsed time including god knows how many family emergencies intruding on my efforts. I *try* to take ownership of these books at DP but am prevented in doing so by the system and the management -- god knows if I were allowed to do so I would personally have finished them off a half a year ago! A fundamental part of the DP problem is that the "design" (if you want to call it that) of the queuing system doesn't work. Another part of the problem, frankly, is the disproportionate amount of time spent on books that are very complicated, poorly scanned, and not very good choices to begin with -- meaning simply that they are books when all is said and done that not that many people are going to want to read. Under the current system bad ideas are allowed to consume a disproportionate amount of everyone's time and effort -- but isn't that true of life in general!

On Tue, Feb 9, 2010 at 4:27 PM, James Adcock <jimad@msn.com> wrote:
Again, the txt part, including P1, P2, P3 is the easy part of the problem, and is working relatively well compared to the rest of the DP process.
Since when has it been okay to toss out italics, indentation of poetry and proper footnotes in the text file? -- Kie ekzistas vivo, ekzistas espero.

"James Adcock" <jimad@msn.com> writes:
many family emergencies intruding on my efforts. I *try* to take ownership of these books at DP but am prevented in doing so by the system and the management -- god knows if I were allowed to do so I would personally have finished them off a half a year ago! A fundamental part of the DP problem is that the "design" (if you want to call it that) of the queuing system doesn't work.
I also consider this a serious defect. IMO, it must be possible, if someone wants to work on a book, to "activate" it (= unlock it from a waiting state).
Another part of the problem, frankly, is the disproportionate amount of time spent on books that are very complicated, poorly scanned, and not very good choices to begin with -- meaning simply that they are books when all is said and done that not that many people are going to want to read. Under the current system bad ideas are allowed to consume a disproportionate amount of everyone's time and effort --
I'm always wondering why people work on books they are not interested in...
but isn't that true of life in general!
Probably ;) -- Karl Eichwalder

Re:
Getting more people who care to read the finished or almost finished product and providing an easy and convenient way to give feedback on bugs found, or
DP-US and DP-Canada both have a smooth-read facility, with instructions on how to report problems.
god forbid to be able to actually fix those bugs directly might also make a contribution.
Allowing the hoi polloi, as it were, to "fix bugs" is a sure-fire way of introducing errors. I occasionally have to disallow an errata-reported error because the reporter wasn't aware that a word was, in fact, valid. For example, "ancle" is a valid, albeit archaic, variant of "ankle", and is not an error. But, if it's a typo/scanno for "uncle", it is. I've also handled reported errors where the error was real, but the suggested correction was incorrect.
Al
----- Original Message -----
From: "Jim Adcock" <jimad@msn.com>
To: "'Project Gutenberg Volunteer Discussion'" <gutvol-d@lists.pglaf.org>
Sent: Tuesday, February 09, 2010 11:39 AM
Subject: [gutvol-d] Re: rfrank reports in
If every proofer knows that every page is going to be looked at by someone else, will they proof that page differently than if they intended it to be one-and-done?
Under the *current* DP system everyone knows that everything being done is also going to be worked on by about six other people. The hard part then is getting anyone to feel "ownership" about anything -- particularly about getting something *done*.
Automatic scoring of proofing efforts and automatic reporting back of scannos that slip by that other people find -- without making a "big deal value judgement" about those that slip by might make a positive contribution.
Getting more people who care to read the finished or almost finished product and providing an easy and convenient way to give feedback on bugs found, or god forbid to be able to actually fix those bugs directly might also make a contribution.
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d

DP-US and DP-Canada both have a smooth-read facility, with instructions on how to report problems.
Below describes how one might get started doing smooth reading if anyone cares to see in part what the problem might be: http://www.pgdp.net/wiki/Smooth-reading_FAQ If there were a tie-in between PG and DP to let people in general know when SR is happening DP might get more SRs. A list of books "on deck" if you will and how to get them.
Allowing the hoi polloi, as it were, to "fix bugs" is a sure-fire way of introducing errors. I occasionally have to disallow an errata-reported error because the reporter wasn't aware that a word was, in fact, valid. For example, "ancle" is a valid, albeit archaic, variant of "ankle", and is not an error. But, if it's a typo/scanno for "uncle", it is. I've also handled reported errors where the error was real, but the suggested correction was incorrect.
This would be a problem that DP already has because in my experience many a P3 "knows" so well how to do their job that they never bother to double-check what it is the author actually wrote or that which the publisher actually published -- which in practice turns them into gold plated SRs. Don't get me wrong, DP has many excellent dedicated people at all levels, including all levels of P1, P2, P3 -- it's just that moving up the ranks doesn't necessarily mean people are actually getting any better at what they are doing. And the queuing system guarantees that the upper level "experts" are going to be overloaded.

On Tue, Feb 9, 2010 at 12:04 PM, James Adcock <jimad@msn.com> wrote:
This would be a problem that DP already has because in my experience many a P3 "knows" so well how to do their job that they never bother to double-check what it is the author actually wrote or that which the publisher actually published -- which in practice turns them into gold plated SRs.
And how would you know this? Long experience as a formatter or PPer? -- Karen Lofstrom

And how would you know this? Long experience as a formatter or PPer?
It's not hard to see, actually, when a P3 or others make changes which don't match the page images. You just have to actually look at one and then the other. I've seen many great proofers at all of the P1, P2, and P3 levels. I have also seen "well reputed" P3's who turn out results that don't match the page images. The text they create scans perfectly fine, it's just that it's not what the author wrote -- particularly when it comes to punctuation. Best way to get great results is to have people working on a text and an author they absolutely love, not just cranking out the numbers.

And how would you know this? Long experience as a formatter or PPer?
It's not hard to see, actually, when a P3 or others make changes which don't match the page images.
I just checked my previous claims about the problems with P3 (for example) against a pretty straight-forward text. I reviewed 200 pages from that text. P3's made 38 changes on those pages. Of these changes 7 represented a positive contribution towards making the txt correct. Of those positive changes about a half could easily be found by a simple tool like guiguts. 10 of the changes introduced by the P3's were negative changes -- changes that moved the text to a less perfect state. The remaining 21 changes were basically "null changes" relating to established DP procedure, which neither really made the txt any better nor any worse. Most of the negative changes were relating to punctuation, as I previously claimed. Again, it's really hard for the human mind to accept that the best thing to do when things aren't broken is to leave them alone -- people really want to make a "positive contribution" by changing things.
By my calculation DP is cranking out an average of 194 books a month -- which is impressive. But consider some of the upper level queue times:
2000 books stuck in P3 = 10.3 Months stuck in P3
2840 books stuck in F2 = 14.6 Months stuck in F2
2562 books stuck in PP = 13.2 Months stuck in PP
Total 38 Months, about 3 years waiting on these higher level queues, which means it takes about three and a half years in total for a book to get through DP nowadays? -- And getting longer every day.
Seems "pretty obvious" to me looking at the DP "red bar" graph at http://www.pgdp.net/c/activity_hub.php that the P3, F2 and PP efforts are "out of control." Which doesn't mean that one should admonish the troops to do better. Rather, it means that the process needs to be redesigned to fit the resources actually available -- somehow you have to move more people into the roles currently labeled "P3, F2, and PP" or you have to redesign things to make their jobs MUCH faster and easier, or you have to redesign the process, or redesign the goals of the organization.
I'm not saying this is good or this is bad -- I'm just saying that this is obvious! You cannot indefinitely run an organization that takes more orders in the front door than you ship out the back door -- no matter how big hearted you are.
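The queue-time arithmetic above can be reproduced with a few lines of Python. This is only a sketch: it assumes, as the message implicitly does, that every queue drains first-in-first-out at the overall rate of 194 books a month. The names below are made up for illustration and are not part of any DP tool.

```python
# Figures quoted in the message above.
BOOKS_PER_MONTH = 194
QUEUE_SIZES = {"P3": 2000, "F2": 2840, "PP": 2562}


def months_in_queue(queued_books, books_per_month):
    """Steady-state wait: books sitting in a queue divided by books leaving per month.

    Assumption (not stated in the source): each queue drains at the same
    overall rate and in first-in-first-out order.
    """
    return queued_books / books_per_month


waits = {name: months_in_queue(size, BOOKS_PER_MONTH)
         for name, size in QUEUE_SIZES.items()}
total_months = sum(waits.values())

for name, months in waits.items():
    print(f"{name}: {months:.1f} months")
print(f"total: {total_months:.0f} months")
```

If the real drain rate of a round is lower than the overall output rate, the wait in that round is longer than this estimate.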

On Tue, Feb 9, 2010 at 8:59 PM, Jim Adcock <jimad@msn.com> wrote:
I just checked my previous claims about the problems with P3 (for example) against a pretty straight-forward text. I reviewed 200 pages from that text. P3's made 38 changes on those pages. Of these changes 7 changes represented a positive contribution towards making the txt correct. Of those positive changes about a half could easily be found by a simple tool like guiguts. 10 of the changes introduced by the P3's were negative changes -- changes that moved the text to a less perfect state.
I'd have to look at them before trusting you on this, as you seem to have an extremely negative, fault-finding attitude towards DP. I wonder if you'd count my occasional bracketed comments, such as [**P3--seems to be a mistake in the original; s/b ;], as errors.
The remaining 21 changes were basically "null changes" relating to established DP procedure, which neither really made the txt any better nor any worse.
Nonetheless, they were useful to the formatters and PPers as making the text predictable.
... which means it takes about three and a half years in total for a book to get through DP nowadays? -- And getting longer every day
None of us likes that! Yes, the current round system is broken. It produces better texts than the old 2-round system. Some of the second-round proofers in those days wanted page count and didn't give a #$%@$#% about accuracy. The results were as dismal as you would expect. However, we're now producing very good texts at an enormous cost. We're discussing further changes. It doesn't particularly help, when one is drowning and flailing about for a handhold, to have a bystander jumping up and down, shouting, "You're drowning, you idiot!" -- Karen Lofstrom

However, we're now producing very good texts at an enormous cost. We're discussing further changes. It doesn't particularly help, when one is drowning and flailing about for a handhold, to have a bystander jumping up and down, shouting, "You're drowning, you idiot!"
If I'm a bystander it is because the texts that I have submitted to DP which I thought I would be working on have been frozen in the queues by DP for the last year. What you suggest instead is that I also jump into the pool and start flailing around with you. Been there done that, got tired of it, climbed back out of the pool. Flailing harder or faster or throwing more people into the pool really isn't going to help. I have made what I consider many positive suggestions, any of which simply invoke anger and defensiveness on the part of DP'ers: One of which is post the text after P3 rather than waiting to finish PP. This would make about an additional 4,000 texts available on PG. If one counts volunteer hours worth $10/hr this represents an "unfinished inventory" of about $2,000,000. If you value PG downloads similar to Amazon's minimum cost of $1 a book, then these 4,000 would generate about $150,000 a year in additional value to society. Other obvious suggestions would be to adjust your "experience" thresholds and testing methods for admittance to P3, F2, PP in order to allow a bit more people into these areas and see how much it *really* hurts your quality and productivity -- or not! Fundamentally it is the unbalanced number of people allowed into the upper rounds (or rather not allowed into the higher rounds) which is killing you. Further, any tools that you can offer P3, F2, or PP to make their lives easier would help you greatly. Another suggestion I have made is to do what many other commercial digitizers of text using human beings do: Run two humans in parallel on the same text and then diff the results. If you get a diff on some page run a third person and vote the results. If you were to double up on the P1 and P2 efforts like this that would help the P3 queue. If you doubled up the F1 efforts that would help the F2 queue. 
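The "two humans in parallel, then diff, then vote" idea can be sketched roughly as follows. This is a minimal illustration only: it compares whole pages as strings, whereas a real system would more likely diff and vote line by line or word by word. The function name and the callback for fetching a third opinion are hypothetical.

```python
from collections import Counter


def resolve_page(first, second, get_third_opinion):
    """Accept a page when two independent proofers agree; otherwise vote.

    first, second: the page text as returned by two proofers working in parallel.
    get_third_opinion: hypothetical callback that asks a third proofer for the
    same page; it is only called when the first two results differ.
    """
    if first == second:
        return first, "agreed"
    third = get_third_opinion()
    winner, votes = Counter([first, second, third]).most_common(1)[0]
    if votes >= 2:
        return winner, "voted"
    # All three versions differ, so the page needs more attention.
    return None, "unresolved"


# The two proofers disagree and the third sides with the second.
text, status = resolve_page("the cat sat on tbe mat",
                            "the cat sat on the mat",
                            lambda: "the cat sat on the mat")
print(status, "->", text)
```

Note that a vote only helps when the proofers' mistakes are independent; two people making the same mistake will still outvote the one who got it right.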
Don't know how to help the PP queue except I don't understand why you allow almost finished texts to be stuck moldering in the hard-drives of one PP'er so long. If a PP'er just can't get it done -- take it away and assign it to someone else. Doesn't matter how good or experienced a PP'er is if they just can't get it done. Another suggestion is to auto score proofers' and formatters' efforts and automatically assign them to the place in your process where their level of abilities will do the most good -- or at least the least damage. It is easy to auto score the P1, P2, and F1 efforts -- it is basically the ratio of the number of fixes that they make divided by the number of fixes made on the same pages by the successive round. Have the P3s and F2s "retest" on a P2 or F1 round occasionally so that you can autoscore whether they still know what they are doing or not. Another suggestion would be to update the toolset being used to make them more fun, less time-wasting, and less tweaky. Simple common tasks ought to be simple, unpainful, and fast. Allowing higher rez page scans for the people with the bandwidth to handle them would make all the rounds easier. Another suggestion would be to get PG to allow one to query on how many downloads various texts are getting, so that people who are submitting texts to DP which aren't getting read might get some feedback about what their actions are really accomplishing, or not. Modifying bowerbird's suggestions slightly, there *are* at least some texts that fit pretty well into template forms, such as some simple novels. Perhaps an automated or semi-automated tool for turning these simpler texts into HTML quickly? Another obvious suggestion is that there are too many texts in the world to take them all on. Are the readers of PG really interested in "Annals of the Annual Proctology Meeting of 1847" ? Is there at least some way to try to discourage really bad ideas?
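The auto-scoring ratio described above (fixes a proofer makes, divided by fixes the next round makes on the same pages) might look something like this. It is a sketch under assumptions: the message does not say how a "fix" is counted, so here a fix is taken to be any run of changed words found by Python's difflib, and the function names are made up.

```python
import difflib


def count_fixes(before, after):
    """Count runs of changed words between two versions of a page.

    Assumption: one "fix" is one non-equal opcode in a word-level diff.
    """
    matcher = difflib.SequenceMatcher(None, before.split(), after.split(),
                                      autojunk=False)
    return sum(1 for tag, *_ in matcher.get_opcodes() if tag != "equal")


def proofer_score(input_text, proofed_text, next_round_text):
    """Ratio of fixes made by this proofer to fixes made by the next round.

    Returns None when the next round changed nothing, since the ratio is
    undefined there (a real system would probably count that as a top score).
    """
    own_fixes = count_fixes(input_text, proofed_text)
    later_fixes = count_fixes(proofed_text, next_round_text)
    if later_fixes == 0:
        return None
    return own_fixes / later_fixes


# The proofer fixes two scannos and misses one that the next round catches.
score = proofer_score("teh cat sat 0n tbe mat",
                      "the cat sat on tbe mat",
                      "the cat sat on the mat")
print(score)
```

A higher ratio means the proofer caught more of what was there to catch. The score says nothing about whether the next round's changes were themselves correct, which matters given the complaint earlier in the thread about P3 changes that made the text worse.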
Looking at the actual text of the English language submissions in P1 right now it looks to me that about half of them have a reasonable chance of being read. Is there any way to more actively promote the acquisition and prioritizing of texts that are generally recognized as being "better than average" aka "famous" or at least "well known"? Another obvious suggestion would be to empower PM's to have at least one active project where if that project gets stuck they are allowed to take whatever actions necessary to get it unstuck....

On Wed, Feb 10, 2010 at 8:57 PM, Jim Adcock <jimad@msn.com> wrote:
One of which is post the text after P3 rather than waiting to finish PP.
To which I pointed out that this would in many cases result in the posting of severely deficient texts. Formatting is important.
Don't know how to help the PP queue except I don't understand why you allow almost finished texts to be stuck moldering in the hard-drives of one PP'er so long. If a PP'er just can't get it done -- take it away and assign it to someone else. Doesn't matter how good or experienced a PP'er is if they just can't get it done.
Because sometimes it may be worth letting a text molder rather than peremptorily ripping it out of someone's hands and annoying the hell out of them.
Perhaps an automated or semi-automated tool for turning these simpler texts into HTML quickly?
Is guiguts not quick enough for you? This is a fairly simple tool problem.
Another obvious suggestion is that there are too many texts in the world to take them all on. Are the readers of PG really interested in "Annals of the Annual Proctology Meeting of 1847" ?
It's easy to come up with a rhetorically stupid title. But if you pulled a real title, then we could actually discuss the audience and why someone would upload that.
Is there any way to more actively promote the acquisition and prioritizing of texts that are generally recognized as being "better than average" aka "famous" or at least "well known"?
That presumes that that should be our goal. Some of the works I'm proudest of are works where the PG edition is the best in the world. Sure, more people may read the Canterbury Tales, but everyone who reads our edition of Stephen Hawes's "A Joyful Meditation of the Coronation of King Henry the Eighth" is thrilled that we have it, because the alternative was deciphering the blackletter originals and trying to figure out the lost parts yourself. Augustan Reprint Society works are a large class of works I've done where they have some scholarly interest, but the reader will only find facsimiles outside of PG. On the other hand is stuff like "1931: A Glance at the Twentieth Century" by Henry Hartshorne. It is none of those things; it's just a fun work to read, even if that fun comes at its own expense. I don't think anyone who worked on it is the least bit unhappy about that. -- Kie ekzistas vivo, ekzistas espero.

Hi Everybody, I have been following the thread "rfrank reports in". [Yes, BB, it's been hijacked.] It seems obvious to all that the DP system has severe deficiencies. The question is how to help, which leads to the question of what is flawed. It is obvious that the system after P3 is too complex, as is the method of creating the perfect ebook. As a result, there are evidently too few people who can be trusted with this complexity. The method in general is not the problem, but the rules that have to be abided by!! I have suggested other alternatives in the past which are far simpler and would produce the required results. I would implement a system if I had the time; furthermore, it would be a one-person operation. DP has a huge amount of person-power which they could use more efficiently, as we all have noted. But until the ones who have the say over at DP are willing to simplify their system, the problems will persist. The formatting/transcription rules required by DP have been developed over time, yet they were evidently added in an ad hoc manner. Any system should, over time, be revamped and streamlined. Optimized, if you will. Sure, a few tools need to be rewritten, but the basic frame should already be there, so that should not pose a great ordeal. The other questions that remain are: what is a perfect book? Or, what is a predictable book? regards Keith.

To which I pointed out that this would in many cases result in the posting of severely deficient texts. Formatting is important.
OK, but I can also point to texts that were almost "good to go" before they went into DP, only to molder indefinitely there. Is there some way to make a decision on this one way or another? How about letting the PM make the decision whether or not to post a "preliminary version" to PG?
Because sometimes it may be worth letting a text molder rather than peremptorily ripping it out of someone's hands and annoying the hell out of them.
OK, but how and when do you decide that the PP has actually moved on in life and is not really willing to finish up the book to which others have in good faith contributed their blood sweat and tears in the hopes of getting an honest to god book? Not to mention the possibility of a PP not working in good faith?
Is guiguts not quick enough for you? This is a fairly simple tool problem.
Tried it previously and didn't find any value in it. I will take a look at it again.
It's easy to come up with a rhetorically stupid title. But if you pulled a real title, then we could actually discuss the audience and why someone would upload that.
Pick any title active in the rounds right now. Based on the best statistics I can find on PG usage, which is actually from IA, the most popular books from PG get read literally 100,000 times more often than the least read books. Now, it is hard to find a book that is going to be that popular. But it is easy to find a good book which will get read literally 40x more often than the books in DP right now, as well as being at least several times faster and easier to create.
Is there any way to more actively promote the acquisition and prioritizing of texts that are generally recognized as being "better than average" aka "famous" or at least "well known"?
That presumes that that should be our goal. Some of the works I'm proudest of are works where the PG edition is the best in the world. Sure, more people may read the Canterbury Tales, but everyone who reads our edition of Stephen Hawes's "A Joyful Meditation....
Is it possible to split the queues and the efforts into "esoterica" vs. "books that will be actively read?" Right now the "books that will be actively read" I am afraid are stuck in the queue behind "books that no one is actually willing to work on." I went there recently to try to help and it looked like "the powers that be" were trying to force through books that really no one wants to work on -- books that were really hard and not very interesting even to the people who volunteer their time to DP. You can't force people to work on things they don't want to work on. Either they work on texts that they want to work on, or if DP is not willing to present any of those, then they go on with their lives, or maybe, like in my case, they "route around damage" and work on books outside of DP. The problem is NOT that there is "esoterica" vs. "books that will be actively read" -- the problem is that the "esoterica" takes so much time and effort compared to "books that will be actively read" that "esoterica" ends up swamping the other categories. Are you really saving a book if you pickle it for posterity without it getting read? Isn't that like locking up a ballerina's shoes in order to preserve ballet? Or locking up an artist's paint and brushes in order to preserve art? To my taste books exist while they are being read. Otherwise they fail to exist -- beyond little magnetic domains stuck somewhere on the internet. A simple answer would be to put in separate queues for the differing levels of difficulty and/or categories of books. Then people who want to work on esoterica can do so without impacting people who don't.

On Fri, Feb 12, 2010 at 1:47 PM, Jim Adcock <jimad@msn.com> wrote:
OK, but how and when do you decide that the PP has actually moved on in life and is not really willing to finish up the book to which others have in good faith contributed their blood sweat and tears in the hopes of getting an honest to god book? Not to mention the possibility of a PP not working in good faith?
That's not a problem to be solved by ranting; it's a problem to be solved by study of the statistics and talking to the PPers.
Is it possible to split the queues and the efforts into "esoterica" vs. "books that will be actively read?" Right now the "books that will be actively read" I am afraid are stuck in the queue behind "books that no one is actually willing to work on." I went there recently to try to help and it looked like "the powers that be" were trying to force through books that really no one wants to work on -- books that were really hard and not very interesting even to the people who volunteer their time to DP.
This is the funny thing; there's no connection between books that will be actively read, and books people want to work on. What books would be actively read: Euclid, Newton's Principia, the Oxford English Dictionary. We've had scans of the OED for years; no one has been willing to attack it. We can probably come up with a dozen usable scans of Euclid; no one is currently working on getting PG a complete copy of Euclid, because it's a total pain to work on. But you take some moldy old historical fiction or better yet some sci-fi story that hasn't been reprinted since it was first published, and they will rocket through DP.
The problem is NOT that there is "esoterica" vs. "books that will be actively read" -- the problem is that the "esoterica" takes so much time and effort compared to "books that will be actively read" that "esoterica" ends up swamping the other categories.
Bullshit. How long do you think the OED would take? That's a book that will be actively read. Why did "Dryden's Works (13 of 18): Translations; Pastorals" take two months to go through P2? If you're classifying the complete works of Dryden as esoterica, then what on Earth are you classifying as books that will be actively read? Certainly not the historical trash fiction that does blow through DP. -- Kie ekzistas vivo, ekzistas espero.

On Fri, Feb 12, 2010 at 2:22 PM, David Starner <prosfilaes@gmail.com> wrote:
Dictionary. We've had scans of the OED for years; no one has been willing to attack it. We can probably come up with a dozen usable
Not exactly true. I have a clearance on it, and have a fascicle prepped and at DP. The holdup is that I have yet to come up with a good markup for proofing that can be machine transformed into various dictionary formats. Straight TEI is too big, and likely to lead to inconsistencies. I refuse to start something this big without a decent plan for the final output. Granted, once started, it will probably take decades to work through DP... -R C (Who is somewhat easily distracted, and has been working on other projects.)

Robert Cicconetti <grythumn@gmail.com> writes:
Not exactly true. I have a clearance on it, and have a fascicle prepped and at DP. The holdup is that I have yet to come up with a good markup for proofing that can be machine transformed into various dictionary formats.
Lame excuse ;) The proofing rounds are easy (and you only see the difficulties, once you actually let the crowd work on it).
Straight TEI is too big, and likely to lead to inconsistencies. I refuse to start something this big without a decent plan for the final output.
I'd recommend doing all "formatting" (= XML tagging) off-site. It would probably be best to use SVN or git/Bazaar for collaboration. Any idea where we could host such a repository? -- Karl Eichwalder

Google Code <http://code.google.com/intl/en/>
On Fri, Feb 12, 2010 at 6:26 PM, Karl Eichwalder <ke@gnu.franken.de> wrote:
Robert Cicconetti <grythumn@gmail.com> writes:
Not exactly true. I have a clearance on it, and have a fascicle prepped and at DP. The holdup is that I have yet to come up with a good markup for proofing that can be machine transformed into various dictionary formats.
Lame excuse ;) The proofing rounds are easy (and you only see the difficulties, once you actually let the crowd work on it).
Straight TEI is too big, and likely to lead to inconsistencies. I refuse to start something this big without a decent plan for the final output.
I'd recommend doing all "formatting" (= XML tagging) off-site. It would probably be best to use SVN or git/Bazaar for collaboration. Any idea where we could host such a repository?
-- Karl Eichwalder _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d

don kretz <dakretz@gmail.com> writes:
Google Code <http://code.google.com/intl/en/>
Why not? ;) I just created http://code.google.com/p/tieck-texts/ and seeded it with 'Briefe an Ludwig Tieck (1 of 4) {fraktur} {type-in}'. I'll update the project comments later. Wondering whether Google will accept this project... -- Karl Eichwalder

On Fri, Feb 12, 2010 at 9:26 PM, Karl Eichwalder <ke@gnu.franken.de> wrote:
Robert Cicconetti <grythumn@gmail.com> writes:
Not exactly true. I have a clearance on it, and have a fascicle prepped and at DP. The holdup is that I have yet to come up with a good markup for proofing that can be machine transformed into various dictionary formats.
Lame excuse ;) The proofing rounds are easy (and you only see the difficulties, once you actually let the crowd work on it).
Not really. The OED uses a predecessor of IPA with some oddball symbols.. at the least I have to come up with a table for those or they'll be all over the place. I started one, need to finish it.
Straight TEI is too big, and likely to lead to inconsistencies. I refuse to start something this big without a decent plan for the final output.
I'd recommend doing all "formatting" (= XML tagging) off-site. It would probably be best to use SVN or git/Bazaar for collaboration. Any idea where we could host such a repository?
I'm not prepared to abandon the DP workflow, especially for a project of this scale, and considering the amount of markup that will be required. At DP I reasonably assume it'll keep moving, even if I drop off the grid or get hit by a bus. -R C

Here's a point of reference for you. The current Encyclopædia Britannica project in F2 has been there since September.
Number of pages: 232
Pages remaining: 24
Pages I've done: 203
Pages other people have done: 5
Some rounds get cherry-picked pretty badly; and OED is not a cherry. Stay away from buses.
On Fri, Feb 12, 2010 at 7:16 PM, Robert Cicconetti <grythumn@gmail.com> wrote:
On Fri, Feb 12, 2010 at 9:26 PM, Karl Eichwalder <ke@gnu.franken.de> wrote:
Robert Cicconetti <grythumn@gmail.com> writes:
I'm not prepared to abandon the DP workflow, especially for a project of this scale, and considering the amount of markup that will be required. At DP I reasonably assume it'll keep moving, even if I drop off the grid or get hit by a bus.
-R C

On Fri, Feb 12, 2010 at 11:04 PM, don kretz <dakretz@gmail.com> wrote:
Here's a point of reference for you. The current Encyclopædia Britannica project in F2 has been there since September.
Number of pages: 232
Pages remaining: 24
Pages I've done: 203
Pages other people have done: 5
Some rounds get cherry-picked pretty badly; and OED is not a cherry. Stay away from buses.
I might be willing to do a parallel F1 / merge, and automated markup check for F2 skip if I don't have to find a PP in advance. Let me be blunt... I'm easily distracted; doing this kind of markup would drive me nuts quickly and result in orphaned projects. I'll prep the images, run OCR, answer questions, write the code to do the automated checks. But I don't PP or format. -Bob

A little more information about that same project (which is 232 pages, about 1/8 of one volume out of 29 volumes). It was being P3 proofread from April to November of 2007 (about 7 months). Then it sat in queues with no one working on it from Nov. 2007 to Sept. 2009 (almost two years), except for a brief spell (3 months) when it was in F1. And that was pretty speedy.
A new project (such as I'm preparing now, which will be 300+ pages) will not be quite so fortunate, because now the queues are much longer; and more significantly, there will be many more EB volumes ahead of it when it gets to each queue. So I'd be prepared to spend some time proofing at least (if you don't prefer formatting and PP) to help it along in those brief windows of opportunity (roughly 9-12 months) when it's available to anyone at all. (But given well-established trends, it will probably be much longer.)
Fortunately, you'll have lots of time to scan and OCR each project. In fact, I bet you'll be so fortunate as to have a new generation of scanning technology available every couple of projects or so. It may easily take longer to proof, format, and publish the ebook than it took for the original - an acknowledged epic in itself. For sure, it could be re-typeset in a small fraction of the time.
On Sat, Feb 13, 2010 at 6:39 AM, Robert Cicconetti <grythumn@gmail.com> wrote:
On Fri, Feb 12, 2010 at 11:04 PM, don kretz <dakretz@gmail.com> wrote:
Here's a point of reference for you. The current Encyclopædia Britannica project in F2 has been there since September.
Number of pages: 232
Pages remaining: 24
Pages I've done: 203
Pages other people have done: 5
Some rounds get cherry-picked pretty badly; and OED is not a cherry. Stay away from buses.
I might be willing to do a parallel F1 / merge, and automated markup check for F2 skip if I don't have to find a PP in advance.
Let me be blunt... I'm easily distracted; doing this kind of markup would drive me nuts quickly and result in orphaned projects. I'll prep the images, run OCR, answer questions, write the code to do the automated checks. But I don't PP or format.
-Bob

Robert Cicconetti <grythumn@gmail.com> writes:
Not really. The OED uses a predecessor of IPA with some oddball symbols.. at the least I have to come up with a table for those or they'll be all over the place. I started one, need to finish it.
You could consider processing it at dp-canada or dp-int--both are UTF-8 enabled.
I'd recommend doing all "formatting" (= XML tagging) off-site. It would probably be best to use SVN or git/Bazaar for collaboration. Any idea where we could host such a repository?
I'm not prepared to abandon the DP workflow, especially for a project of this scale, and considering the amount of markup that will be required. At DP I reasonably assume it'll keep moving, even if I drop off the grid or get hit by a bus.
That's why I propose to use a public repository. Of course, you would leave an appropriate comment on the project page. Doing TEI tagging page-wise is cumbersome. Doing TEI tagging off-site using your XML editor is much better. -- Karl Eichwalder

On Sat, Feb 13, 2010 at 3:17 AM, Karl Eichwalder <ke@gnu.franken.de> wrote:
Robert Cicconetti <grythumn@gmail.com> writes:
Not really. The OED uses a predecessor of IPA with some oddball symbols.. at the least I have to come up with a table for those or they'll be all over the place. I started one, need to finish it.
You could consider processing it at dp-canada or dp-int--both are UTF-8 enabled.
I have two problems with that. One, I'm not sure all the symbols are in Unicode. Two, just making Unicode available doesn't overcome the problems that these characters are not on any physical keyboards and only the most esoteric software keyboards. Even with Unicode available, if it were pure IPA, I'd go with SAMPA. -- Kie ekzistas vivo, ekzistas espero.

"David" == David Starner <prosfilaes@gmail.com> writes:
David> On Sat, Feb 13, 2010 at 3:17 AM, Karl Eichwalder <ke@gnu.franken.de> wrote:
>> Robert Cicconetti <grythumn@gmail.com> writes:
>>> Not really. The OED uses a predecessor of IPA with some oddball symbols.. at the least I have to come up with a table for those or they'll be all over the place. I started one, need to finish it.
>> You could consider processing it at dp-canada or dp-int--both are UTF-8 enabled.
David> I have two problems with that. One, I'm not sure all the symbols are in Unicode.
This could be managed with replacements of the few (are they few?) missing characters.
David> Two, just making Unicode available doesn't overcome the problems that these characters are not on any physical keyboards and only the most esoteric software keyboards.
This could be managed with a character picker, like the Greek and hieroglyph popups in the proofing interface. Or some of the tools in some of Don's project comments.
David> Even with Unicode available, if it were pure IPA, I'd go with SAMPA.
SAMPA might be OK for publication, and probably for entering too, but for checking (rounds after the first) it requires knowing the OED/SAMPA correspondence. Impossible, except for experts. One might however easily build converters from SAMPA and IPA to OED using the conversion software that is running at DP-EU (convert button in the standard interface). Undocumented, but I know it, and I have both the software and part at least of the conversion tables, and can build in minutes any further table needed.
Probably it is something that might be experimented with at DP-EU: apparently Nikola is maintaining the converter, and adding a table to it is straightforward (it is an ASCII table). If you want, I can start a project there with a few pages.
There is however another worse problem: I am not sure that the OED is free from copyright in Canada or Serbia.
Carlo
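To make the table-driven conversion idea concrete, here is a minimal Python sketch. It is not the DP-EU converter, which is described above only as undocumented; the table format (one tab-separated pair per line) is an assumption, and the pairs shown are ordinary SAMPA-to-IPA correspondences standing in for an OED table that would have to be built from the dictionary's own pronunciation key.

```python
# Hypothetical ASCII table: one "source<TAB>target" pair per line.
# The pairs are standard SAMPA-to-IPA correspondences, used only as
# stand-ins; a real OED table would come from the printed key.
TABLE_TEXT = """\
tS\tt\u0283
dZ\td\u0292
S\t\u0283
Z\t\u0292
T\t\u03b8
D\t\u00f0
N\t\u014b
@\t\u0259
"""


def load_table(text):
    """Parse tab-separated lines into a dict, skipping blank lines."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        source, target = line.split("\t")
        table[source] = target
    return table


def convert(text, table):
    """Transliterate by greedy longest match; unknown characters pass through."""
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        for size in range(longest, 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += len(chunk)
                break
        else:
            # No table entry starts here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


table = load_table(TABLE_TEXT)
print(convert("tSIN", table))
```

Longest-match-first matters because multi-character codes such as "tS" must win over their single-character prefixes; adding a further notation would then only mean supplying another table.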

On Sun, Feb 14, 2010 at 4:02 AM, Carlo Traverso <traverso@posso.dm.unipi.it> wrote:
David> I have two problems with that. One, I'm not sure all the symbols are in Unicode.
This could be managed with replacements of the few (are they few?) missing characters.
The OED phonetic alphabet, and an incomplete match to various unicode symbols: http://home.comcast.net/~grythumn/oed/
There is however another worse problem: I am not sure that the OED is free from copyright in Canada or Serbia.
Better hope there is some sort of corporate work exception... there were several editors, dozens of subeditors, and hundreds of volunteer readers. Not all of whom appear on the title page, but many are listed. -Bob

Hi Robert,
As far as markup is concerned, I would suggest using TeX or XeTeX. For one, you can encode all the information you want, however you want, with commands such as \entry, \pronunciation, \meaning, \synonym, etc.; you name it. Then either write commands for formatting or a TeX script to produce the desired output, or use any other language to process the data. Another way to go is to use XML to encode the data and take it from there. Either way you have full control of the input data and output.
regards
Keith
On 12.02.2010 at 20:45, Robert Cicconetti wrote:
On Fri, Feb 12, 2010 at 2:22 PM, David Starner <prosfilaes@gmail.com> wrote:
Dictionary. We've had scans of the OED for years; no one has been willing to attack it. We can probably come up with a dozen usable
Not exactly true. I have a clearance on it, and have a fascicle prepped and at DP. The holdup is that I have yet to come up with a good markup for proofing that can be machine transformed into various dictionary formats. Straight TEI is too big, and likely to lead to inconsistencies. I refuse to start something this big without a decent plan for the final output.
Granted, once started, it will probably take decades to work through DP...
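As a rough illustration of Keith's suggestion that TeX-style field commands could be processed with "any other language", the Python sketch below reads such markup into records and emits JSON. The command names come from his message; the one-field-per-line layout, the record structure, and the sample entries are invented for illustration and are not taken from the OED or from any existing DP tool.

```python
import json
import re

# Matches \name{value} for the four field commands named in the message.
# Nested braces are not handled in this sketch.
FIELD = re.compile(r"\\(entry|pronunciation|meaning|synonym)\{([^{}]*)\}")

# Invented sample input; the pronunciation strings are placeholders.
SAMPLE = r"""
\entry{cat}
\pronunciation{kat}
\meaning{A small domesticated feline.}
\synonym{puss}
\entry{dog}
\meaning{A domesticated canine.}
"""


def parse_entries(source):
    """Group field commands into one record per \\entry command."""
    entries = []
    for name, value in FIELD.findall(source):
        if name == "entry":
            entries.append(
                {"entry": value, "pronunciation": None, "meaning": [], "synonym": []}
            )
        elif not entries:
            raise ValueError("field command found before the first \\entry")
        elif name == "pronunciation":
            entries[-1]["pronunciation"] = value
        else:
            # meaning and synonym may repeat, so they are collected in lists.
            entries[-1][name].append(value)
    return entries


entries = parse_entries(SAMPLE)
print(json.dumps(entries, indent=2))
```

Once the data is in records like these, producing other dictionary formats is a matter of writing one small exporter per format, which is the kind of machine transformation Robert says he wants settled before starting.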

You might want to work something out with these guys <http://www.longnow.org/clock/> to keep track of your project logs after you're gone.
On 12.02.2010 at 20:45, Robert Cicconetti wrote:
Granted, once started, it will probably take decades to work through DP...

On Fri, Feb 12, 2010 at 8:47 AM, Jim Adcock <jimad@msn.com> wrote:
OK, but I can also point to texts that were almost "good to go" before they went into DP, only to molder indefinitely there. Is there some way to make a decision on this one way or another. How about letting the PM make the decision whether or not to post a "preliminary version" to PG?
The poll that's up at DP right now has the respondents just about evenly split on this issue. I would be OK with doing it, but I also understand those who feel that the "preliminary" posting might hang around for years, displacing the final, polished, ACCURATE product.
Is it possible to split the queues and the efforts into "esoterica" vs. "books that will be actively read?"
No. That's like recommending that publishers solve their financial problems by only printing best-sellers. Some books that YOU think are esoterica might actually be of great interest to a small but appreciative community, such as scholars the world over. Take, for example, the Baburnama, the memoirs of Babur, the Turki conqueror of northern South Asia and founder of the Mughal dynasty, as translated by Beveridge. Fiendishly difficult text, took a year to get through P3, will probably take a lot of time in F1 and F2 and PP, a real slog ... but it's an essential work in South Asian history and I'm sure that it will be of great use to students and scholars once finished. I don't regret the time I spent on it.
I went there recently to try to help and it looked like "the powers that be" were trying to force through books that really no one wants to work on -- books that were really hard and not very interesting even to the people who volunteer their time to DP.
There's no forcing going on. The policy from Day One has been that we work on what the content providers submit. Sometimes works that look enticing or valuable to them aren't appealing to the proofers, and then take a long time to wend their way through the system. (Some texts, like Greg Weeks's science fiction stories, zip through in days.) The problem is that the mouldie oldies clog the queues. There have been quite a few proposals for changing the queue system and the round system, and some experiments are running right now. We'll see what happens. DP made a HUGE change when it moved to five rounds rather than two, and I think it will be able to change again.
-- Karen Lofstrom
You can't force people to work on things they don't want to work on. Either they work on texts that they want to work on, or if DP is not willing to present any of those, then they go on with their lives, or maybe, like in my case, they "route around damage" and work on books outside of DP.
The problem is NOT that there is "esoterica" vs. "books that will be actively read" -- the problem is that the "esoterica" takes so much time and effort compared to "books that will be actively read" that "esoterica" ends up swamping the other categories.
Are you really saving a book if you pickle it for posterity without it getting read? Isn't that like locking up a ballerina's shoes in order to preserve ballet? Or locking up an artist's paint and brushes in order to preserve art? To my taste books exist while they are being read. Otherwise they fail to exist -- beyond little magnetic domains stuck somewhere on the internet.
A simple answer would be to put in separate queues for the differing levels of difficulty and/or categories of books. Then people who want to work on esoterica can do so without impacting people who don't.

There's no forcing going on. The policy from Day One has been that we work on what the content providers submit. Sometimes works that look enticing or valuable to them aren't appealing to the proofers, and then take a long time to wend their way through the system. (Some texts, like Greg Weeks's science fiction stories, zip through in days.)
Works that CP'ers submit which are stuck on the queues AREN'T being worked on. People who volunteer for DP are forced to work on things not stuck on queues. That IS the forcing going on.
Works that progress slowly through the Proofing rounds aren't really the problem. The problem is more the works that get stuck in the formatting rounds and the PP rounds. What I've seen stuck in the proofing rounds has sections, such as huge sections of publishers' ads, or indexes, which most Proofers get tired of pretty quick -- especially when the work is classified as "Easy." I would question the judgment of including publishers' ads when they are not even numbered pages and do not relate to the subject matter.
Let's try to break this down again in a way that SHOULDN'T be controversial:
0) Premise: DP people ARE acknowledging that having books stuck on queues 3.5 years is not a good thing. If this is NOT a good thing, then SOMETHING has to change. If one wants to change the queuing times there are really ONLY a couple of things fundamentally that one can change:
1) You can reduce the rate at which content is placed onto the queues. That implies SOME kind of principle of selection. The principle right now is "First Come First Serve." I suggest this is not a good thing for several reasons: Books may be put on the queue that people really don't want to work on. Books may be put on the queue that people really don't want to read. And books may be put on the queue that take time and energy disproportional to the societal benefit to be gained from that book compared to some other books. Note there are about 50 million books available worldwide that could be worked on by DP, compared to 2500 roughly a year created by DP, implying a queuing time for books in general of 20,000 years -- not including those books that will have risen to the public domain in those 20,000 years! Another way of saying this is that the selection process used to decide which books get "rescued" by DP is on the order of 1 book in 10,000 gets saved. Now, if only one book in 10,000 gets saved, should this be "at random" or should there be some kind of selection process -- even if it were only that the DP volunteers who are going to do the work vote on what gets put on the queue?
2) You can increase the rate at which content is taken off the queues. This requires placing more resources at those places in the queues where things are getting bogged down, which are P3, F2, and PP. To place more resources at these places requires at least SOME tweaking of DP's current system of "technological high priesthood" and would require getting over DP's current idea that somehow they are creating "perfect books" [which they certainly are NOT doing!]
3) You can increase productivity by improving tools -- particularly tools helping P3, F2, and PP. Producing tools that help P1 is pretty easy, as many people have suggested, but it is actually NOT obvious that improving tools for P1 would prove to be helpful to DP overall! Making P1 faster and easier without changing the current rules of "technological high priesthood" will actually only make the queuing problems more extreme.
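The arithmetic behind points 1) and 2) can be checked with a short Python sketch. The 50 million, 2,500 per year, and 3.5 year figures are the ones quoted above; the steady-state assumption behind the Little's-law estimate and the intake rate of 3,000 books a year are assumptions added purely for illustration.

```python
def years_to_clear(backlog, throughput_per_year):
    """Years needed to work through a fixed backlog at a constant rate."""
    return backlog / throughput_per_year


def backlog_for_wait(wait_years, throughput_per_year):
    """Little's law, assuming a steady state: items in system = rate * time in system."""
    return wait_years * throughput_per_year


def simulate(backlog, intake_per_year, throughput_per_year, years):
    """Track the backlog year by year for constant intake and throughput."""
    history = []
    for _ in range(years):
        backlog = max(0, backlog + intake_per_year - throughput_per_year)
        history.append(backlog)
    return history


# Figures from the message: 50 million candidate books, 2,500 finished a year.
print(years_to_clear(50_000_000, 2_500))   # 20000.0

# A 3.5-year wait at 2,500 books a year implies about this many books in process.
print(backlog_for_wait(3.5, 2_500))        # 8750.0

# Hypothetical intake of 3,000 a year: the backlog grows by 500 every year.
print(simulate(8_750, 3_000, 2_500, 3))    # [9250, 9750, 10250]
```

The simulation shows why only the two levers in points 1) and 2) change the wait: as long as intake exceeds throughput the backlog keeps growing, whatever happens inside any single round.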

Speaking as a Whitewasher (and probably for the other WWers, too), I have absolutely no interest in posting a "preliminary" version of something if a "revised" version is going to appear in a few days/weeks/months, requiring me to re-do the posting process. Ditto for posting a text-only version if an HTML version is in the works.
My PG priorities are my own productions first, followed by WWing, then Errata and Reposts. My own productions are not going to be allowed to suffer just because someone is in a rush to get a preliminary version out the door. I can always create another priority--"No Rush Whatsoever".
In short, it's MY time I volunteer to PG, and it's not yours to waste.
Al
----- Original Message -----
From: "Jim Adcock" <jimad@msn.com>
To: "'Project Gutenberg Volunteer Discussion'" <gutvol-d@lists.pglaf.org>
Sent: Friday, February 12, 2010 10:47 AM
Subject: [gutvol-d] DP: was rfrank reports in
To which I pointed out that this would in many cases result in the posting of severely deficient texts. Formatting is important.
OK, but I can also point to texts that were almost "good to go" before they went into DP, only to molder indefinitely there. Is there some way to make a decision on this one way or another. How about letting the PM make the decision whether or not to post a "preliminary version" to PG?
Because sometimes it may be worth letting a text molder rather than peremptorily ripping it out of someone's hands and annoying the hell out of them.
OK, but how and when do you decide that the PP has actually moved on in life and is not really willing to finish up the book to which others have in good faith contributed their blood sweat and tears in the hopes of getting an honest to god book? Not to mention the possibility of a PP not working in good faith?
Is guiguts not quick enough for you? This is a fairly simple tool problem.
Tried it previously and didn't find any value in it. I will take a look at it again.
It's easy to come up with a rhetorically stupid title. But if you pulled a real title, then we could actually discuss the audience and why someone would upload that.
Pick any title active in the rounds right now. Based on the best statistics I can find on PG usage, which is actually from IA, the most popular books from PG get read literally 100,000 times more often than the least read books. Now, it is hard to find a book that is going to be that popular. But it is easy to find a good book which will get read literally 40x more often than the books in DP right now, as well as being at least several times faster and easier to create.
Is there any way to more actively promote the acquisition and prioritizing of texts that are generally recognized as being "better than average" aka "famous" or at least "well known"?
That presumes that that should be our goal. Some of the works I'm proudest of are works where the PG edition is the best in the world. Sure, more people may read the Canterbury Tales, but everyone who reads our edition of Stephen Hawes's "A Joyful Meditation....
Is it possible to split the queues and the efforts into "esoterica" vs. "books that will be actively read?" Right now the "books that will be actively read" I am afraid are stuck in the queue behind "books that no one is actually willing to work on." I went there recently to try to help and it looked like "the powers that be" were trying to force through books that really no one wants to work on -- books that were really hard and not very interesting even to the people who volunteer their time to DP. You can't force people to work on things they don't want to work on. Either they work on texts that they want to work on, or if DP is not willing to present any of those, then they go on with their lives, or maybe, like in my case, they "route around damage" and work on books outside of DP.
The problem is NOT that there is "esoterica" vs. "books that will be actively read" -- the problem is that the "esoterica" takes so much time and effort compared to "books that will be actively read" that "esoterica" ends up swamping the other categories.
Are you really saving a book if you pickle it for posterity without it getting read? Isn't that like locking up a ballerina's shoes in order to preserve ballet? Or locking up an artist's paint and brushes in order to preserve art? To my taste books exist while they are being read. Otherwise they fail to exist -- beyond little magnetic domains stuck somewhere on the internet.
A simple answer would be to put in separate queues for the differing levels of difficulty and/or categories of books. Then people who want to work on esoterica can do so without impacting people who don't.

On Fri, 12 Feb 2010, Al Haines (shaw) wrote:
Speaking as a Whitewasher (and probably for the other WWers, too), I have absolutely no interest in posting a "preliminary" version of something if a "revised" version is going to appear in a few days/weeks/months, requiring me to re-do the posting process. Ditto for posting a text-only version if an HTML version is in the works.
The proposal isn't to "post" to PG at all, but to something like preprints.readingroo.ms but entirely automated. -- Greg Weeks http://durendal.org:8080/greg/

I agree that's what has been said in discussions on the DP forums. I would argue that intent was not clear from what's been posted on this list.
From what I've seen, it's hard to stay focused on one concept, because everyone starts dragging in their own concerns on marginally related topics and making those the main focus.
--Andrew
On Fri, 12 Feb 2010, Greg Weeks wrote:
On Fri, 12 Feb 2010, Al Haines (shaw) wrote:
Speaking as a Whitewasher (and probably for the other WWers, too), I have absolutely no interest in posting a "preliminary" version of something if a "revised" version is going to appear in a few days/weeks/months, requiring me to re-do the posting process. Ditto for posting a text-only version if an HTML version is in the works.
The proposal isn't to "post" to PG at all, but to something like preprints.readingroo.ms but entirely automated.

This is very interesting viewed from both sites. We have one spokesman for PG suggesting that, for the purpose of growing the stock faster, using text at some level of markup sophistication (which I seem to remember strongly featured text but not HTML), there was some possibility of room for flexibility (or something like that). There immediately erupted on DP two substantially differing (mis)interpretations of what this might mean. Both of them have received responses I'd characterize as ranging mostly from indifference to revulsion, with a few outliers on both sides. Now we have a second PG spokesman, and the score here seems to be one vote for "maybe" and another for "hell no". Not much basis left for discussion, but we get a lot of productive venting done on both sides.

Yes, after the initial proposal, there were five or six other proposals made in the same thread to drag it off in odd courses. As far as I can see there's only one person actually doing anything other than argue. That's hanne_dk, and I hope to see an automated script to process DP's intermediate files into something that doesn't look too bad for most texts. It appears it'll never get "official" approval as there are too many people adamantly against doing it to "their" texts. Oh well.
Greg Weeks
On Fri, 12 Feb 2010, Andrew Sly wrote:
I agree that's what has been said in discussions on the DP forums. I would argue that intent was not clear from what's been posted on this list.
From what I've seen, it's hard to stay focused on one concept, because everyone starts dragging in their own concerns on marginally related topics and making those the main focus.
--Andrew
On Fri, 12 Feb 2010, Greg Weeks wrote:
On Fri, 12 Feb 2010, Al Haines (shaw) wrote:
Speaking as a Whitewasher (and probably for the other WWers, too), I have absolutely no interest in posting a "preliminary" version of something if a "revised" version is going to appear in a few days/weeks/months, requiring me to re-do the posting process. Ditto for posting a text-only version if an HTML version is in the works.
The proposal isn't to "post" to PG at all, but to something like preprints.readingroo.ms but entirely automated.
-- Greg Weeks http://durendal.org:8080/greg/

Well, to close the circle, if they were posted, they would be (I say this advisedly) *perfect* fodder for, say, an offline utility program to run automated checks and do basic formatting. I bet in most cases one person could whip up a high-quality post-PP equivalent in, say, a day or two.
On Fri, Feb 12, 2010 at 5:26 PM, Greg Weeks <greg@durendal.org> wrote:
Yes, after the initial proposal, there were five or six other proposals made in the same thread to drag it off in odd courses. As far as I can see there's only one person actually doing anything other than argue. That's hanne_dk, and I hope to see an automated script to process DP's intermediate files into something that doesn't look too bad for most texts. It appears it'll never get "official" approval as there are too many people adamantly against doing it to "their" texts. Oh well.
Greg Weeks
On Fri, 12 Feb 2010, Andrew Sly wrote:
I agree that's what has been said in discussions on the DP forums. I would argue that intent was not clear from what's been posted on this list.
From what I've seen, it's hard to stay focused on one concept, because everyone starts dragging in their own concerns on marginally related topics and making those the main focus.
--Andrew
On Fri, 12 Feb 2010, Greg Weeks wrote:
On Fri, 12 Feb 2010, Al Haines (shaw) wrote:
Speaking as a Whitewasher (and probably for the other WWers, too), I have absolutely no interest in posting a "preliminary" version of something if a "revised" version is going to appear in a few days/weeks/months, requiring me to re-do the posting process. Ditto for posting a text-only version if an HTML version is in the works.
The proposal isn't to "post" to PG at all, but to something like preprints.readingroo.ms but entirely automated.
-- Greg Weeks http://durendal.org:8080/greg/

"revised" version is going to appear in a few days/weeks/months...
We are not talking about a "revised" version showing up in a couple of months. Rather, we are talking about a posting which includes HTML 3.5 years later, versus posting the txt version now rather than later and thereby increasing the total collection size of PG by 20%. One could argue that this would make the whitewashers' job easier rather than harder -- because then Al wouldn't have to put up with random submissions from people like me who give up on DP and "route around damage" [thereby introducing "damage" of our own! :-]

At least one of the discussions going on was exactly the "HTML coming a few weeks after the text" scenario. That was the one where a text goes through all the rounds, and the PPer posts the text version as soon as it's done and posts the HTML later. This didn't seem like a terribly useful approach to me, as the HTML version of the text is typically NOT where the bottleneck is at DP. Of course, there were at least five other approaches being discussed in the thread.
Greg Weeks
On Sat, 20 Feb 2010, Jim Adcock wrote:
"revised" version is going to appear in a few days/weeks/months...
We are not talking about a "revised" version showing up in a couple of months. Rather, we are talking about a posting which includes HTML 3.5 years later, versus posting the txt version now rather than later and thereby increasing the total collection size of PG by 20%. One could argue that this would make the whitewashers' job easier rather than harder -- because then Al wouldn't have to put up with random submissions from people like me who give up on DP and "route around damage" [thereby introducing "damage" of our own! :-]
-- Greg Weeks http://durendal.org:8080/greg/
participants (13): Al Haines (shaw), Andrew Sly, Bowerbird@aol.com, David Starner, don kretz, Greg Weeks, James Adcock, Jim Adcock, Karen Lofstrom, Karl Eichwalder, Keith J. Schultz, Robert Cicconetti, traverso@posso.dm.unipi.it