they won't allow you to use facts to embarrass them

they won't allow you to use facts to embarrass them...
that seems to be jim's objection to the whitewashers.
and they won't let you step in and replace their work...
***
which is why david widger posted jim's version of "huck" at a separate number from david's own version of "huck".
otherwise, david woulda voluntarily replaced his own work. which, quite realistically, we cannot expect anyone to do...
but it wouldn't have been that different even if it had been some other volunteer's work david was reluctant to replace.
as greg has pointed out, it's a somewhat delicate issue to answer the question about the removal of _anyone's_ work.
and i certainly don't have the answer.
(well, i could probably think up a number of ways to avoid the situation where jim's superior "huck" gets downloaded much less frequently than david's inferior version, but that just sidesteps the more thorny question of replacing work.)
however, i pretty much accept that the whitewashers have the power to do whatever they want, so it wouldn't really make any difference even if i _did_ have the answer, since the whitewashers will just keep on doing what _they_ want.
so i accept david's action. indeed, since jim has admitted that his version is _different_ than david's, i don't see how jim's version could replace david's version in any situation. it would be different if jim's version were the same as #76.
it seems to me, though, that different versions _should_ be posted as different numbers. they're not exactly the same.
but since jim has whined about this matter for so very long, i think it might be interesting to give him a thought problem about just exactly what he would recommend on this issue...
so let's give him a little incentive...
let's imagine... say i've corrected the errors in jim's version. i have generated a list of his errors, and created an interface where you can check that all those errors are indeed wrong.
moreover, i have created a clean version of the same edition that jim used. so imagine that i am going to submit it to p.g., along with the pagescans, to create a better overall package...
this is the question for jim.
what should p.g. do with my submission?
1. reject it.
2. put it under a new number.
3. accept it, and overwrite jim's edition with mine, complete with a new credit-line mentioning "a volunteer", but no name. (i would not take a joint credit-line, because that might imply we'd worked together; and likely neither would jim, probably. it would be acceptable to me if the credit-line got _deleted_.) jim's flawed version could be stored in the "old" subdirectory, because hey, nobody ever looks inside those anyway, do they?
anyone else is invited to chip in their opinion as well, but i am especially interested in jim's thoughts. and, if he makes a post before thanksgiving, i will consider it in my deliberations about whether or not, after thanksgiving weekend, i will indeed make a submission to p.g. of a corrected edition of jim's "huck finn".
-bowerbird

This sounds like "2" to me.
As has been said before, the "real" problem is that the best prepared eBooks are often not found first. Instead, ordering is based on popularity, which tends towards self-reinforcement. It would be nice for the search engine at www.gutenberg.org to rank by quality, not just popularity, when similar titles are found. All we need is a good way of assessing quality.
-- Greg
On Sun, Nov 18, 2012 at 02:30:55PM -0500, Bowerbird@aol.com wrote:
they won't allow you to use facts to embarrass them...
that seems to be jim's objection to the whitewashers.
and they won't let you step in and replace their work...
***
which is why david widger posted jim's version of "huck" at a separate number from david's own version of "huck".
otherwise, david woulda voluntarily replaced his own work. which, quite realistically, we cannot expect anyone to do...
but it wouldn't have been that different even if it had been some other volunteer's work david was reluctant to replace.
as greg has pointed out, it's a somewhat delicate issue to answer the question about the removal of _anyone's_ work.
and i certainly don't have the answer.
(well, i could probably think up a number of ways to avoid the situation where jim's superior "huck" gets downloaded much less frequently than david's inferior version, but that just sidesteps the more thorny question of replacing work.)
however, i pretty much accept that the whitewashers have the power to do whatever they want, so it wouldn't really make any difference even if i _did_ have the answer, since the whitewashers will just keep on doing what _they_ want.
so i accept david's action. indeed, since jim has admitted that his version is _different_ than david's, i don't see how jim's version could replace david's version in any situation. it would be different if jim's version were the same as #76.
it seems to me, though, that different versions _should_ be posted as different numbers. they're not exactly the same.
but since jim has whined about this matter for so very long, i think it might be interesting to give him a thought problem about just exactly what he would recommend on this issue...
so let's give him a little incentive...
let's imagine... say i've corrected the errors in jim's version. i have generated a list of his errors, and created an interface where you can check that all those errors are indeed wrong.
moreover, i have created a clean version of the same edition that jim used. so imagine that i am going to submit it to p.g., along with the pagescans, to create a better overall package...
this is the question for jim.
what should p.g. do with my submission?
1. reject it.
2. put it under a new number.
3. accept it, and overwrite jim's edition with mine, complete with a new credit-line mentioning "a volunteer", but no name. (i would not take a joint credit-line, because that might imply we'd worked together; and likely neither would jim, probably. it would be acceptable to me if the credit-line got _deleted_.) jim's flawed version could be stored in the "old" subdirectory, because hey, nobody ever looks inside those anyway, do they?
anyone else is invited to chip in their opinion as well, but i am especially interested in jim's thoughts. and, if he makes a post before thanksgiving, i will consider it in my deliberations about whether or not, after thanksgiving weekend, i will indeed make a submission to p.g. of a corrected edition of jim's "huck finn".
-bowerbird
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d

Hi Greg,
I would tend to agree with you that quality should be the driving force. Yet, how do you rank quality:
1) number of errors. A no-brainer as such
2) quality of formatting
3) quality for creating different output formats
4) quality of the output formats
5) quality of the source
Quality assessment is a big can of worms. Furthermore, who does the quality assessment?
regards
Keith.
On 19.11.2012 at 02:12, Greg Newby <gbnewby@pglaf.org> wrote:
This sounds like "2" to me.
As has been said before, the "real" problem is that the best prepared eBooks are often not found first. Instead, ordering is based on popularity, which tends towards self-reinforcement. It would be nice for the search engine at www.gutenberg.org to rank by quality, not just popularity, when similar titles are found. All we need is a good way of assessing quality.
-- Greg

Quality assessment is a big can of worms. Furthermore, who does the quality assessment?
PG does have standards of quality. It's just that the golden-moldies don't meet them -- if these were submitted today they would be rejected. If an old submission doesn't meet current standards of quality, then go fix it, or solicit a volunteer or volunteers to go fix it. Or if a current submission gets accepted [by mistake], but it doesn't meet current standards, then again, go fix it, or solicit volunteers to fix it. Michael solicited volunteers on a number of good books, and encouraged me to do them, even though I was a newbie. Why? Because I was willing to do the work. And why was I willing to do the work? Because Michael had good taste in books. Putting a ton of time and effort into a tiresome book becomes a pain in the behind, in my experience.

On Sun, November 18, 2012 6:12 pm, Greg Newby wrote:
On Sun, Nov 18, 2012 at 02:30:55PM -0500, Bowerbird@aol.com wrote:
[snip]
moreover, i have created a clean version of the same edition that jim used. so imagine that i am going to submit it to p.g., along with the pagescans, to create a better overall package...
this is the question for jim.
what should p.g. do with my submission?
1. reject it.
2. put it under a new number.
3. accept it, and overwrite jim's edition with mine, complete with a new credit-line mentioning "a volunteer", but no name.
[snip]
This sounds like "2" to me.
Well, it needs to have a new identity, although not necessarily a new number. What you are suggesting, Mr. Newby, is to permit a number of "snowflakes"--a suggestion which has some value. But I think that just about everyone understands innately the "Work/Expression" notion--there might be several "expressions" of _The Adventures of Huckleberry Finn_, but they are all expressions of the broader work. When a list derived from the search of "huck finn" is presented, it is not clear that the listed items are all different expressions of a single work, or whether multiple works might be included. I think it is time to consider renaming all the files using a "Work/Expression" naming scheme: instead of "76.txt" the file would be named something like "Twain,Mark-TheAdventuresOfHuckleberryFinn-Anon.txt". The search results should be ordered by last modification date, although number of downloads is interesting data that could be included. Available formats should be listed, as well as submitters' comments. With this data, a downloader should have enough information to make an intelligent choice as to which to download--and it should be apparent that all the files (or folders) are variations on a single work. (Do people realize that texts 7100 to 7107 are also Huck Finn, apparently the same as the text version of 76, but broken up into 8 parts? Are errata fixes getting made to those files as well? eText numbers bear little relation to the actual number of books available, so renaming files to make them more transparent will have little to no impact on the system that exists now.)
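For illustration only, here is a minimal sketch of how a "Work/Expression" file name like the one above could be assembled. The function name and the exact rules (drop spaces, drop apostrophes, capitalise each word of the title) are assumptions inferred from the single example given, not anything PG has adopted.

```python
import re


def expression_filename(author, title, contributor, ext="txt"):
    """Build a hypothetical 'Work/Expression' file name.

    author      -- catalogue-style name, e.g. "Twain, Mark"
    title       -- title of the work
    contributor -- who prepared this expression, e.g. "Anon"
    """
    # "Twain, Mark" -> "Twain,Mark": remove whitespace, keep the comma.
    author_part = re.sub(r"\s+", "", author)
    # Drop apostrophes, then capitalise the first letter of every word.
    words = re.findall(r"[A-Za-z0-9]+", title.replace("'", ""))
    title_part = "".join(w[:1].upper() + w[1:] for w in words)
    contributor_part = re.sub(r"\s+", "", contributor)
    return f"{author_part}-{title_part}-{contributor_part}.{ext}"


print(expression_filename("Twain, Mark",
                          "The Adventures of Huckleberry Finn",
                          "Anon"))
```

Two expressions of the same work would then share the author and title parts and differ only in the contributor part, which is the property the scheme is after.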
As has been said before, the "real" problem is that the best prepared eBooks are often not found first. Instead, ordering is based on popularity, which tends towards self-reinforcement.
A simple fix would be to make the default ordering by release date on the assumption (probably accurate) that later versions are better versions.
It would be nice for the search engine at www.gutenberg.org to rank by quality, not just popularity, when similar titles are found. All we need is a good way of assessing quality.
Just ask the customer. Next to every search result put a little link to "Rate This." The link would pop up a window with a 1-5 set of radio buttons with the simple request, "Please rate the quality of this file set." Add a text field for comments. I don't think you even need to explain the criteria by which to judge quality; after enough ratings, the correct answer will emerge. See: http://en.wikipedia.org/wiki/The_Wisdom_of_Crowds
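A rough sketch of what the "Rate This" tally might look like, assuming votes arrive as (edition, rating) pairs with ratings from 1 to 5. The edition labels, the `min_votes` cut-off and the tie-breaking order are all made up for the example; nothing here reflects an existing PG feature.

```python
from collections import defaultdict


def rank_by_rating(votes, min_votes=1):
    """Rank editions by mean rating, best first.

    votes     -- iterable of (edition_id, rating) with rating in 1..5
    min_votes -- hypothetical cut-off: editions with fewer votes are skipped
    Returns a list of (edition_id, mean_rating, vote_count).
    """
    totals = defaultdict(lambda: [0, 0])  # edition -> [sum, count]
    for edition, rating in votes:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
        totals[edition][0] += rating
        totals[edition][1] += 1
    ranked = [(edition, total / count, count)
              for edition, (total, count) in totals.items()
              if count >= min_votes]
    # Higher mean first; more votes breaks ties; edition id keeps it stable.
    ranked.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return ranked


sample_votes = [("edition-A", 5), ("edition-A", 4),
                ("edition-B", 3), ("edition-B", 2), ("edition-B", 4)]
for edition, mean, count in rank_by_rating(sample_votes):
    print(edition, round(mean, 2), count)
```

A plain mean is the simplest possible choice; with very few votes per edition it is easily skewed, which is why the sketch at least allows a minimum vote count.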

Just ask the customer. Next to every search result put a little link to "Rate This." The link would pop up a window with a 1-5 set of radio buttons with the simple request, "Please rate the quality of this file set." Add a text field for comments.
Amazon does this. Go to Books/Advanced Search. Title=Huckleberry Finn; Author=Mark Twain; Format=Kindle; Sort Result By=Avg. Customer Review
Q: Does this work in practice?
A: Nope.
Q: Well maybe PG's customers are more intelligent?
A: You tell me.
The Huck Finn debacle at both PG and Amazon would seem to point out the problem is not self-correcting. IMHO the best solution to keep customers from downloading crap versions is not to offer them -- or to fix them.

On 11/24/2012 11:12 PM, James Adcock wrote:
IMHO the best solution to keep customers from downloading crap versions is not to offer them -- or to fix them.
Ahh, but that kind of begs the question doesn't it? We don't want to keep people from downloading crap versions, we just want to be sure they /know/ they're getting crap versions. And how are we going to know that any particular version /is/ a crap version? What we do know, is that we can't trust the opinion of Marcello Perathoner, James Adcock, David Widger, Bruce Morasch, Greg Newby or even Lee Passey. We /can/ probably trust the quality of HTML books produced of late by Distributed Proofreaders, but it would appear that no one is interested in downloading the recent work product of DP. So, you have failed to address Mr. Newby's core question, which is "how do we determine the quality of any arbitrary PG edition?" I'm sure that any concrete proposal you would care to make would be genuinely appreciated.

On Sun, Nov 25, 2012 at 10:39 AM, Lee Passey <lee@passkeysoft.com> wrote:
So, you have failed to address Mr. Newby's core question, which is "how do we determine the quality of any arbitrary PG edition?" I'm sure that any concrete proposal you would care to make would be genuinely appreciated.
User vote? Set up a page for each work with multiple versions. The page should link to all editions; this is the page that should appear when someone searches for the work. Perhaps add info re date of submission and whether the version has been done by one volunteer or by DP. Ask PG users who have downloaded one or another of the versions, or read enough online to judge, to vote on accuracy and user-friendliness. Over time, the better versions should accumulate more votes. -- Karen Lofstrom

That is an idea I love: just make a "related books" tab on each page, and put the various revisions/versions/editions or whatever you name them on the top of the list, then translations and derived works, etc. Shouldn't be too hard to add to the site.
Jeroen.
On 2012-11-25 22:21, Karen Lofstrom wrote:
On Sun, Nov 25, 2012 at 10:39 AM, Lee Passey <lee@passkeysoft.com> wrote:
So, you have failed to address Mr. Newby's core question, which is "how do we determine the quality of any arbitrary PG edition?" I'm sure that any concrete proposal you would care to make would be genuinely appreciated.
User vote? Set up a page for each work with multiple versions. The page should link to all editions; this is the page that should appear when someone searches for the work. Perhaps add info re date of submission and whether the version has been done by one volunteer or by DP. Ask PG users who have downloaded one or another of the versions, or read enough online to judge, to vote on accuracy and user-friendliness. Over time, the better versions should accumulate more votes.
-- Karen Lofstrom

On Sun, Nov 25, 2012 at 01:39:29PM -0700, Lee Passey wrote:
On 11/24/2012 11:12 PM, James Adcock wrote:
IMHO the best solution to keep customers from downloading crap versions is not to offer them -- or to fix them.
Ahh, but that kind of begs the question doesn't it? We don't want to keep people from downloading crap versions, we just want to be sure they /know/ they're getting crap versions. And how are we going to know that any particular version /is/ a crap version?
What we do know, is that we can't trust the opinion of Marcello Perathoner, James Adcock, David Widger, Bruce Morasch, Greg Newby or even Lee Passey. We /can/ probably trust the quality of HTML books produced of late by Distributed Proofreaders, but it would appear that no one is interested in downloading the recent work product of DP.
:)
So, you have failed to address Mr. Newby's core question, which is "how do we determine the quality of any arbitrary PG edition?" I'm sure that any concrete proposal you would care to make would be genuinely appreciated.
Karen's suggestion isn't that far from what we can do now, and actually have done in practice: a note in the bibrec.
The bibrec for #11 says to also look at #928. These are added by hand... any of the team at gutcat@lists.pglaf.org can add them (usually Andrew Sly does these, but Marcello and I also tweak the records).
Currently, the bibrec is not the main "tab" (see www.gutenberg.org/etext/11 if you're not sure what I'm talking about). But I'm sure the Note could appear on the Download tab instead.
A simple technique to raise awareness of alternate versions would be to add them to the bibrec as a Note. There might be some situations when there is ambiguity in whether a note really refers to an alternate edition of the same book. This applies with Alice, for example: 11, 114, 928, 19033, 19573, 23716, 28885.
And,
The Adventures of Huckleberry Finn, by Mark Twain  32325
Audio: Adventures of Huckleberry Finn, by Mark Twain  19640
Audio: Huckleberry Finn, by Mark Twain  9007C
Adventures of Huckleberry Finn, Part 8, by Mark Twain (Samuel Clemens)  7107
Adventures of Huckleberry Finn, Part 7, by Mark Twain (Samuel Clemens)  7106
Adventures of Huckleberry Finn, Part 6, by Mark Twain (Samuel Clemens)  7105
Adventures of Huckleberry Finn, Part 5, by Mark Twain (Samuel Clemens)  7104
Adventures of Huckleberry Finn, Part 4, by Mark Twain (Samuel Clemens)  7103
Adventures of Huckleberry Finn, Part 3, by Mark Twain (Samuel Clemens)  7102
Adventures of Huckleberry Finn, Part 2, by Mark Twain (Samuel Clemens)  7101
Adventures of Huckleberry Finn, Part 1, by Mark Twain (Samuel Clemens)  7100
Adventures of Huckleberry Finn, by Mark Twain  76
So, as usual, things are not completely simple.
Editing the bibrec is easy enough to do. I encourage people with an interest to simply email "like" records, and we can edit them in. The Note field is flexible, and can be duplicated within a bibrec.
-- Greg

On Sun, November 25, 2012 4:52 pm, Greg Newby wrote:
Karen's suggestion isn't that far from what we can do now, and actually have done in practice: a note in the bibrec.
Credit where credit is due, this was actually Mr. Hellingman's suggestion. Ms. Lofstrom's suggestion was simply a reiteration of my suggestion of allowing "the crowd" to evaluate the quality of any given version.
The bibrec for #11 says to also look at #928. These are added by hand... any of the team at gutcat@lists.pglaf.org can add them (usually Andrew Sly does these, but Marcello and I also tweak the records).
I suspect that if we knew the schema of the bibliographic database someone could probably write a script in about a half an hour to programmatically link all corresponding works. Unfortunately, it seems that the schema is a closely guarded secret, so this tedious manual process is the only, inadequate alternative.
Currently, the bibrec is not the main "tab" (see www.gutenberg.org/etext/11 if you're not sure what I'm talking about). But I'm sure the Note could appear on the Download tab instead.
A simple technique to raise awareness of alternate versions would be to add them to the bibrec as a Note. There might be some situations when there is ambiguity in whether a note really refers to an alternate edition of the same book. This applies with Alice, for example: 11, 114, 928, 19033, 19573, 23716, 28885.
And yet, the notes for bibrec #11 still only reference #928 and none of the other texts you have just identified. And neither #928 nor #19033 (I didn't look at the other records) has a reverse reference back to #11 or to each other. It appears that the manual system is so error-prone as to be simply not worth the effort. And good luck getting Mr. Perathoner to change the default tab for texts.
It is time to retire public references to e-text numbers. A search for _Alice's Adventures in Wonderland_ should return a single page. That page should link to downloadable files differentiated by modification date, file format and contributor's notes. A customer should get the file s/he wants without having to select 11, 928 or 19033; to the customer, this is just noise.
[snip]
Editing the bibrec is easy enough to do. I encourage people with an interest to simply email "like" records, and we can edit them in. The Note field is flexible, and can be duplicated within a bibrec.
This is clearly a hack, attempting to force the system to support a feature that it was not designed to support. If I know enough to follow the notes field on the bibrec tab (when it is actually correctly populated) I know enough to find the best version using other search tools. Squirreling data away in a "notes" field is hardly a user-friendly way to help higher-quality texts to become more visible.
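To make the cross-referencing complaint concrete, here is a small sketch of the kind of consistency check a script could run. Since the real bibrec schema is not public, the data layout is entirely hypothetical: a list of groups of e-text numbers judged to be the same work, and a mapping from each number to the numbers its Note field already mentions. Deciding which numbers belong in a group is the hard part and is simply assumed as input here.

```python
def missing_cross_references(groups, notes):
    """Report which Note cross-references are still missing.

    groups -- list of sets of e-text numbers treated as one work (assumed input)
    notes  -- dict: e-text number -> set of numbers already referenced
    Returns dict: e-text number -> sorted list of numbers it should also reference.
    """
    missing = {}
    for group in groups:
        for etext in group:
            wanted = set(group) - {etext}             # every sibling edition
            lacking = wanted - notes.get(etext, set())
            if lacking:
                missing[etext] = sorted(lacking)
    return missing


# The Alice numbers listed in this thread. Only #11 -> #928 is reported to
# exist; the other records are treated as empty purely for illustration.
alice = {11, 114, 928, 19033, 19573, 23716, 28885}
existing_notes = {11: {928}}

for etext, lacking in sorted(missing_cross_references([alice], existing_notes).items()):
    print(etext, "should also reference", lacking)
```

The same grouping data could just as well drive a single per-work page instead of Note fields; the check only shows how quickly hand-maintained references fall out of step.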

So, you have failed to address Mr. Newby's core question, which is "how do we determine the quality of any arbitrary PG edition?" I'm sure that any concrete proposal you would care to make would be genuinely appreciated.
You could allow critics of a particular work to identify real errors in PG works, and if the count exceeds some threshold number, then the work ought to be labeled "crappo." A "real error" would be some location where the work doesn't match a current coding requirement of PG, doesn't match the actual referenced work, doesn't match current and/or historical typographical standards etc. Of course some of the early PG works don't appear to reference any actual work -- which is a real problem. And of course, the high priests and priestesses will claim that what they do isn't a real error, but that what others do *is* a real error, which again is part of the problem. Working with "real" programmers, when you show some of them a bug they say "oh crap" and run off and fix it. But show others a bug and they will deny deny deny that it even is a bug. Which in turn begs the question of how the whitewashers determine what is permissible or not... ...and the answer depends on the day of the week and the phase of the moon. Again, this takes you back to the original question: How does PG/DP actually identify errors and fix them after a work has been posted? Answer: They don't.
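As a sketch of the threshold idea, assuming error reports are collected somewhere and each one is marked as confirmed or not: the record fields, the three error kinds and the threshold value below are all invented for the example, and who gets to confirm a report is exactly the open question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorReport:
    location: str    # e.g. "ch. 3, para. 12" (free-form, hypothetical)
    kind: str        # "pg-coding", "source-mismatch" or "typography"
    confirmed: bool  # True once someone has verified it is a real error


def label_edition(reports, threshold):
    """Return (label, confirmed_count) for one edition.

    The edition is labelled "crappo" when the number of confirmed
    errors exceeds the (hypothetical) threshold.
    """
    confirmed = sum(1 for report in reports if report.confirmed)
    label = "crappo" if confirmed > threshold else "acceptable"
    return label, confirmed


reports = [
    ErrorReport("ch. 1, para. 4", "source-mismatch", True),
    ErrorReport("ch. 2, para. 9", "typography", True),
    ErrorReport("ch. 5, para. 1", "pg-coding", True),
    ErrorReport("ch. 7, para. 3", "typography", False),  # disputed, not counted
]
print(label_edition(reports, threshold=2))
```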

Hi All,
Well, back in the Plain Vanilla days, if I remember correctly:
1 No.
2 Yes.
3 No
Because the text was done differently. Furthermore, corrections to a particular version were made to that version and a new revision was released. Today, the situation is somewhat more complex, as the amount of information and formatting is greater. So 2 would be the way to go here as well. Yet, if one just corrected the original submission (no idea how that could be done), then just a revision bump should be done.
As to the credit line, just use "corrections submitted by", e.g., BB. This would not imply a direct interaction between the original submitter and the corrector.
regards
Keith.
On 18.11.2012 at 20:30, Bowerbird@aol.com wrote:
moreover, i have created a clean version of the same edition that jim used. so imagine that i am going to submit it to p.g., along with the pagescans, to create a better overall package...
this is the question for jim.
what should p.g. do with my submission?
1. reject it.
2. put it under a new number.
3. accept it, and overwrite jim's edition with mine, complete with a new credit-line mentioning "a volunteer", but no name. (i would not take a joint credit-line, because that might imply we'd worked together; and likely neither would jim, probably. it would be acceptable to me if the credit-line got _deleted_.) jim's flawed version could be stored in the "old" subdirectory, because hey, nobody ever looks inside those anyway, do they?
anyone else is invited to chip in their opinion as well, but i am especially interested in jim's thoughts. and, if he makes a post before thanksgiving, i will consider it in my deliberations about whether or not, after thanksgiving weekend, i will indeed make a submission to p.g. of a corrected edition of jim's "huck finn".

so i accept david's action. indeed, since jim has admitted that his version is _different_ than david's, i don't see how jim's version could replace david's version in any situation. it would be different if jim's version were the same as #76.
Bizarre: in BB's world I "admitted" my version was different - on the contrary, I stated from day one that my version was different, that it was intended to be different to avoid the problem that PG has with not allowing a submission to "fix" a golden-moldie version, and from day one I said I was doing a different version because I wanted to demonstrate my cross-versioning tool, and also that I actually *like* a version of Huck that has fewer of the overtly racist "Uncle Tom" cartoons in it. IMHO Mark Twain (like Stephen Crane in "Monster") is, in the writing, being mostly ambiguous on the issue of racism, whereas the Uncle Tom cartoons are overt.
moreover, i have created a clean version of the same edition that jim used. so imagine that i am going to submit it to p.g., along with the pagescans, to create a better overall package...
When I submit a version to PG I personally am saying not "This is Perfect" but rather "I am done with it, now it's yours." In my opinion the issue of whether or not page images are included is moot - IF I give PG a pointer to the images when I do the copyright clearance. In which case, if they want the page images but are not willing to go get them, well, then that is sad. I have pointed out on this forum many times why a person submitting a book may not also want to send a physical copy of the page images to PG.
If BB submits a substantially "better" version of the identical edition - not just a "snowflake" version which, say, comes with a brown "ye olde aged paper" background - then I would hope that PG would accept the new version and use it to replace the old "Jim" version. I would hope that BB would build on my efforts when doing so, so as not to totally waste his time and effort - which BB has done. "Better" would have to be a version that runs substantially better on most reader machines PG "customers" actually read on, and not just on BB's iPad, for example, and be of similar or smaller size.
IF BB's submission were not substantially better overall, I would hope PG would say "Just send us the errata list please." I would also hope, for example, that PG would have a consistent policy on this issue, and not say "Oh we don't replace DP versions because that hurts their feelings." Or "we don't replace the High-priestess versions because that hurts *their* feelings."
A better solution, IMHO, would be if PG were to allow golden-moldie versions to be reworked and the 100s of out-right errors in them to be fixed - while still respecting the transcription choices in the original as much as possible. This may not always be possible. In fact, in some cases PG ought to be actively soliciting people to rework some of the golden-moldies. And a lot of the golden-moldies are really stinko editions in the first place, in which case please dear god find a better edition with better provenance, replace the stinko version in its entirety, silently, and retain the old number, and burn the old bits on a pyre.
One had better assume that better versions are coming - god help us if nothing better than HTML is ever invented for coding books. And if a better encoding IS invented, then please, dear god, I hope that ALL the old HTML eventually gets thrown away. Not to imply by any means that I think either TEI or BB's tilde language represents an improvement in practice over HTML.
3. accept it, and overwrite jim's edition with mine, complete with a new credit-line mentioning "a volunteer", but no name.
When I put my name in a credit-line it is to say "I accept responsibility for this work as it is, good or bad." It is not a claim that my efforts are perfect. BB in turn enjoys tracking down PG books which have my name on them and pillorying me for my efforts, simply because I have the effrontery to say that I think many of BB's ideas are bad ideas and a waste of time - such as the idea that everyone is going to follow BB's suggestion of posting page images next to transcriptions. I think page images are a good idea, mind you, just not taken to BB's extremes. But credit-lines are part of being willing to put my name on my efforts. It also means that if someone wants to build on my efforts, replace them, fix them, question them, etc., they need only contact me.
anyone else is invited to chip in their opinion as well, but i am especially interested in jim's thoughts. and, if he makes a post before thanksgiving, i will consider it in my deliberations about whether or not, after thanksgiving weekend, i will indeed make a submission to p.g. of a corrected edition of jim's "huck finn".
Sorry, BB, but I have friends and relatives whose company to enjoy on Thanksgiving. If you do submit your "corrected" version to PG, I would be curious to see or hear their reaction - which might be influenced more by which of the two of us is - at that particular moment - most 'persona non grata.' ;-)
participants (7)
- Bowerbird@aol.com
- Greg Newby
- James Adcock
- Jeroen Hellingman
- Karen Lofstrom
- Keith J. Schultz
- Lee Passey