
On Sun, November 18, 2012 6:12 pm, Greg Newby wrote:
On Sun, Nov 18, 2012 at 02:30:55PM -0500, Bowerbird@aol.com wrote:
[snip]
moreover, i have created a clean version of the same edition that jim used. so imagine that i am going to submit it to p.g., along with the pagescans, to create a better overall package...
this is the question for jim.
what should p.g. do with my submission?
1. reject it.
2. put it under a new number.
3. accept it, and overwrite jim's edition with mine, complete with a new credit-line mentioning "a volunteer", but no name.
[snip]
This sounds like "2" to me.
Well, it needs to have a new identity, although not necessarily a new number. What you are suggesting, Mr. Newby, is to permit a number of "snowflakes"--a suggestion which has some value. But I think that just about everyone innately understands the "Work/Expression" notion--there might be several "expressions" of _The Adventures of Huckleberry Finn_, but they are all expressions of the broader work. When a list derived from a search for "huck finn" is presented, it is not clear whether the listed items are all different expressions of a single work, or whether multiple works might be included.

I think it is time to consider renaming all the files using a "Work/Expression" naming scheme: instead of "76.txt" the file would be named something like "Twain,Mark-TheAdventuresOfHuckleberryFinn-Anon.txt". The search results should be ordered by last modification date, although number of downloads is interesting data that could be included. Available formats should be listed, as well as submitters' comments. With this data, a downloader should have enough information to make an intelligent choice as to which to download--and it should be apparent that all the files (or folders) are variations on a single work.

(Do people realize that texts 7100 to 7107 are also Huck Finn, apparently the same as the text version of 76, but broken up into 8 parts? Are errata fixes getting made to those files as well? eText numbers bear little relation to the actual number of books available, so renaming files to make them more transparent will have little to no impact on the system that exists now.)
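To make the naming idea concrete, here is a rough sketch of how such a name might be generated. The function name, its parameters, and the ASCII-only word splitting are my own assumptions for illustration, not anything that exists at p.g. today.

```python
import re

def work_expression_filename(author_last, author_first, title,
                             submitter="Anon", ext="txt"):
    """Build a name such as 'Twain,Mark-TheAdventuresOfHuckleberryFinn-Anon.txt'.

    Hypothetical helper: the Author-Title-Submitter layout follows the
    example in this message; everything else is an assumption.
    """
    def squash(text):
        # Keep runs of ASCII letters/digits and capitalize each run's first letter.
        words = re.findall(r"[A-Za-z0-9]+", text)
        return "".join(w[0].upper() + w[1:] for w in words)

    return f"{author_last},{author_first}-{squash(title)}-{squash(submitter)}.{ext}"

name = work_expression_filename("Twain", "Mark",
                                "The Adventures of Huckleberry Finn")
print(name)
```

A real scheme would also have to settle how to treat accented letters, punctuation inside titles, and two submitters offering the same edition, none of which this sketch attempts.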
As has been said before, the "real" problem is that the best prepared eBooks are often not found first. Instead, ordering is based on popularity, which tends towards self-reinforcement.
A simple fix would be to order results by release date by default, newest first, on the assumption (probably accurate) that later versions are better versions.
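As a sketch only, the ordering could look like the following. The Edition record, its field names, and all of the sample values are invented for illustration; they are not real catalog data.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Edition:
    ebook_id: int     # hypothetical identifier, not a real eText number
    label: str
    released: date    # hypothetical release date
    downloads: int    # hypothetical download count

def order_by_release(editions):
    """Newest release first; download count is used only to break ties."""
    return sorted(editions, key=lambda e: (e.released, e.downloads), reverse=True)

editions = [
    Edition(1, "first submission", date(2001, 5, 1), 9000),
    Edition(2, "later clean version", date(2012, 11, 18), 40),
    Edition(3, "intermediate fix", date(2006, 3, 9), 700),
]

for e in order_by_release(editions):
    print(e.released.isoformat(), e.ebook_id, e.label, e.downloads)
```

Note that the most-downloaded record in this made-up list ends up last, which is the point: popularity no longer decides what is seen first, though the download count can still be shown next to each entry.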
It would be nice for the search engine at www.gutenberg.org to rank by quality, not just popularity, when similar titles are found. All we need is a good way of assessing quality.
Just ask the customer. Next to every search result put a little link to "Rate This." The link would pop up a window with a 1-5 set of radio buttons with the simple request, "Please rate the quality of this file set." Add a text field for comments. I don't think you even need to explain the criteria by which to judge quality; after enough ratings, the correct answer will emerge. See: http://en.wikipedia.org/wiki/The_Wisdom_of_Crowds
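For what it is worth, the bookkeeping behind a "Rate This" link is small. The sketch below is one possible shape, with a made-up RatingBoard class and a min_votes threshold that is my own guess at what "enough ratings" might mean.

```python
from collections import defaultdict

class RatingBoard:
    """Collects 1-5 ratings and optional comments per file set (hypothetical)."""

    def __init__(self, min_votes=5):
        # min_votes is an assumed cutoff below which no average is reported.
        self.min_votes = min_votes
        self._scores = defaultdict(list)
        self._comments = defaultdict(list)

    def rate(self, file_set, score, comment=""):
        if score not in (1, 2, 3, 4, 5):
            raise ValueError("score must be 1 through 5")
        self._scores[file_set].append(score)
        if comment.strip():
            self._comments[file_set].append(comment.strip())

    def average(self, file_set):
        """Mean rating, or None until min_votes ratings have arrived."""
        scores = self._scores.get(file_set, [])
        if len(scores) < self.min_votes:
            return None
        return sum(scores) / len(scores)

    def comments(self, file_set):
        return list(self._comments.get(file_set, []))

board = RatingBoard(min_votes=3)
board.rate("edition-a", 5, "clean text, page scans included")
board.rate("edition-a", 4)
board.rate("edition-a", 5)
board.rate("edition-b", 2, "many scanning errors")

print(board.average("edition-a"))
print(board.average("edition-b"))
```

With too few votes the average is withheld rather than shown, so a single enthusiastic or hostile rating does not decide the ranking.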