Proposal for augmented structure for PG

I would like to propose an augmented structure for PG. The aim of this structure is to realise Greg's stated desire to include ebooks that are contributed outside of the normal WWer workflow. The basic premise is that the PG library as it stands becomes just one of multiple contributor libraries hosted by PG, with the gutenberg.org interface rejigged to catalogue all of these libraries.
In terms of customer interface, the customer gets a searchable catalogue of works. When they access a work, they are presented with a list of which contributor libraries contain expressions of that work, each with some blurb describing the properties of that expression. Thus, for example, the "Huckleberry Finn" page might list two expressions in the WWer library, one in the DP library, one in the Adcock library, an RST derived one in the Marcello library and a ZML derived one in the Bowerbird library.
Library owners would be responsible for maintaining their own libraries. Anything in any of the libraries is fair game for other contributors to build on. Contributors may choose to pool resources and maintain common root versions/master formats, e.g. RST advocates might maintain a set of RST files in Mercurial, possibly even grouping together to build a common RST derived library.
Customers would be able to list all the books in a particular library, so if they thought the Bowerbird "Huckleberry Finn" was the best formatted ebook they have ever seen they would be encouraged to see what other Bowerbird ebooks there were. If they thought it was awful, they would avoid Bowerbird works in the future. You could even allow customers to blacklist certain libraries from their future results. Library owners would also get the ability to recommend expressions in other libraries to anyone who is visiting their library, creating a useful reputation chain.
The WWers should not be upset as nothing is being erased. They have the biggest library, and they are already entirely responsible for its maintenance. It is already maintained to strict standards, and if those standards stack up, it should be the go to library.
DP should be pleased. They would get to do CSS3 and make their own epubs if they want to, and they would get to maintain their own library, something that has been talked about wistfully on the DP forums for a _long_ time.
Individual contributors should be pleased. They can produce work to specifications driven by their own beliefs in the formats that they believe are superior, and if they are right about the quality of their work being superior, they will garner a following.
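To make the proposed catalogue a little more concrete, here is a minimal Python sketch of the data model it implies: works, per-library expressions with a blurb, per-library listings, and customer blacklists. The class and method names (`Expression`, `Catalogue`, `expressions_of`, `works_in`) and the sample blurbs are illustrative assumptions, not part of the proposal or of any existing PG software.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Expression:
    """One library's edition of a work, plus the blurb shown to customers."""
    work: str      # catalogue title, e.g. "Huckleberry Finn"
    library: str   # contributor library hosting this expression
    blurb: str     # short description of this expression's properties


@dataclass
class Catalogue:
    """Hypothetical top-level catalogue spanning all contributor libraries."""
    expressions: list[Expression] = field(default_factory=list)

    def add(self, work: str, library: str, blurb: str) -> None:
        self.expressions.append(Expression(work, library, blurb))

    def expressions_of(self, work, blacklist=frozenset()):
        """List every expression of a work, skipping blacklisted libraries."""
        return [e for e in self.expressions
                if e.work == work and e.library not in blacklist]

    def works_in(self, library):
        """List the works a given library offers, sorted by title."""
        return sorted({e.work for e in self.expressions if e.library == library})


# Sample data; the blurbs are invented for illustration only.
catalogue = Catalogue()
catalogue.add("Huckleberry Finn", "WWer", "plain text")
catalogue.add("Huckleberry Finn", "DP", "HTML with illustrations")
catalogue.add("Huckleberry Finn", "Bowerbird", "derived from ZML")
catalogue.add("Tom Sawyer", "Bowerbird", "derived from ZML")

# A customer who has blacklisted one library sees only the other two.
visible = catalogue.expressions_of("Huckleberry Finn", blacklist={"Bowerbird"})
```

The point of the sketch is that the work page and the library page are two views over the same flat list of expressions, so adding a new contributor library needs no change to the existing ones. Recommendations between libraries are left out here.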

On 02/11/2013 12:36 PM, Jon Hurst wrote:
The basic premise is that the PG library as it stands becomes just one of multiple contributor libraries hosted by PG, with the gutenberg.org interface rejigged to catalogue all of these libraries.
Why should PG reorganise itself into a copy of something that already exists elsewhere? See: www.openlibrary.org
Why should PG turn itself from a trusted source of books of reliable quality into a repository of every kind of crap somebody wanted to throw in?
Who is going to audit the copyright status of submitted works? You'll have to check each book description and every single cover page. You'll also have to check if the book is indeed derived from a PG edition or if there's "value added" in form of copyrighted illustrations, introductions, author biographies etc.
Regards
--
Marcello Perathoner
webmaster@gutenberg.org

On 2013-02-11, Marcello Perathoner wrote:
Why should PG reorganise itself into a copy of something that already exists elsewhere? See: www.openlibrary.org
Because PG could do it with better focus and higher quality.
Why should PG turn itself from a trusted source of books of reliable quality into a repository of every kind of crap somebody wanted to throw in?
"Reliable quality" and "every kind of crap" are highly subjective -- who gets to decide on behalf of our customers which is which? You believe you can do better than the current library using RST; some people will agree your versions are superior, some won't. Those that agree can use your library. Those that don't can avoid it. I am not suggesting an Internet free for all. It would be easy to approve DP and current solo contributors. For the rest, build a library of a few books and send Greg a link; if he likes what he sees he can approve a new library.
Who is going to audit the copyright status of submitted works? You'll have to check each book description and every single cover page. You'll also have to check if the book is indeed derived from a PG edition or if there's "value added" in form of copyrighted illustrations, introductions, author biographies etc.
Same as now -- you need a clearance to include a book in a PG library. You can, of course, also do a book that has already had a clearance. For the "value added" issue, have a take-down mechanism and ban repeat offenders.
Regards
Jon

On 02/11/2013 02:18 PM, Jon Hurst wrote:
On 2013-02-11, Marcello Perathoner wrote:
Why should PG reorganise itself into a copy of something that already exists elsewhere? See: www.openlibrary.org
Because PG could do it with better focus and higher quality.
I doubt it. I'm not saying impossible, but very improbable. gutenberg.us (also known as the infamous PG II) has been trying to be a meta-catalog for years now, without any success in my eyes. Also, there are hundreds of other meta-catalogs around. People coming to PG want PG, not yet another meta-catalog full of crap.
Why should PG turn itself from a trusted source of books of reliable quality into a repository of every kind of crap somebody wanted to throw in?
"Reliable quality" and "every kind of crap" are highly subjective -- who gets to decide on behalf of our customers which is which? You believe you can do better than the current library using RST; some people will agree your versions are superior, some won't. Those that agree can use your library. Those that don't can avoid it.
I am not suggesting an Internet free for all. It would be easy to approve DP and current solo contributors. For the rest, build a library of a few books and send Greg a link; if he likes what he sees he can approve a new library.
Same question: why do you not submit your better books to openlibrary.org? They already do what you want us to do. You can host them on archive.org.
Who is going to audit the copyright status of submitted works? You'll have to check each book description and every single cover page. You'll also have to check if the book is indeed derived from a PG edition or if there's "value added" in form of copyrighted illustrations, introductions, author biographies etc.
Same as now -- you need a clearance to include a book in a PG library. You can, of course, also do a book that has already had a clearance. For the "value added" issue, have a take-down mechanism and ban repeat offenders.
And who will check that the clearance covers the book? So the WWers will be burdened not only by genuine PG editions but they'll also have to check every "derived" edition anybody has posted anywhere.
Regards
--
Marcello Perathoner
webmaster@gutenberg.org

On 2013-02-11, Marcello Perathoner wrote:
Same question: why do you not submit your better books to openlibrary.org? They already do what you want us to do. You can host them on archive.org.
OK, I'll do that... I assume you will be doing the same for your RST derived "better books" rather than trying to get them into the PG library? Let me know the URLs because I'm interested in trying them out.
Regards
Jon

On 2013-02-11, Jon Hurst wrote:
OK, I'll do that... I assume you will be doing the same for your RST derived "better books" rather than trying to get them into the PG library? Let me know the URLs because I'm interested in trying them out.
On reflection, there is, of course, no chance that PG could pull off something like this. I obviously woke up in an over-optimistic frame of mind. And if I'm honest, I have passed the point that I actually care that PG customers get sucky ebooks. Greg, Marcello and the WWers have claimed ownership of that issue.
What I do care about is that _I_ am getting sucky ebooks. DP produced books are an order of magnitude better in terms of quality, and I would love to see DP do its own library, but nearly all the books that I actually want to read were done prior to DP getting fully up to speed. I can and will do something about this by helping to produce the buried reworks. Those work fine for me and everyone else on this list, since we know the secret to finding them. Let the customers have crap -- it'll be more bandwidth for us.
The missing piece of the puzzle for me is properly formatted versions of these reworks. I can, of course, spend an hour fixing up a book, and if there is no other choice I will do so, but if someone else has already done a book and I like what they have done I would prefer to use that hour for reading.
So here is what I propose. I will create a properly formatted epub, mobi and PDF for each rework that I am involved in producing. I will publish these in a small library somewhere cheap. If you email me I will give you a URL for this library. Feel free to rework what I have done; I will not be upset. If you have a library of properly formatted ebooks, please send me a link. If I rework what you have done, please don't be upset. If sufficient people are interested, we'll work out some sort of catalogue. The aim is not to take over the Internet. The aim is to have access to non-sucky ebooks.
Regards
Jon

On 02/11/2013 08:46 PM, Jon Hurst wrote:
So here is what I propose. I will create a properly formatted epub, mobi and PDF for each rework that I am involved in producing. I will publish these in a small library somewhere cheap. If you email me I will give you a URL for this library. Feel free to rework what I have done; I will not be upset. If you have a library of properly formatted ebooks, please send me a link.
You can easily offer search functionality by implementing an OPDS feed for your library. People with ebook reader apps can "install" the OPDS URL of your library in their app. Their app will then search your library along with all the other libraries installed. And if your books are crappy they can just remove your URL. OPDS, among other things, can act as a search aggregator for libraries.
Regards
--
Marcello Perathoner
webmaster@gutenberg.org
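For anyone who wants to try the OPDS route, the following is a minimal sketch of generating an OPDS 1.x acquisition feed, which is an ordinary Atom document whose entries carry a download link. The function name `build_feed`, the `urn:example:` identifiers, the feed title and the `example.org` URL are all hypothetical placeholders. A real catalogue would also want navigation, "self" and search links, which are omitted here.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ACQUISITION = "http://opds-spec.org/acquisition"  # OPDS 1.x download relation

# Serialise Atom as the default namespace instead of an "ns0:" prefix.
ET.register_namespace("", ATOM)


def build_feed(feed_id, title, updated, books):
    """Build a minimal OPDS acquisition feed and return it as an XML string."""
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}id").text = feed_id
    ET.SubElement(feed, f"{{{ATOM}}}title").text = title
    ET.SubElement(feed, f"{{{ATOM}}}updated").text = updated
    for book in books:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}id").text = book["id"]
        ET.SubElement(entry, f"{{{ATOM}}}title").text = book["title"]
        ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
        author = ET.SubElement(entry, f"{{{ATOM}}}author")
        ET.SubElement(author, f"{{{ATOM}}}name").text = book["author"]
        # The acquisition link is what lets a reader app download the book.
        ET.SubElement(entry, f"{{{ATOM}}}link", {
            "rel": ACQUISITION,
            "href": book["epub_url"],
            "type": "application/epub+zip",
        })
    return ET.tostring(feed, encoding="unicode")


feed_xml = build_feed(
    feed_id="urn:example:small-library",          # hypothetical identifier
    title="A small library of reworked ebooks",   # hypothetical title
    updated="2013-02-11T00:00:00Z",
    books=[{
        "id": "urn:example:huckleberry-finn",
        "title": "Adventures of Huckleberry Finn",
        "author": "Mark Twain",
        "epub_url": "http://example.org/books/huckleberry-finn.epub",
    }],
)
```

As far as I understand the OPDS 1.x specification, such a feed should be served with the media type `application/atom+xml;profile=opds-catalog;kind=acquisition`; check that against the specification and the reader apps you care about before relying on it.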

So here is what I propose. I will create a properly formatted epub, mobi and PDF for each rework that I am involved in producing. I will publish these in a small library somewhere cheap. If you email me I will give you a URL for this library. Feel free to rework what I have done; I will not be upset. If you have a library of properly formatted ebooks, please send me a link. If I rework what you have done, please don't be upset. If sufficient people are interested, we'll work out some sort of catalogue. The aim is not to take over the Internet. The aim is to have access to non-sucky ebooks.
I love what you are suggesting, because clearly PG has become part of the problem, rather than part of the solution. My only suggestion is to try to figure out even a crude public hosting option so that others can contribute their similar efforts in the same place.

Feel free to spin up a separate effort, of course. However, I am still optimistic about the feasibility of the approach I outlined a year or so ago (?) of having a user-contributed area, with something like a TRAC for variations.
Crowdsourcing errata and variations.
Making this scalable is challenging. Doing it so that minimal changes are needed to the existing PG structure is key. (And, especially, so I can't be a bottleneck in the process. I hate that.)
I'm pretty sure I've written all of this before. Coming up with a *scalable* approach is a bigger challenge than (so far) people have been able to demonstrate.
Summing up, I encourage building the better system (or the system that fills gaps in the current system). Trying to change the current system is likely to remain difficult. If the better system is good enough, maybe THAT can become "the" system. (And then, in two or three years, people will be telling whoever designed that new system that their approach is ossified, and to get out of the way of progress, etc. :)
-- Greg
On Mon, Feb 11, 2013 at 07:46:38PM +0000, Jon Hurst wrote:
On 2013-02-11, Jon Hurst wrote:
OK, I'll do that... I assume you will be doing the same for your RST derived "better books" rather than trying to get them into the PG library? Let me know the URLs because I'm interested in trying them out.
On reflection, there is, of course, no chance that PG could pull off something like this. I obviously woke up in an over-optimistic frame of mind. And if I'm honest, I have passed the point that I actually care that PG customers get sucky ebooks. Greg, Marcello and the WWers have claimed ownership of that issue.
What I do care about is that _I_ am getting sucky ebooks. DP produced books are an order of magnitude better in terms of quality, and I would love to see DP do its own library, but nearly all the books that I actually want to read were done prior to DP getting fully up to speed. I can and will do something about this by helping to produce the buried reworks. Those work fine for me and everyone else on this list, since we know the secret to finding them. Let the customers have crap -- it'll be more bandwidth for us.
The missing piece of the puzzle for me is properly formatted versions of these reworks. I can, of course, spend an hour fixing up a book, and if there is no other choice I will do so, but if someone else has already done a book and I like what they have done I would prefer to use that hour for reading.
So here is what I propose. I will create a properly formatted epub, mobi and PDF for each rework that I am involved in producing. I will publish these in a small library somewhere cheap. If you email me I will give you a URL for this library. Feel free to rework what I have done; I will not be upset. If you have a library of properly formatted ebooks, please send me a link. If I rework what you have done, please don't be upset. If sufficient people are interested, we'll work out some sort of catalogue. The aim is not to take over the Internet. The aim is to have access to non-sucky ebooks.
Regards
Jon
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d

On 2013-02-11, Greg Newby wrote:
Feel free to spin up a separate effort, of course. However, I am still optimistic about the feasibility of the approach I outlined a year or so ago (?) of having a user-contributed area, with something like a TRAC for variations.
Unfortunately, you would need Marcello on board to make any progress. Marcello's thing of RST everywhere to be processed by his software against his stylesheets is pretty much the exact opposite of what you suggested. Therefore no progress on this line can occur, as has been demonstrated over the past year.
Making this scalable is challenging. Doing it so that minimal changes are needed to the existing PG structure is key. (And, especially, so I can't be a bottleneck in the process. I hate that.)
PG as it stands is a scalability disaster zone. The same five people who were in charge of maintaining 5000 books are now in charge of maintaining 40000 books, and will soon be in charge of maintaining 50000 books. Really? 100 errata reports a month to process, with only one person in the least bit interested in looking at them, thus an ever increasing backlog that will never see the light of day, thus errata reporters quickly realising that it is not worth the effort. What happens when one of those five can no longer stand it and steps down? Is there anyone who would be fool enough to join their ranks? What happens if you (Greg) get hit by a bus?
The only scalable solution is a curated multi library approach, in particular with DP maintaining its own library and DP's PPers taking responsibility for errata in the work they produce. Hell, DP has all but got a functioning errata system ready to go, and the likes of Jeroan have setups that would make processing errata as simple as could be. But that will happen over Marcello's cold, dead body, and you can't afford to lose Marcello either.
But now I'm falling into the old gutvol-d trap of proclaiming the problems without proposing viable solutions. I don't have a solution that stands a chance of being implemented, so I am instead looking towards a solution that may result in me personally having access to some decent ebooks. It may be that that solution ends up being useful to PG in the future, it may not. PG will certainly benefit in the meantime because buried reworks are a damn good place to store data.
Regards
Jon

On 2/11/2013 5:51 PM, Greg Newby wrote:
Feel free to spin up a separate effort, of course. However, I am still optimistic about the feasibility of the approach I outlined a year or so ago (?) of having a user-contributed area, with something like a TRAC for variations.
I think you have a bit of a misconception here. Trac is primarily a project tracking and management solution. It is used to set up project documentation (presumably design documents), a "bulletin-board" system for collaboration, setting milestones and tracking bugs and tasks (tickets). Indeed, the ticketing system seems much more oriented towards tracking assigned tasks than reported defects.
While Trac can interface to Subversion, it does so only to the extent of providing a browser-based, read-only interface into an existing Subversion repository. One can browse the "source" (i.e. documents) and generate diffs between various versions, but one cannot add new documents, create new versions of documents, or create new branches of documents; that functionality will require an SVN client.
For our purposes, Trac is functionally equivalent to Jira + ViewSVN. I have no opinion as to which of these two approaches is superior; I suspect that it doesn't matter, and that the preferences of the system administrator should be the deciding factor.
Despite multiple requests on their web site, Trac does not have an interface into CVS; instead it recommends using Trac for ticket management and ViewCVS for the web interface into the CVS repository. All that being said...
Crowdsourcing errata and variations.
Making this scalable is challenging. Doing it so that minimal changes are needed to the existing PG structure is key. (And, especially, so I can't be a bottleneck in the process. I hate that.)
I'm pretty sure I've written all of this before. Coming up with a *scalable* approach is a bigger challenge than (so far) people have been able to demonstrate.
Mr. Newby has imported all of the PG repository into a Subversion database on readingroo.ms as of February 28, 2012. The repository can be browsed from inside Trac at http://trac.readingroo.ms/gutenberg/browser. Note that this repository does not include any changes to the PG corpus made since that time, although some enterprising soul could do that by reviewing the commits archived at http://news.gmane.org/gmane.culture.literature.e-books.gutenberg.announce.po....
I have enabled public access to the ticketing system on http://trac.readingroo.ms/gutenberg/. Any person can enter a new defect report by selecting "New Ticket" on that page. Equally, any person can get a list of all tickets, including resolved tickets, by selecting "View Tickets" from that same page. I would encourage everyone to report defects into the Trac ticketing system. They won't get as much visibility there, but at least they will be archived in a publicly accessible database.
To make this really useful, we need to set up Trac so that every defect report generates an e-mail message to errata at pglaf dot org. Trac has that capability, but I haven't figured out how to enable it yet. It may be necessary to install an MTA on readingroo.ms if one is not already present. I'm just one man, and I have a real job; this is a lower priority for me.
I have modified the Subversion repository on readingroo.ms so that it can be used by an SVN client in HTTP mode. SVN is exposed on the web by WebDAV, so I had to install the Apache WebDAV module. The way WebDAV, SVN, and Trac work together (i.e., just barely) is heavily dependent on Unix file system permissions (User, Group, World), and I'm not sure those things are set up correctly yet. I want the Subversion repository to be world readable, but require authentication to check in new versions and branches. For all I know, the way I've set things up has made readingroo.ms subject to attack; I'm still muddling through.
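On the Trac e-mail question, the following is a rough sketch of one way to switch notification on, by editing the `[notification]` section of the project's `trac.ini`. The option names `smtp_enabled`, `smtp_server` and `smtp_always_cc` are the ones I recall from Trac's notification documentation and should be checked against the installed version. The helper name `enable_ticket_email`, the `localhost` mail server and the `errata@example.org` address are placeholders, not the real readingroo.ms settings.

```python
import configparser
import io


def enable_ticket_email(ini_text, smtp_server, always_cc):
    """Return trac.ini text with ticket e-mail notification switched on."""
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep option names exactly as written
    config.read_string(ini_text)
    if not config.has_section("notification"):
        config.add_section("notification")
    # Option names assumed from Trac's [notification] section.
    config.set("notification", "smtp_enabled", "true")
    config.set("notification", "smtp_server", smtp_server)
    config.set("notification", "smtp_always_cc", always_cc)
    out = io.StringIO()
    config.write(out)
    return out.getvalue()


# Placeholder input: a tiny stand-in for an existing trac.ini.
original = "[project]\nname = gutenberg\n"
updated = enable_ticket_email(original, "localhost", "errata@example.org")
```

Note that `configparser` drops comments when it rewrites a file, so on a real installation it may be simpler to add the three lines to `trac.ini` by hand. Either way an SMTP server has to be reachable at the configured host, which is where the MTA comes in.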
If you would like SVN write-access to the Subversion repository on readingroo.ms, e-mail me off-list and I'll see what I can do. No promises, and I may not get back to you for weeks. I have absolutely no interest in the impoverished text files in the PG Subversion archive; I'll be happy to provide read/write access to qualified users, but I'm not going to get involved beyond that.
Well formatted HTML files are another matter. Clearly, some sort of versioning and branching will be needed for those files. But we have to be aware of the potential for version wars that creep into places like Wikipedia. I'm certain that Mr. Adcock, Mr. Kretz and I are going to have some pretty serious disagreements about the proper form for a particular file. We can't tolerate one person making a change, then a second person backing it out, then the first person making the change again, then a third person making a contradictory change, then the first person backing out the contradictory change, and so on. I don't think any person's changes ought to be rejected, but we will need a process to determine which changes become part of HEAD, and which changes will be committed to a specific branch (it will be the responsibility of the "branch owner" to keep the branch synchronized with the main line when defects are resolved).
So before a versioning mechanism can be established for HTML files some sort of standards will need to be established. Only changes consistent with those standards will be committed to the main line; inconsistent changes will be forked. People who consistently make gratuitous or non-standard changes to files will simply have their write access revoked. Of course, there's no reason why we couldn't create separate Subversion sandboxes for any individual who wanted one and didn't want to play by the rules. Mr. Kretz and I have had several productive discussions, both on- and off-list, about what rules there should be for creating HTML files.
As a result, I have begun a draft document which can be found at http://readingroo.ms:8080/PG2ePub/cvsget/HTMLRules.html. This document should be viewed as nothing more than a first draft; it is not well organized, it is incomplete, and it is subject to change.
I am by profession a software developer, not a Linux system administrator, and what I have done with Trac, Subversion and Apache on readingroo.ms was accomplished primarily by googling for answers to specific questions, then trying out the answer to see what happens. If there is anyone who would like to take over administration of any of these three services, please contact me off-list.

Why should PG turn itself from a trusted source of books of reliable quality into a repository of every kind of crap somebody wanted to throw in?
PG is NOT a trusted source of books of reliable quality, but the "PG High Priesthood" believes itself to be one, and therein lies the problem. Most PG books being read by most people fall into the "Barely Readable" or "Crap" category -- your words, not mine. Most could be easily fixed. But they are not being fixed. PG even refuses to count the "Crap", so how could it be "fixed" if PG cannot or will not even recognize "Crap" when they are producing it?
participants (5)
- Greg Newby
- James Adcock
- Jon Hurst
- Lee Passey
- Marcello Perathoner