
Greg,

The TRAC database is interesting. I will have to look a little further into Trac to see what it can do. A quick look at the Trac project page shows that it contains a wrapper around Subversion (or alternative version control systems). Basically, it takes some time working with a system to evaluate its usefulness. It seems you haven't done anything with it since the initial import.

One of the first steps I would take is to weed out all files that can be dynamically regenerated, that is, all the zip archives that are also present as uncompressed file sets. A zip is easily created on the fly. (For years I maintained a local copy of PG working the other way round, grabbing only the zips; since I have run out of disk space, I no longer do this.)

I still have about 80 (hard) books to post-process for PGDP; after that I will dive deeper into this type of thing.

One thing Project Gutenberg can do to lower a barrier to the requested "can-do" attitude is to remove the "copyright" restrictions on the collection as a whole, and revise the somewhat arcane PG license restrictions. I think it would be most appropriate to stamp something like CC-BY or CC-BY-SA on the entire collection (exempting only the copyrighted contributions), and drop the claim that the collection itself warrants some kind of "compilation copyright" (something I think makes no sense and will not hold: there has been no creative effort in making the compilation, and just accepting everything that meets a certain threshold of technical quality and copyright clearance does not count here).

Of course, having the RDF catalog under GNU GPL is great.

Jeroen.

On 2012-09-18 18:41, Greg Newby wrote:
Jeroen,
I looked into TRAC, and actually got it to ingest the whole collection (it took a few days).
http://trac.readingroo.ms/gutenberg/
It does revision control, issue tracking, etc. Unfortunately I have not had time (and don't have sufficient expertise) to take it much further than that. If anyone is interested, I'd be happy to provide access.
Ultimately, part of my goal (which I expressed here earlier in the year) is exactly what you wrote about: better ability to crowdsource production and errata handling, and to more easily allow variations & derivative works. I wrote a fair amount about it then, so won't go into detail in this thread.
-- Greg