
Greg,

There is a tool created for FLOSS Manuals called Booki. It enables collaboration over the web to create books, both printed books like you can do with Lulu.com and EPUBs. One feature it has is the ability to import EPUBs, and there is an interface that lets you import EPUBs created by archive.org. In fact, archive.org is a sponsor of Booki. They see it as a way to get the EPUBs they now generate using OCR proofed and corrected.

This tool is not perfect, but it has already been used to create manuals for a lot of Free Software projects, including two manuals for the One Laptop Per Child project that I wrote, plus a translation into Spanish of the first of my manuals that was done by volunteers in South America.

You can check it out here: http://en.flossmanuals.net/

James Simmons

On Fri, Jan 27, 2012 at 1:58 AM, Greg Newby <gbnewby@pglaf.org> wrote:
Whew. I counted 100 messages in 2 days. Thanks for the lively discussion. I changed the subject for the theme that Joshua and others mentioned (below). The idea I am very interested in fostering is capable online tools to let essentially anyone make edits, add formats, or prepare derivative files, from any PG eBook. Then, to easily add those changes back into (an area of) the PG collection.
My view is that we could very easily have an additional major category of file, for a given work. Currently, we have two major categories: first are files that go through WWers to get online, and second are those that are automatically generated from the first type. (While this is a gross simplification, in fact at www.gutenberg.org it's really easy to tell which is which -- they are in a different set of subdirectories, with a different file naming scheme.)
A third (new) type would be those files that are, in some way, modified, derived, or produced by other people and their tools. Not necessarily WWers or the original producers/submitters. In a word, crowdsourcing. Or community editing. Or version control. Or whatever you want to call it: the point would be that ANYONE with desire and some basic capability could make changes to existing files, or provide derivative files.
I can think of several major details, and many minor ones. Concerns about copyright, spamming, whether anonymous edits are permitted, a review/revision/recision cycle, character sets, forking, searching, etc., etc. I would love to see multiple "master" files, created lovingly by hand in any or all of RST, LaTeX, or yfm (Your Favorite Markup) -- then allow users to select which master to use to generate their, say, EPUB. And, which tools to use for the conversion. Of course, people who wanted to lovingly craft an EPUB would be able to upload that, too.
A capable crowdsourcing tool - preferably one that already exists, is well-maintained, is free, and will require relatively few modifications - is the starting point I'd most like to see. Whether we start with one book or 100 or 38000 doesn't matter to me, though it matters a whole lot that the solution is scalable to the full collection.
As for the questions about whether this would be allowed, or would pollute the essence of whatever, or piss off whomever: no, PG doesn't work that way, and never did. The answer is, and has been, "yes, go for it. It's all good, and on-mission." I am sensitive to not removing or undoing others' work, but my view is that current files of the first type, above, would remain, and be easy to find, and that for the main collection at www.gutenberg.org, the WWer process (perhaps as modified, thanks to the new tools that have been under discussion) would still apply.
Last year, I tried to deploy TRAC for group editing and version control of PG eBooks. It couldn't handle the directory count, and never finished, though I'm ready to try again. Or, a different tool.
As many subscribers have heard before, I have some hefty servers that can be used for experimentation and proof of concept. That's not the hard part.
If people are aware of good tools we could base this on, please speak up. I can elaborate on why I prefer to start with an existing tool, but in a nutshell it is because (as many have pointed out), the fundamentals of crowdsourcing and file revisions are already covered by a bunch of excellent tools. Let's not reinvent the parts that others are doing well since, after all, there are plenty of challenges that are unique to Project Gutenberg or to eBooks in general.
-- Greg
On Wed, Jan 25, 2012 at 01:01:08AM +0100, Marcello Perathoner wrote:
On 01/24/2012 11:08 PM, Joshua Hutchinson wrote:
So, if someone were to start "refactoring" old PG texts into TEI or RST and working with a WWer to repost them ... is this a workable idea?
More than a technical challenge it would be a political one. I can convert a novel the size of Pride and Prejudice into RST in about an hour. More if there is formatting or images to recover. But I'd prefer to avoid the riot that will ensue if we start to reformat DP texts.
We could start redoing the top 100 list excluding everything that is too hard and everything made by DP.
Maybe we start this process on a semi-private mirror of the PG corpus and only when it reaches a critical mass of some sort it gets moved over. But an official notice that this project has some backing is necessary or we'll just keep seeing everything running around in ten different directions and nothing ever getting done.
A semi-official branch would be a good occasion to ditch the old WWer workflow in favor of a source repository (git or mercurial) that holds all the masters.
Should we reserve a range of ebook nos. or shadow the existing ones?
--
Marcello Perathoner
webmaster@gutenberg.org
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/mailman/listinfo/gutvol-d