
(New subject+thread) On Mon, Jan 30, 2012 at 08:54:18PM -0800, James Adcock wrote:
... Again, the direction I hear you guys heading in is that you-all want to reinvent DP because you-all think you can improve upon DP. Not saying you can, not saying you can't. I'm just saying that is not *what I heard* Greg talking about.
Thanks. This is just what I was thinking of pointing out (catching up on today's emails...). Below are details on what I was envisioning.

The need I'm trying to address is reformatting or editing eBooks, not proofreading them. For starters, let's consider books that get into the PG collection in the usual way (i.e., culminating with a WWer posting them). I think we can do what's below while leaving the existing workflow out of scope. (It's not holy or untouchable or anything, just out of scope for what follows.)

What I'd like is (as someone else nicely put it) a continual improvement opportunity, provided to essentially anyone, for eBooks in the PG collection. This boils down to a handful of critical activities. It's mainly the third one (III) that involves crowdsourcing and new tools.

I. Making changes to the master file(s) [let's imagine that we retain the practice of every PG eBook having a small number of master files, in a small number of master formats]. The short list of master formats includes RST, HTML, TeX/TEI, and plain text (perhaps with light markup). Maybe this list will grow in the future; maybe it will shrink.

The main feature here is that typos or fixes or additional master formats can be contributed. Challenges have been noted (revision wars; concurrent editing; bogus fixes; spam/inappropriate additions; inconsistent files...).

II. From those master files, various other file formats can be [and are, currently] derived automatically. These include EPUB, Kindle variants, variations on HTML or text (especially if they were not previously provided), RTF, and a few others. Again, maybe this list will grow, maybe it will shrink. I do hope to offer conversion on-demand, which will let people select conversion options, and maybe even different conversion programs, for their purposes.

The main features here mostly exist, but not as flexibly as I'd like to see. For example, applying a variant CSS. Or making a PDF with a specific font and paper size.

Many challenges are technical, such as increased sophistication in dealing with text and HTML as master formats. Others need to be addressed by policy or social means, such as the ongoing tendency to use HTML for layout that is difficult to automatically convert. These, and others, have also been discussed deeply.

III. From those master files, various other file formats are created/contributed by individuals. I get offered these (via help@) practically every day. Usually EPUB, but also RTF/DOC, PDF. Often with typo fixes applied. These are what I called "lovingly prepared," though of course some are better than others. These can be better than automatically-generated versions in various ways. They might have advantages over master files (for example, improved HTML).

The main feature is that these would, in many cases, provide an improved reading experience (at least for some people, on some devices).

If we accept that anyone could contribute such a new file (or set of files) for an existing PG eBook, then the main challenges I see are (a) how to help readers select among them, and (b) dealing with the fact that, over time, master formats will be fixed, but not these hand-crafted derivatives.

I believe the solutions are related, and fairly easy.

For (a), we need a community recommender system. Stars, batting average, +1, new/novel, etc. And, "dislike," "report a problem," "report abuse," etc.

For (b), "time is on our side" (yes, it is). For a derivative format, we simply need to note when a master format was updated, but hand-derived ones were not. Plus, perhaps, a metric of how different the master format is from when the hand-derived one was created. Combined with (a), a recommendation would be attenuated based on such a metric. So, for example, a hand-derived file that gets a 90% quality rating from readers would slowly lose quality points as the master format is increasingly different. You get the idea... details TBD.

Soooooooooooooooooo.... my main starting point was to ask about existing software for group editing. Version control systems seem a reasonable fit for this. Plus, sophisticated systems like TRAC also take care of managing users and their passwords.

We do need to think of all the other stuff, but the basic idea was crowdsourcing for eBook formats/conversions/presentations. From that standpoint, I've left many other (important!) things out of scope: proofreading (distributed or otherwise), tools for auto-conversion (we have some; others will plug in but will be developed separately), and even arguments about master format (we have some, and better exposure of conversion tools and options will create a positive feedback loop to guide contributors to do a better job. And/or, open capabilities for OTHERS to fix the masters). Also cataloging, metadata, and supplemental info (such as author bios).

Note especially that I don't envision developing tools to help potential contributors do the conversion. Not our bailiwick: there are entire ecosystems of tools already, and all we need to do is support the community of interested donors of resulting files (including pointing them at recommended tools). The PG toolchain for automated conversion should remain available (i.e., http://epubmaker.pglaf.org).

I hope this helps clarify my original suggestion a little better. There has been some great discussion on this and related topics.

  -- Greg

Dr. Gregory B. Newby
Chief Executive and Director
Project Gutenberg Literary Archive Foundation www.gutenberg.org
A 501(c)(3) not-for-profit organization with EIN 64-6221541
gbnewby@pglaf.org
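
P.S. A rough illustration of the attenuation idea under (b), for anyone who wants something concrete to argue with. This is only a sketch under assumptions of my own: the function names (master_drift, attenuated_rating) and the sensitivity parameter are hypothetical, the "how different" metric is approximated by line-level similarity from Python's difflib, and the penalty is a plain linear one. The real metric and decay curve are still TBD.

```python
from difflib import SequenceMatcher


def master_drift(master_then: str, master_now: str) -> float:
    """Return how much the master changed, from 0.0 (identical) to 1.0.

    Assumption: line-level similarity is a good enough stand-in for the
    "how different is the master" metric; the real metric is TBD.
    """
    then_lines = master_then.splitlines()
    now_lines = master_now.splitlines()
    similarity = SequenceMatcher(None, then_lines, now_lines, autojunk=False).ratio()
    return 1.0 - similarity


def attenuated_rating(reader_rating: float, drift: float, sensitivity: float = 1.0) -> float:
    """Scale a reader rating (0.0 to 1.0) down as the master drifts.

    Assumption: a simple linear penalty, clamped so it never goes negative.
    """
    factor = max(0.0, 1.0 - sensitivity * drift)
    return reader_rating * factor


if __name__ == "__main__":
    # Master text as it was when the hand-made file was created, and as it is now.
    then = "Chapter I\nIt was teh best of times,\nit was the worst of times,\nit was the age of wisdom"
    now = "Chapter I\nIt was the best of times,\nit was the worst of times,\nit was the age of wisdom"
    drift = master_drift(then, now)
    print(f"drift: {drift:.2f}")
    print(f"attenuated rating: {attenuated_rating(0.90, drift):.3f}")
```

A higher sensitivity makes hand-derived files lose points faster as the master is fixed; setting it to 0 disables the penalty entirely. Whether linear decay is the right shape is exactly the sort of detail still to be decided.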