
(New subject+thread) On Mon, Jan 30, 2012 at 08:54:18PM -0800, James Adcock wrote:
... Again, the direction I hear you guys heading in is that you-all want to reinvent DP because you-all think you can improve upon DP. Not saying you can, not saying you can't. I'm just saying that is not *what I heard* Greg talking about.
From that standpoint, I've left many other (important!) things out of scope: proofreading (distributed or otherwise), tools for auto-conversion (we have some; others will plug-in but will be developed separately), and even arguments about master format (we have some, and better exposure of conversion tools and options will create a positive feedback loop to guide contributors to do a better job. And/or, open capabilities for OTHERS to fix the masters). Cataloging,
Thanks. This is just what I was thinking of pointing out (catching up on today's emails...). Below are details on what I was envisioning. The need I'm trying to address is reformatting or editing eBooks, not proofreading them. For starters, let's consider books that get into the PG collection in the usual way (i.e., culminating with a WWer posting them). I think we can do what's below while leaving the existing workflow out of scope. (It's not holy or untouchable or anything, just out of scope for what follows.)

What I'd like is (as someone else nicely put it) a continual improvement opportunity, provided to essentially anyone, for eBooks in the PG collection. This boils down to a handful of critical activities. It's mainly the third one (III) that involves crowdsourcing and new tools.

I. making changes to the master file(s) [let's imagine that we retain the practice of every PG eBook having a small number of master files, in a small number of master formats]. The short list of master formats includes RST, HTML, TeX/TEI, and plain text (perhaps with light markup). Maybe this list will grow in the future; maybe it will shrink. The main feature here is that typos or fixes or additional master formats can be contributed. Challenges have been noted (revision wars; concurrent editing; bogus fixes; spam/inappropriate additions; inconsistent files...)

II. from those master files, various other file formats can be [and are, currently] derived automatically. These include EPUB, Kindle variants, variations on HTML or text (especially if they were not previously provided), RTF, and a few others. Again, maybe this list will grow, maybe it will shrink. I do hope to offer conversion on demand, which will let people select conversion options, and maybe even different conversion programs, for their purposes. The main features here mostly exist, but not as flexibly as I'd like to see. For example, applying a variant CSS. Or making a PDF with a specific font and paper size. Many challenges are technical, such as increased sophistication in dealing with text and HTML as master formats. Others need to be addressed by policy or social means, such as the ongoing tendency to use HTML for layout that is difficult to automatically convert. These, and others, have also been discussed deeply.

III. from those master files, various other file formats that are created/contributed by individuals. I get offered these (via help@) practically every day. Usually EPUB, but also RTF/DOC, PDF. Often with typos applied. These are what I called "lovingly prepared," though of course some are better than others. These can be better than automatically-generated versions in various ways. They might have advantages over master files (for example, improved HTML). The main feature is that these would, in many cases, provide an improved reading experience (at least for some people, on some devices).

If we accept that anyone could contribute such a new file (or set of files) for an existing PG eBook, then the main challenges I see are (a) how to help readers select among them, and (b) dealing with the fact that, over time, master formats will be fixed, but not these hand-crafted derivatives. I believe the solutions are related, and fairly easy. For (a), we need a community recommender system. Stars, batting average, +1, new/novel, etc. And, "dislike," "report a problem," "report abuse," etc. For (b), "time is on our side" (yes, it is).
For a derivative format, we simply need to note when a master format was updated, but hand-derived ones were not. Plus, perhaps, a metric of how different the master format is from when the hand-derived one was created. Combined with (a), a recommendation would be attenuated based on such a metric. So, for example, a hand-derived file that gets a 90% quality rating from readers would slowly lose quality points as the master format is increasingly different. You get the idea... details TBD.

Soooooooooooooooooo.... my main starting point was to ask about existing software for group editing. Version control systems seem a reasonable fit for this. Plus, sophisticated systems like TRAC also take care of managing users and their passwords. We do need to think of all the other stuff, but the basic idea was crowdsourcing for eBook formats/conversions/presentations, metadata and supplemental info (such as author bios).

Note especially that I don't envision developing tools to help potential contributors do the conversion. Not our bailiwick: there are entire ecosystems of tools already, and all we need to do is support the community of interested donors of resulting files (including pointing them at recommended tools). The PG toolchain for automated conversion should remain available (i.e., http://epubmaker.pglaf.org).

I hope this helps clarify my original suggestion a little better. There has been some great discussion on this and related topics. -- Greg

Dr. Gregory B. Newby Chief Executive and Director Project Gutenberg Literary Archive Foundation www.gutenberg.org A 501(c)(3) not-for-profit organization with EIN 64-6221541 gbnewby@pglaf.org
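A minimal sketch of that attenuation idea (the function name, similarity measure, and numbers are invented for illustration; nothing here is existing PG code):

import difflib

def attenuated_rating(reader_rating, master_then, master_now):
    """Scale a 0..1 reader rating by how similar the current master text
    still is to the master revision the hand-derived file was built from."""
    similarity = difflib.SequenceMatcher(None, master_then, master_now).ratio()
    return reader_rating * similarity

# A file rated 0.90 keeps most of that while the master is unchanged,
# and slowly loses points as fixes accumulate in the master.
print(attenuated_rating(0.90, "best of times, worst of times",
                              "best of times; worst of times!"))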

(I have a hunch I'm going to be quoting this message a lot in the future...) On Tue, January 24, 2012 3:08 pm, Joshua Hutchinson wrote:
I'd love to see the PG corpus redone as a "master format" system (and the current filesystem supports "old" format files in a subdirectory, so if someone wanted to get the old original hand-made files, they could). I'm not particularly wedded to any master format. Hell, if someone came up with a sufficiently constrained HTML vocabulary that could be easily used to "generate" the additional formats necessary, I'm good with that.
But before anyone will start doing this work, there needs to be a consensus from PG (I'm looking at you, Greg!) that the work will be acceptable. A half-assed "master format" system is no master format system at all.
On Tue, January 31, 2012 1:22 am, Greg Newby wrote:
The need I'm trying to address is reformatting or editing eBooks, not proofreading them.
Okay, we're on the same page so far...
What I'd like is (as someone else nicely put it) a continual improvement opportunity, provided to essentially anyone, for eBooks in the PG collection.
Still good...
This boils down to a handful of critical activities. It's mainly the third one (III) that involves crowdsourcing and new tools.
This is where we start to diverge...
I. making changes to the master file(s) [let's imagine that we retain the practice of every PG eBook having a small number of master files, in a small number of master formats]. The short list of master formats includes RST, HTML, TeX/TEI, and plain text (perhaps with light markup). Maybe this list will grow in the future; maybe it will shrink.
No, according to Mr. Hutchinson's proposal there can be only one...
The main feature here is that typos or fixes or additional master formats can be contributed.
The main feature here is that a single fix to the master file will automatically propagate to all derived formats; syncing between "masters" will not be required. [little snip]
II. from those master files, various other file formats can be [and are, currently] derived automatically.
Mister Hutchinson's vision, which I am trying to follow, is that /all/ other file formats will be derived automatically from the /one/ master version. Caching is certainly advisable, but on-demand creation would be the first-step.
Many challenges are technical, such as increased sophistication in dealing with text and HTML as master formats.
The primary technical challenge is in developing a tool chain which can produce quality instances of all derived formats, and in adopting/developing a master format with the richness necessary to support that tool chain.
Others need to be addressed by policy or social means, such as the ongoing tendency to use HTML for layout that is difficult to automatically convert.
Policy means include deciding on a master format, developing rules for the use of that format, widespread publication of those rules and, to the extent possible, automated means to detect violations of those rules. Social means primarily include getting buy-in from participants to the established rules, and attracting volunteers who are willing to work with them.
III. from those master files, various other file formats that are created/contributed by individuals.
At this point we're not only not on the same page, we're not even in the same book. This suggestion is completely at odds with what Mr. Hutchinson proposed, and which I support. [bigger snip]
If we accept that anyone could contribute such a new file (or set of files) for an existing PG eBook, then the main challenges I see are (a) how to help readers select among them, and (b) dealing with the fact that, over time, master formats will be fixed, but not these hand-crafted derivatives.
I'm not saying you shouldn't pursue this vision; I'm simply saying it's not mine, and I'm completely uninterested in pursuing it with you. My vision is to develop a system where existing PG works can be reworked into a single master format, from which all other formats can be automatically derived. Proof-reading and upgrading the master files is certainly a desirable part of that vision, but it is secondary to the main goal. I'm beginning to think that Mr. Hutchinson's earlier question remains unresolved:
there needs to be a consensus from PG (I'm looking at you, Greg!) that the work will be acceptable. A half-assed "master format" system is no master format system at all.
So Mr. Newby, can we expect some support in building a repository of master format reworkings of existing PG works? Infrastructure support would be nice, but moral support is what is most needed. [big snip]
I hope this helps clarify my original suggestion a little better. There has been some great discussion on this and related topics.
Ditto. Cheers, Lee

On Tue, Jan 31, 2012 at 11:32:51AM -0700, Lee Passey wrote:
On Tue, January 24, 2012 3:08 pm, Joshua Hutchinson wrote: ...
II. from those master files, various other file formats can be [and are, currently] derived automatically.
Mister Hutchinson's vision, which I am trying to follow, is that /all/ other file formats will be derived automatically from the /one/ master version. Caching is certainly advisable, but on-demand creation would be the first-step.
Many challenges are technical, such as increased sophistication in dealing with text and HTML as master formats.
The primary technical challenge is in developing a tool chain which can produce quality instances of all derived formats, and in adopting/developing a master format with the richness necessary to support that tool chain.
I put huge backing into addressing this challenge, and so did other people on this list and elsewhere. The answer was: TEI. Then, a couple of years later, ditto. The answer was: RST.

There are scarce few things that TEI or RST are not suitable for, though I would not say we have a ton of experience. Today, I count fewer than 400 files that are TEI or RST (I didn't check how many separate eBook titles those files are associated with). The WWers have procedures to process such files - bring it on.

Remaining problems include:
- how to convince contributors to make new submissions in these master formats
- consideration of alternate master formats, as desired
- providing advice on better workflows and tool sets (including at DP) for these master formats, so contributors can be comfortable with them

Solved problems include:
- automatically generating all derived formats, with a far higher level of integrity than other master formats
- applying fixes to the master and then regenerating derived formats
- having an easily editable master format

Looking at weaknesses in these choices, from all possible angles, is certainly worthwhile. As is seeking improvements or alternatives. But the unfortunate fact is that the "better mousetrap" (or, at least, one that purports to have many of the improvements over HTML+text as master formats) was built and delivered years ago. But not too many folks have taken the bait.

I can already hear a few voices calling (or writing), "that's because your bait sucks!" and "but your bait cannot do X" and so forth. It would be contrary to past experience for anyone to think they can, indeed, come up with a solution set that will be above criticism. -- Greg

It's just like we're doing again with source control software. We're starting from a technical problem definition and working back toward the user, who is expected to conform. Entirely wrong direction. On Tue, Jan 31, 2012 at 4:14 PM, Greg Newby <gbnewby@pglaf.org> wrote:
On Tue, Jan 31, 2012 at 11:32:51AM -0700, Lee Passey wrote:
On Tue, January 24, 2012 3:08 pm, Joshua Hutchinson wrote: ...
II. from those master files, various other file formats can be [and are, currently] derived automatically.
Mister Hutchinson's vision, which I am trying to follow, is that /all/ other file formats will be derived automatically from the /one/ master version. Caching is certainly advisable, but on-demand creation would be the first-step.
Many challenges are technical, such as increased sophistication in dealing with text and HTML as master formats.
The primary technical challenge is in developing a tool chain which can produce quality instances of all derived formats, and in adopting/developing a master format with the richness necessary to support that tool chain.
I put huge backing into addressing this challenge, and so did other people on this list and elsewhere. The answer was: TEI.
Then, a couple of years later, ditto. The answer was: RST.
There are scarce few things that TEI or RST are not suitable for, though I would not say we have a ton of experience. Today, I count fewer than 400 files that are TEI or RST (I didn't check how many separate eBook titles those files are associated with).
The WWers have procedures to process such files - bring it on.
Remaining problems include:
- how to convince contributors to make new submissions in these master formats
- consideration of alternate master formats, as desired
- providing advice on better workflows and tool sets (including at DP) for these master formats, so contributors can be comfortable with them
Solved problems include:
- automatically generating all derived formats, with a far higher level of integrity than other master formats
- applying fixes to the master and then regenerating derived formats
- having an easily editable master format
Looking at weaknesses in these choices, from all possible angles, is certainly worthwhile. As is seeking improvements or alternatives. But the unfortunate fact is that the "better mousetrap" (or, at least, one that purports to have many of the improvements over HTML+text as master formats), was built and delivered years ago. But not too many folks have taken the bait.
I can already hear a few voices calling (or writing), "that's because your bait sucks!" and "but your bait cannot do X" and so forth. It would be contrary to past experience for anyone to think they can, indeed, come up with a solution set that will be above criticism.
-- Greg

I can already hear a few voices calling (or writing), "that's because your bait sucks
Look, it's not like "we" who submit books are not aware of the limitations of HTML. BUT, we just take a look at the TEI files and say "Good lord, I am not going to do that!" [because TEI is targeted at Grad Student Academics] and then we look at RST and we say "Good lord yet another set of 1970s little troff-like escape codes to memorize!" And then the HTML generated is a mess, and the rendering on the end user's device, while not horrible, is not great either, so what has been accomplished at the end? It is one thing to say "well, the rendering could be improved" but if you wanted to make the format(s) attractive to anyone the *least* one would need to do would be to create competitively attractive results. It is hard for these things to compete against HTML because HTML is, well, like everywhere. Unfortunately HTML has some really nasty bits when it comes to writing books. Among other things people "use the wrong tags" because the "right" tags are simply not there.

People respond to feedback. Positive feedback produces positive reinforcement; you'll get more the next time. Negative feedback produces negative reinforcement; you'll get less. Stop and think what kind of day-to-day feedback the DP workers get.
From PG they get very little or no feedback. From DP they get little or no feedback, and when it comes it's weeks too late.
They're wandering in the wilderness, and they know it. But hey, no one seems to care one way or the other about how they did what they do, or want to help them do better; so the ones that stick around, please themselves. No one else notices. They're going through another phase of RST despair right now. They aren't getting out what they thought they are putting in. Things break. But they are supposed to keep trying to make the system happy. They know the system isn't trying to make them happy. That's not its job.

On 02/01/2012 06:54 AM, don kretz wrote:
They're going through another phase of RST despair right now. They aren't getting out what they thought they are putting in. Things break. But they are supposed to keep trying to make the system happy. They know the system isn't trying to make them happy. That's not its job.
You are all right in saying that we have to please the volunteers, but you forget a much more important thing: We have to please the readers.

The readers do not want pretty formatting for desktop PCs, they want books they can carry in their pockets. DP could have learnt that by comparing the sales of hardcovers vs. paperbacks. Also they could have learnt that by comparing the downloads of the 'oh-so-ugly' EPUB and Kindle formats vs. the 'oh-so-pretty' HTML format. The 'ugly' formats are already more popular than HTML just a few years after they were introduced at PG.

But DP's agenda deliberately pushes people to produce the product nobody wants, because an elaborately formatted but non-functional book gives more DP bragging rights than a simply formatted and functional book. People are unsatisfied with RST because they are brainwashed into thinking that a simply formatted book is not good enough. It's not RST's fault, it's DP's fault. -- Marcello Perathoner webmaster@gutenberg.org

On Wed, Feb 1, 2012 at 2:46 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
You are all right in saying that we have to please the volunteers, but you forget a much more important thing:
We have to please the readers.
Actually, we don't. Without volunteers, Project Gutenberg will disappear; as long as it has volunteers, Project Gutenberg will be around. That is the dirty secret of non-profits; they have more motivation to get volunteers and donations than to do something useful with them.
It's not RST's fault, it's DP's fault.
Then stop using DP's files. You can very quickly have 400 files formatted the way you want, and we can see if that will make readers happier than if we have 30,000 files formatted DP's way.
People are unsatisfied with RST because they are brainwashed into thinking that a simply formatted book is not good enough.
If you're going to treat people as brainless zombies, then you have to take a little responsibility. It is RST's fault if RST can't successfully counterpropagandize and brainwash people into using it. In reality, I'm not an active member of DP, nor have I ever PPed much, but when one of the first things I read about it is "The primary goal of reStructuredText is to define and implement a markup syntax for use in Python docstrings and other documentation domains", it makes me wonder why I'd ever try to translate a 400-year-old book to a tool that's obviously not designed for it. Given the apparent lifespan of TEI-Lite, I'm having to wonder if learning this would just mean learning another new format a couple years down the road. -- Kie ekzistas vivo, ekzistas espero.

PG-RST is bad because it is badly documented, and epubmaker is not user-friendly. It is too difficult to do semantic markup (no way to nest markup) and to understand what is wrong when nothing happens. To do some simple things one has to go through strange contortions. It is too difficult to control the output from the input, and one has to do it in unnatural ways. Carlo

Carlo>...epubmaker is not user-friendly... It took me several tries of more than a day, plus installing several flavors of Python, plus reading some Python books, plus installing a dev environment, plus Marcello's help, to get epubmaker running on my Windows machine. And I had to pick up code bits from at least one site that didn't seem all that savory to me. I just don't see many people trying to install and run epubmaker as it stands today.
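For comparison, the basic RST-to-HTML step can be had from stock docutils in a few lines (a minimal sketch; this is plain docutils, which epubmaker builds on for RST input if I understand correctly, and it says nothing about PG-specific extensions or EPUB output):

from docutils.core import publish_string

rst_source = """\
The Title
=========

A paragraph with *emphasis* and a footnote [1]_.

.. [1] The footnote text.
"""

# Convert the RST snippet to HTML and show the start of the output.
html = publish_string(source=rst_source, writer_name="html")
print(html.decode("utf-8")[:300])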

On Feb 1, 2012, at 19:19, Jim Adcock wrote:
I just don't see many people trying to install and run epubmaker as it stands today.
Well, then it's rather neat that they don't have to, isn't it? You can just use the online version, which is, allegedly, also always the version that will be run on your files once they're posted: http://epubmaker.pglaf.org/ Jana

Jana>Well, then it's rather neat that they don't have to, isn't it? You can just use the online version, which is, allegedly, also always the version that will be run on your files once they're posted: http://epubmaker.pglaf.org/ Interesting. Go to Gutenberg.org, type "epubmaker" into the "search site" field, and see what you find. And/or follow that up with a Google search on "epubmaker Marcello" [since there is more than one program called "epubmaker" on the internet]

On Tue, January 31, 2012 10:54 pm, don kretz wrote:
People respond to feedback.
Positive feedback produces positive reinforcement; you'll get more the next time. Negative feedback produces negative reinforcement; you'll get less.
Stop and think what kind of day-to-day feedback the DP workers get.
From PG they get very little or no feedback. From DP they get little or no feedback, and when it comes it's weeks too late.
My immediate reaction to this statement was, "what a bunch of Dale Carnegie bunk." My second reaction was, "maybe some people are motivated by positive feedback, but not me." My third reaction to this statement was, "Hmm, there's a real kernel of truth in there." All this got me thinking, so I thought I'd share my preliminary thoughts. I'm not sure my conclusions are cohesive yet -- I'll need some time to flesh everything out.

People want to feel valued. They want to feel like the world is a better place due to their efforts. Sincere praise helps people feel valued. (Insincere praise can also help people feel valued, but only if you can convince them that it was actually sincere.)

Most people will feel they are valued, even in the absence of recognition, if they feel that they created a high-quality work product that is available to others (to a large extent the pirated versions of current books have much higher production values than those of PG books, and those people are very careful not to be "recognized").

Most people feel valued if their work is accepted. Acceptance by a group is evidence that the work is of value, and that it will be available to others. One might say that acceptance is a substitute for the kind of personal standards implied in the foregoing paragraph.

Most people feel their efforts are valued if they are empowered. If I can do my job without obtaining permission for every detail or being subjected to constant second-guessing, I will believe not only that I am trusted but also that my efforts are worth the trust.

My involvement with Distributed Proofreaders was very short (their processes didn't make me feel valued) so I can't speak knowledgeably about the user experience there. But as for PG, it fails just about every measure. I won't go through and illustrate these points with the many anecdotes that have surfaced over the past few days; that is left as an exercise to the reader. I will say that the existence of the apparatchiks that jealously control the contents of the PG repositories is particularly troublesome. I can certainly sympathize with Mr. Adcock's tirades; it would appear that his experience in trying to be an individual contributor to Project Gutenberg has been an endless stream of "you have no value" messages.

Soooooo.... On Tue, January 24, 2012 3:08 pm, Joshua Hutchinson wrote:
I'd love to see the PG corpus redone as a "master format" system (and the current filesystem supports "old" format files in a subdirectory, so if someone wanted to get the old original hand-made files, they could). I'm not particularly wedded to any master format. Hell, if someone came up with a sufficiently constrained HTML vocabulary that could be easily used to "generate" the additional formats necessary, I'm good with that.
But before anyone will start doing this work, there needs to be a consensus from PG (I'm looking at you, Greg!) that the work will be acceptable. A half-assed "master format" system is no master format system at all.
In support of Mr. Hutchinson's vision I would like to see a system where master formats are created through crowd-sourcing. Everyone should be empowered to submit changes without having to go through a gatekeeper. The current state of the repository should be open to the world at all times so a contributor will know that their work will forever be available to the public at large. I would follow the SourceForge model where files can be updated, but changes will never be deleted; everyone will always have the option to go back and get a previous version.

A clear set of standards should be developed so people can /know/ they are doing a good job, even without a "pat on the back." I don't know that this group could ever be expected to give sincere praise, but at least we could all agree to be respectful. I don't expect Project Gutenberg as an organization to change its institutional behavior, to value its volunteers more (or at all). But just maybe we could get this side project going with a different set of parameters as an example of just what could be accomplished.

Lee>it would appear that his experience in trying to be an individual contributor to Project Gutenberg has been an endless stream of "you have no value" messages.

Let me be clear that *I* know the contributions I make: The last book I submitted to PG had more than 6,500 corrections over what is posted at IA. And when PG refused to support EPUB and MOBI some years ago I created a little website to support MOBI users which still draws 200,000 downloads a month, 400,000 at Xmas time, in spite of the fact that I have tried to point those users back to PG. Maybe because the books I posted there actually "work"?

My latest set of "tirades" started when a submission that tested "clean" on my end of the submission process "crapped out" at the PG end of the process, so I sent email to PG saying "okay, help me out here, tell me what tools you are ***actually*** using to test the submission on your end of the process so that I can check it out on my end and see what is going wrong" and the WW'er responded "positively" to me by simply telling me that there was no way in hell he was going to accept my submission. Thanks for the help. PG has since accepted the work and it is drawing about 650 downloads a month. Conversely, PG has one well-known text which has been downloaded millions of times which still contains about 1500 errors. Why?

I suggest you consider putting a stop to this RST experiment and step back and come up with some kind of plan that can possibly succeed.
From what I can tell, you're asking non-technical people to essentially learn a new programming language, with little or no documentation (certainly not current), no debugger, no IDE, no or crappy error messages, and all this while the language is still being designed and implemented.
And telling them "Just trust us". I doubt the developers involved here would agree to work under the same conditions.

On 02/01/2012 11:56 PM, don kretz wrote:
I suggest you consider putting a stop to this RST experiment and step back and come up with some kind of plan that can possibly succeed.
I'm surely not doing that.

Instead of trying to prevent other people (me) from implementing their ideas and visions, you should come up with your own (possibly better) ideas and visions and implement them. When you've done that we may compare notes and maybe ditch some of the worse ideas we had. As it stands now, you are just complaining and have nothing to offer instead.

Re. the use of a VCS, I've finally decided to use hg and have already set it up on pglaf.org. Over the next days I'll redo some of the older and more popular texts in RST. I will then patch the epubmaker at gutenberg.org to pull RST directly from hg@pglaf.org. Everybody who wants to participate and is not afraid of RST and a VCS can send me their RSA public keys so I can give them SSH access to the repos.

Everybody who thinks that RST and/or a VCS are bad ideas, is encouraged to implement their own (possibly better) ideas as an individual or a group. Just stop the fruitless complaining and get your act together. -- Marcello Perathoner webmaster@gutenberg.org

Won't there be an abstraction of some kind for the user interface? Are all users to be required to get and register an SSH ticket? Does someone need to download the entire book to fix a comma? How do you get them to avoid the temptation to "really fix this baby up"? Who is going to verify the changes? Against what images?

Are the whitewashers pretty familiar with Hg? I at least find it conceptually non-trivial - I'm not sure I'd want to train a non-programmer to use it. I've seen some pretty good developers take a while to get comfortable. But I suppose you've already discussed this pretty thoroughly with everyone who will be affected... On Wed, Feb 1, 2012 at 3:24 PM, Marcello Perathoner <marcello@perathoner.de> wrote:
On 02/01/2012 11:56 PM, don kretz wrote:
I suggest you consider putting a stop to this RST experiment
and step back and come up with some kind of plan that can possibly succeed.
I'm surely not doing that.
Instead of trying to prevent other people (me) from implementing their ideas and visions, you should come up with your own (possibly better) ideas and visions and implement them. When you've done that we may compare notes and maybe ditch some of the worse ideas we had.
As it stands now, you are just complaining and have nothing to offer instead.
Re. the use of a VCS, I've finally decided to use hg and have already set it up on pglaf.org. Over the next days I'll redo some of the older and more popular texts in RST. I will then patch the epubmaker at gutenberg.org to pull RST directly from hg@pglaf.org.
Everybody who wants to participate and is not afraid of RST and a VCS can send me their RSA public keys so I can give them SSH access to the repos.
Everybody who thinks that RST and/or a VCS are bad ideas, is encouraged to implement their own (possibly better) ideas as an individual or a group. Just stop the fruitless complaining and get your act together.
-- Marcello Perathoner webmaster@gutenberg.org

On 02/02/2012 04:20 AM, don kretz wrote:
Won't there be an abstraction of some kind for the user interface?
hg is very easy, not harder than zipping and unzipping. There are graphical front-ends. If anybody wants, he can write a front-end tailored to PG.
Are all users to be required to get and register an SSH ticket?
All users who want to write to the repository. There will be anonymous read access. Ideally, there will be a hierarchy. The WWers (or whatevers) sit on top and can write into the PG repository. Every WWer can work on his own or have his trusted lieutenants.
Does someone need to download the entire book to fix a comma?
Yes. Once. But since hg transfers are diffed and compressed all subsequent transfers will be very fast. N.B. I don't see hg as a crowdsourcing tool. You can use a crowdsourcing tool (trac ?) on top of hg. But since I think that crowdsourcing is a bad idea, I'll leave that as an exercise for people who want to prove that crowdsourcing is good.
How do you get them to avoid the temptation to "really fix this baby up"?
You don't. Same as now. Same as with a web interface. You have the WWers check before they commit.
Who is going to verify the changes? Against what images?
Same as now. You take work from your trusted lieutenants and check it before you commit.
Are the whitewashers pretty familiar with Hg? I at least find it conceptually non-trivial - I'm not sure I'd want to train a non-programmer to use it. I've seen some pretty good developers take a while to get comfortable. But I suppose you've already discussed this pretty thoroughly with everyone who will be affected..
Nope. The first one affected will be me. I'll have to figure many things out as I go. When it works well enough for me I'll offer it to the WWers to use. I'll run this in parallel to the traditional workflow, so nothing changes for anybody except those who voluntarily participate in the experiment. -- Marcello Perathoner webmaster@gutenberg.org

Who is going to verify the changes? Against what images?
Same as now. You take work from your trusted lieutenants and check it before you commit.
Isn't that the process that started out this discussion - the process that Greg said is the problem we're trying to fix? Come to think of it, what is the benefit of what you are proposing? Is it related to the problems the rest of us have been discussing?

On 02/02/2012 11:13 AM, don kretz wrote:
Isn't that the process that started out this discussion - the process that Greg said is the problem we're trying to fix? Come to think of it, what is the benefit of what you are proposing? Is it related to the problems the rest of us have been discussing?
The problem I'm fixing is to redo the library using master files. While at it, I'm also building a posting process that is widely automated and easy on bandwidth, especially when editing existing books. I'm not building a bug tracking system or a crowd sourcing system. But both of them can be built on top of hg. You can't even think about crowdsourcing without a VCS. -- Marcello Perathoner webmaster@gutenberg.org

The problem I'm fixing is to redo the library using master files.
While at it, I'm also building a posting process that is widely automated and easy on bandwidth, especially when editing existing books.
I'm not building a bug tracking system or a crowd sourcing system. But both of them can be built on top of hg. You can't even think about crowdsourcing without a VCS.
You are redoing the library using master files. Ok.
While you are at it, you are also building a posting process that is widely automated and easy on bandwidth, especially when editing existing books. Is the bandwidth used when editing existing books even noticeable compared to all the other bandwidth consumption? What will be significantly more automated about the posting process because of VCS? What users are benefitted, how? This is going to be a lot easier to understand if you can tell us what measurable difference someone will experience when you're done. So far we just see the painful parts.

On 02/02/2012 04:51 PM, don kretz wrote:
Is the bandwidth used when editing existing books even noticeable compared to all the other bandwidth consumption? What will be significantly more automated about the posting process because of VCS? What users are benefitted, how?
It's very noticeable whether, after a one-comma fix, you have to upload a zip file containing all formats and all pictures, or just a one-line diff that hg will apply automatically. And God forbid you work on a mobile link.
This is going to be a lot easier to understand if you can tell us what measurable difference someone willl experience when you're done. So far we just see the painful parts.
If you have already worked with a VCS, and still you cannot imagine how a VCS can improve the current workflow, then nothing I can tell you can make you see. -- Marcello Perathoner webmaster@gutenberg.org
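A rough sketch of that one-comma round trip, driving hg from a script (the repository URL and file name are invented for illustration, and it assumes Mercurial is installed locally):

import subprocess

REPO = "ssh://hg@pglaf.org/ebooks/12345"   # hypothetical repository path

def run(*args):
    # Run an hg command and fail loudly if it does not succeed.
    subprocess.run(args, check=True)

run("hg", "clone", REPO, "12345")    # the full history downloads once
# ... edit 12345/12345.rst to fix the comma ...
run("hg", "commit", "-R", "12345", "-m", "Fix comma in chapter 3")
run("hg", "push", "-R", "12345")     # only a small compressed diff travels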

I can't imagine how a kindle reader with a one-comma change will now be able to submit that, no. Especially since he's working from a copy three versions old that's been corrupted by intermediaries. You perhaps expect him to acquire an SSH ticket, register with you personally, download a completely different version, add his comma, and install an Hg client to send it back? You're right, that's certainly not a workable crowd-sourcing scenario. I can't see how to implement crowdsourcing that way either. On Thu, Feb 2, 2012 at 9:37 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
On 02/02/2012 04:51 PM, don kretz wrote:
Is the bandwidth used when editing existing books even noticeable compared
to all the other bandwidth consumption? What will be significantly more automated about the posting process because of VCS? What users are benefitted, how?
It's very noticeable whether, after a one-comma fix, you have to upload a zip file containing all formats and all pictures, or just a one-line diff that hg will apply automatically. And God forbid you work on a mobile link.
This is going to be a lot easier to understand if you can tell us what
measurable difference someone will experience when you're done. So far we just see the painful parts.
If you have already worked with a VCS, and still you cannot imagine how a VCS can improve the current workflow, then nothing I can tell you can make you see.
-- Marcello Perathoner webmaster@gutenberg.org

On 02/02/2012 06:47 PM, don kretz wrote:
I can't imagine how a kindle reader with a one-comma change will now be able to submit that, no. Especially since he's working from a copy three versions old that's been corrupted by intermediaries.
I'm not interested in crowd sourcing. Period. You want it. You implement it. Don't ask me how. I don't know. -- Marcello Perathoner webmaster@gutenberg.org

On Thu, February 2, 2012 3:13 am, don kretz wrote:
Isn't that the process that started out this discussion - the process that Greg said is the problem we're trying to fix?
Two different things. Mr. Hutchinson started one discussion when he said:
I'd love to see the PG corpus redone as a "master format" system (and the current filesystem supports "old" format files in a subdirectory, so if someone wanted to get the old original hand-made files, they could). I'm not particularly wedded to any master format. Hell, if someone came up with a sufficiently constrained HTML vocabulary that could be easily used to "generate" the additional formats necessary, I'm good with that.
Mr. Newby responded to that post by taking the conversation in a totally different direction than what Mr. Hutchinson had posted. Personally, I'm interested in Mr. Hutchinson's original proposal, not Mr. Newby's unrelated concerns.
Come to think of it, what is the benefit of what you are proposing?
The biggest benefit is that it will allow sophisticated users (uploaders, not downloaders) to do an end run around the PG apparatchiks. It will provide a method for the master format to evolve as our understanding of the automated creation process improves. It will provide a history of changes to documents so evolutionary dead ends can be backed out, and it will provide a record of who is responsible for which changes.
Is it related to the problems the rest of us have been discussing?
For me, it's right on point.

Good - then exactly what does it have to do with using a VCS to store document versions (compared to, say, the way they are stored now where anyone can acquire any format they want, but can't submit any new versions except to I guess a whitewasher?) On Thu, Feb 2, 2012 at 9:22 AM, Lee Passey <lee@novomail.net> wrote:
On Thu, February 2, 2012 3:13 am, don kretz wrote:
Isn't that the process that started out this discussion - the process that Greg said is the problem we're trying to fix?
Two different things. Mr. Hutchinson started one discussion when he said:
I'd love to see the PG corpus redone as a "master format" system (and the current filesystem supports "old" format files in a subdirectory, so if someone wanted to get the old original hand-made files, they could). I'm not particularly wedded to any master format. Hell, if someone came up with a sufficiently constrained HTML vocabulary that could be easily used to "generate" the additional formats necessary, I'm good with that.
Mr. Newby responded to that post by taking the conversation in a totally different direction than what Mr. Hutchinson had posted.
Personally, I'm interested in Mr. Hutchinson's original proposal, not Mr. Newby's unrelated concerns.
Come to think of it, what is the benefit of what you are proposing?
The biggest benefit is that it will allow sophisticated users (uploaders, not downloaders) to do an end run around the PG apparatchiks. It will provide a method for the master format to evolve as our understanding of the automated creation process improves. It will provide a history of changes to documents so evolutionary dead ends can be backed out, and it will provide a record of who is responsible for which changes.
Is it related to the problems the rest of us have been discussing?
For me, it's right on point.

On Thu, February 2, 2012 10:41 am, don kretz wrote:
Good - then exactly what does it have to do with using a VCS to store document versions (compared to, say, the way they are stored now where anyone can acquire any format they want, but can't submit any new versions except to I guess a whitewasher?)
When you say "document versions" do you mean subsequent iterations of a single document, or multiple variations of a document ("snowflakes")? If the former, then a VCS records all the iterations, who was responsible, what the changes were, and allows a user to pick any historical version. It enables a smooth evolution of the file as our understanding of the requirements improve. If the latter, then it may or may not have anything to do with it at all, but I don't care. I'm not interested in a system to build a snowdrift. That's Mr. Newby's vision, not Mr. Hutchinson's. As for getting a new and improved version out of the master control system and into the traditional repository, that's also a problem I'm not interested in for the moment. I want to build a system that the White Washers could turn to if they were so inclined. If they don't want to take advantage of it, that's not my problem. You can lead a horse to water, but you can't make him drink.

Everybody who wants to participate and is not afraid of RST and a VCS can send me their RSA public keys so I can give them SSH access to the repos.
Um, what I think I'm hearing is that you have decided to ignore Greg and just declare victory to yourself and your little choice of programming languages which basically no one else uses?

Hi Marcello, This is just the wrong paradigm. I have to make a superior system to improve the system, yet I cannot improve the system already in place. Even if I wanted to, I would have to work through the files and figure out how things are being done. A little more guidance is necessary. How about pointing those who want to help to some information on how you want the RST implemented, so that it works with the tools in place and will convert to the other formats? RST is extensible, as you know. So when I convert, do I use my own extensions, or are there extensions that I should use instead? regards Keith. On 02.02.2012 at 00:24, Marcello Perathoner wrote:
On 02/01/2012 11:56 PM, don kretz wrote:
I suggest you consider putting a stop to this RST experiment and step back and come up with some kind of plan that can possibly succeed.
I'm surely not doing that.
Instead of trying to prevent other people (me) from implementing their ideas and visions, you should come up with your own (possibly better) ideas and visions and implement them. When you've done that we may compare notes and maybe ditch some of the worse ideas we had.
As it stands now, you are just complaining and have nothing to offer instead.
Re. the use of a VCS, I've finally decided to use hg and have already set it up on pglaf.org. Over the next days I'll redo some of the older and more popular texts in RST. I will then patch the epubmaker at gutenberg.org to pull RST directly from hg@pglaf.org.
Everybody who wants to participate and is not afraid of RST and a VCS can send me their RSA public keys so I can give them SSH access to the repos.
Everybody who thinks that RST and/or a VCS are bad ideas, is encouraged to implement their own (possibly better) ideas as an individual or a group. Just stop the fruitless complaining and get your act together.

Everybody who wants to participate and is not afraid of RST and a VCS can send me their RSA public keys so I can give them SSH access to the repos.
Everybody who thinks that RST and/or a VCS are bad ideas, is encouraged to implement their own (possibly better) ideas as an individual or a group. Just stop the fruitless complaining and get your act together.
Seems like a bit of a silly comment, when you are literally holding the only keys to the door, and are allowing only your own ideas in. Open up the VCS to other languages and other approaches, so that the competing ideas can be compared next to your own.

On Thu, February 2, 2012 8:23 am, Jim Adcock wrote:
Seems like a bit of a silly comment, when you are literally holding the only keys to the door, and are allowing only your own ideas in.
This seems a bit harsh. I'm sure that if you asked nicely Mr. Perathoner would give you a shell account onto pglaf.org with enough rights to install any software you would like, including web-facing software.

On Thu, Feb 02, 2012 at 09:19:01AM -0700, Lee Passey wrote:
On Thu, February 2, 2012 8:23 am, Jim Adcock wrote:
Seems like a bit of a silly comment, when you are literally holding the only keys to the door, and are allowing only your own ideas in.
This seems a bit harsh. I'm sure that if you asked nicely Mr. Perathoner would give you a shell account onto pglaf.org with enough rights to install any software you would like, including web-facing software.
Actually, I'm the only one currently with that oversight for pglaf.org, and that's not where we will be doing experiments. I have other systems for that. -- Greg

Actually, I'm the only one currently with that oversight for pglaf.org, and that's not where we will be doing experiments. I have other systems for that.
OK, so should I/we out here ignore for right now the instructions Marcello just posted at: http://www.gutenberg.org/wiki/Mercurial_Repository_How-To because I was starting to try to learn Hg and public keys and the like, and I'd rather not do that if Marcello's approach isn't what PG is going to be doing???

I'm still just trying to figure out "the ground rules", which right now I hear Marcello saying is "RST Only", whereas what I would like to try is "Minimalist Tweak to Existing HTML" to get it running correctly on EPUB and MOBI. And it is not clear to me how a build chain gets invoked and how a DP/PG "end customer" who wants to try out the result of these flavors can find them and try them.

On Thu, February 2, 2012 12:35 pm, Greg Newby wrote:
Actually, I'm the only one currently with that oversight for pglaf.org, and that's not where we will be doing experiments. I have other systems for that.
In which case I am troubled that Mr. Perathoner has taken it upon himself to install a VCS in that domain. I think it should be moved to the "experimental" section so as not to disturb the pglaf.org web site.

On Thu, Feb 02, 2012 at 01:26:56PM -0700, Lee Passey wrote:
On Thu, February 2, 2012 12:35 pm, Greg Newby wrote:
Actually, I'm the only one currently with that oversight for pglaf.org, and that's not where we will be doing experiments. I have other systems for that.
In which case I am troubled that Mr. Perathoner has taken it upon himself to install a VCS in that domain. I think it should be moved to the "experimental" section so as not to disturb the pglaf.org web site.
pglaf.org doesn't have a copy of the PG collection, and doesn't have a lot of free space. But it *is* integrated with the eBook uploading/errata/etc. process. It's a fine choice for some purposes, but not where I would direct developers who need to login. I've got some other systems better suited for that purpose. -- Greg

I've put some more recent examples of the "tweak the HTML" approach as opposed to "rewrite everything in another language" up at: http://freekindlebooks.org/KF8 where these files should "work" on all Kindles, and look particularly good on Kindle Fires and recent "software" Kindles. Again, I don't think it's very hard to "tweak the HTML" to get most of it working pretty well on epub and mobi devices. Mind you, some books *are* basket cases -- typically very old books from the early days of HTML.

On 2/2/2012 7:14 PM, Jim Adcock wrote:
I've put some more recent examples of the "tweak the HTML" approach as opposed to "rewrite everything in another language" up at:
One of the things that is most frustrating about trying to converse with BowerBird is that he absolutely refuses to give a straight answer to a straight question; instead he just gives a whole bunch of examples and says, "See how great I am! Now figure out how I did it!" On that note, I'm much less interested in seeing what happens when you "tweak the HTML," than in getting a straight answer explaining exactly what you did to each file to get the output you are so proud of.

For the foreseeable future the only reasonable way to make a dent in the quality of current projects is to repair what's already there with software. I know it's possible to infer a lot of the intended display elements from the patterns applied by the PP software, mainly guiguts. Fixed-dimension images can be converted to %s. Or remove sizes entirely and let them be constrained by container elements that are already there. The infamous page markup is especially possible to refactor into something useful, hidden, and/or removed.

I'll contribute the methods I've developed for EB (which is only a subset). Others should be discoverable with a reasonable amount of effort.

It will take some work and cooperation. The critical question still remains: will PG allow existing projects to be altered this way? Under what conditions? With what verification requirements? On Thu, Feb 2, 2012 at 6:22 PM, Lee Passey <lee@novomail.net> wrote:
On 2/2/2012 7:14 PM, Jim Adcock wrote:
I've put some more recent examples of the "tweak the HTML" approach as opposed to "rewrite everything in another language" up at:
One of the things that is most frustrating about trying to converse with BowerBird is that he absolutely refuses to give a straight answer to a straight question; instead he just gives a whole bunch of examples and says, "See how great I am! Now figure out how I did it!"
On that note, I'm much less interested in seeing what happens when you "tweak the HTML," than in getting a straight answer explaining exactly what you did to each file to get the output you are so proud of.

On Thu, Feb 02, 2012 at 06:46:56PM -0800, don kretz wrote:
... It will take some work and cooperation. The critical question still remains: will PG allow existing projects to be altered this way? Under what conditions? With what verification requirements?
I already answered that in this thread, and the answer is that we do have a procedure to get fixed files back (i.e., the errata process, with a WWer in the loop).

A theme that is not well-handled by the errata process is, what if only the HTML is tweaked, to make the file more epub (etc.) friendly? That is, when the "fix" is not typos/scannos/missing pages, etc., etc., but simply formatting or markup?

The short answer is a rephrasing of the starting point from a few days ago: I'd like to go ahead and make a way to get these back into the collection, replacing the originals, *en masse*. (Actually, we keep the originals, in an 'old' subdirectory.) I don't anticipate opposition to this idea, assuming we're tweaking, not redoing the look and feel crafted by the submitter. How to tell which is which?

One thing we've done with a few people who were very active in posting/reposting/augmenting is give them direct access to upload. This is something we do AFTER the procedure is very clear. It's easy to screw things up, trust me....

My emphasis in this discussion has been to look at ways to make this type of process more efficient and scalable. We don't want to have a lot of back and forth discussion for every file, if we want to eventually re-do thousands. This interest is at least partially selfish, since I'd rather not be part of a decision process for every such fixed eBook that comes along, and I'm pretty sure the current WWers have similar feelings. -- Greg

Some portions of the changes are, I expect, going to be 100% automatable, and will be 100% beneficial in 90% of projects. Stuff like taking images/captions out of fixed-size tables and putting them into %-sized divs. With EB I write regexes that can get those right almost all the time. Probably other candidates are footnotes, chapter headings, page numbers. I only have EB and three or four others to work from.

But we could automate that pretty quickly, run it against a sample of the corpus, and check over the results thoroughly. The key is to do no other tweaks but the automated ones so we find out how close we can come. Then we may know enough to plan the next step. I'd like to hear what Lee and the others think first, though. They're better judges than I am. On Thu, Feb 2, 2012 at 11:15 PM, Greg Newby <gbnewby@pglaf.org> wrote:
On Thu, Feb 02, 2012 at 06:46:56PM -0800, don kretz wrote:
... It will take some work and cooperation. The critical question still remains: will PG allow existing projects to be altered this way? Under what conditions? With what verification requirements?
I already answered that in this thread, and the answer is that we do have a procedure to get fixed files back (i.e., the errata process, with a WWer in the loop).
A theme that is not well-handled by the errata process is, what if only the HTML is tweaked, to make the file more epub (etc.) friendly? That is, when the "fix" is not typos/scannos/missing pages, etc., etc., but simply formatting or markup?
The short answer is a rephrasing of the starting point from a few days ago: I'd like to go ahead and make a way to get these back into the collection, replacing the originals, *en masse*. (Actually, we keep the originals, in an 'old' subdirectory.) I don't anticipate opposition to this idea, assuming we're tweaking, not redoing the look and feel crafted by the submitter. How to tell which is which?
One thing we've done with a few people who were very active in posting/reposting/augmenting is give them direct access to upload. This is something we do AFTER the procedure is very clear. It's easy to screw things up, trust me....
My emphasis in this discussion has been to look at ways to make this type of process more efficient and scalable. We don't want to have a lot of back and forth discussion for every file, if we want to eventually re-do thousands. This interest is at least partially selfish, since I'd rather not be part of a decision process for every such fixed eBook that comes along, and I'm pretty sure the current WWers have similar feelings.
-- Greg
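To make the kind of regex pass Don describes above concrete, here is a minimal, hypothetical sketch in Python. The class name and table shape are assumptions made up for illustration, not DP's actual output, and a real pass would need to handle far more variants.

    import re

    # Hypothetical input: an illustration wrapped in a fixed-width table, the way
    # some post-processed PG HTML does it.  The "caption" class and the two-row
    # table layout below are assumptions, not an actual DP convention.
    FIXED_TABLE = re.compile(
        r'<table[^>]*\bwidth="\d+"[^>]*>\s*'
        r'<tr>\s*<td[^>]*>\s*(?P<img><img[^>]+>)\s*</td>\s*</tr>\s*'
        r'<tr>\s*<td[^>]*>\s*(?P<caption><p class="caption">.*?</p>)\s*</td>\s*</tr>\s*'
        r'</table>',
        re.IGNORECASE | re.DOTALL)

    def fixed_tables_to_divs(html):
        """Rewrite fixed-width image tables as percentage-sized divs."""
        def repl(m):
            # Drop the pixel width entirely and let the container scale instead.
            return ('<div class="figure" style="width: 80%; margin: 0 auto;">\n'
                    '  ' + m.group("img") + '\n'
                    '  ' + m.group("caption") + '\n'
                    '</div>')
        return FIXED_TABLE.sub(repl, html)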

I don't think you need to worry about how I fix html files. Unless you can come up with something positive or helpful. On Fri, Feb 3, 2012 at 12:09 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
On 02/03/2012 08:44 AM, don kretz wrote:
With EB I write regexes that can get those right almost all the time.
Are you seriously proposing to fix HTML files using regexes?
-- Marcello Perathoner webmaster@gutenberg.org

On 02/03/2012 09:18 AM, don kretz wrote:
I don't think you need to worry about how I fix html files. Unless you can come up with something positive or helpful.
Yes, I can. Look at the sources of epubmaker if you want to see how it's done. Regexes are like UFOs. Think I'm going to leave this discussion now ... -- Marcello Perathoner webmaster@gutenberg.org

Hi Don, The problem at PG is you have to produce the software and prove its value. Another problem is that the software used is poorly documented and IMHO not very well written from a software maintenance point of view. There is not much emphasis on modularity and extensibility. Most code must be either refactored or, better yet, completely rewritten from the bottom up. I believe we both know what good software should look like. If we want to do this we will have to spearhead it ourselves and hope it will be adopted. regards Keith. On 03.02.2012 at 03:46, don kretz wrote:
For the foreseeable future the only reasonable way to make a dent in the quality of current projects is to repair what's already there with software. I know it's possible to infer a lot of the intended display elements from the patterns applied by the PP software, mainly guiguts. Fixed-dimension images can be converted to %s, or their sizes removed entirely so they are constrained by container elements that are already there. The infamous page markup in particular can be refactored into something useful, hidden, and/or removed.
I'll contribute the methods I've developed for EB (which is only a subset). Others should be discoverable with a reasonable amount of effort.
It will take some work and cooperation. The critical question still remains: will PG allow existing projects to be altered this way? Under what conditions? With what verification requirements?
On Thu, Feb 2, 2012 at 6:22 PM, Lee Passey <lee@novomail.net> wrote: On 2/2/2012 7:14 PM, Jim Adcock wrote:
I've put some more recent examples of the "tweak the HTML" approach as opposed to "rewrite everything in another language" up at:
http://freekindlebooks.org/KF8
One of the things that is most frustrating about trying to converse with BowerBird is that he absolutely refuses to give a straight answer to a straight question; instead he just gives a whole bunch of examples and says, "See how great I am! Now figure out how I did it!"
On that note, I'm much less interested in seeing what happens when you "tweak the HTML," than in getting a straight answer explaining exactly what you did to each file to get the output you are so proud of.

On that note, I'm much less interested in seeing what happens when you "tweak the HTML," than in getting a straight answer explaining exactly what you did to each file to get the output you are so proud of.
I take each html file, compile it to epub using epubmaker, and from there to mobi using Kindlegen v2, take a look at it on multiple Kindles, see where it is "breaking" -- displaying totally unreasonable things, things one does not see happening in the HTML version in HTML browsers and which therefore *do not* represent the HTML author's intent -- and then I "pop the top" on the epub and take a look in there and try to find what is going wrong, which after five years of playing around with Kindles, EPUBs, PG, DP files etc. is by now usually pretty obvious for me, and I fix it.
Sometimes the encoding on the images is wrong. Usually the problem is in the CSS, which has typically been designed by somebody who has a 20" wide monitor, who is sure they "know" HTML, and is trying to figure out how to fill up that huge space. Which will almost certainly "kill the book" when the display device is 3" not 20" wide. Sometimes the problem is what is NOT in the CSS -- the CSS just "happened" to work on the author's web browser, but there was not a reasonable expectation that it *should* have worked. And then I have to fill in what should have been in there, but is not. Sometimes the problem cannot be fixed in the CSS -- as presumably Marcello discovered when he decided he had to explicitly kill page numbers in epubmaker. And sometimes the HTML is so screwed up I cannot come up with any reasonable explanation of why I see what I see in there, and then I just have to give up and declare defeat: "I'm sorry I just don't see any easy way to save this book -- maybe Marcello *should* rewrite this one." But most books can be easily "saved", hopefully without hurting the original HTML author's feelings too much.
For that matter, *I* wouldn't know how to write to a 20" wide monitor -- if I was asked to "fill it up please." Note that if PG wants to keep both "big and small" versions of the HTML in the same file there are @media pragmas that can help one do this -- just not the @medias one normally hears about. And usually the amount of changes that need to be made are small enough to keep this from getting really clunky.
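For readers who want to try the same convert-and-inspect loop, here is a rough Python sketch. The epubmaker command-line invocation is an assumption (the real entry point and options may differ); kindlegen's basic usage is the standard one, and the epub inspection relies only on the fact that an epub is a zip file.

    import subprocess, zipfile, pathlib

    def build_and_inspect(html_path):
        """Rough sketch of the convert-then-inspect loop described above."""
        book = pathlib.Path(html_path)
        epub = book.with_suffix(".epub")

        # Assumption: epubmaker is installed and exposes a command-line entry
        # point roughly like this, producing an .epub next to the source file.
        subprocess.run(["epubmaker", str(book)], check=True)

        # kindlegen takes an .epub and writes a .mobi alongside it.
        # It exits nonzero on warnings, hence check=False.
        subprocess.run(["kindlegen", str(epub)], check=False)

        # "Pop the top" on the epub: it is just a zip, so print any CSS inside
        # for manual inspection of margins, floats, fixed widths, and so on.
        with zipfile.ZipFile(epub) as z:
            for name in z.namelist():
                if name.endswith(".css"):
                    print("--- " + name + " ---")
                    print(z.read(name).decode("utf-8", errors="replace"))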

Hi Lee, You forgot the main paradigm here on the list and PG. If you have a better way — prove it! Nobody is interested in the process or willing to think much about it. Besides, nerds cannot explain how to do anything. It is a proven fact that "experts" are very bad candidates for explaining how they get things done or what is involved. regards Keith. On 03.02.2012 at 03:22, Lee Passey wrote:
On 2/2/2012 7:14 PM, Jim Adcock wrote:
I've put some more recent examples of the "tweak the HTML" approach as opposed to "rewrite everything in another language" up at:
One of the things that is most frustrating about trying to converse with BowerBird is that he absolutely refuses to give a straight answer to a straight question; instead he just gives a whole bunch of examples and says, "See how great I am! Now figure out how I did it!"
On that note, I'm much less interested in seeing what happens when you "tweak the HTML," than in getting a straight answer explaining exactly what you did to each file to get the output you are so proud of.

Hi Jim, It is nice to have examples. They can help us nerds, BUT would it not be better to have some sort of manual that can be distributed? Furthermore, it is nice that they look good on a Kindle, but would it not be better if they looked good on other readers as well? Of course, there is much benefit from your examples, since they can be used to tweak conversion tools. regards Keith. On 03.02.2012 at 03:14, Jim Adcock wrote:
I've put some more recent examples of the "tweak the HTML" approach as opposed to "rewrite everything in another language" up at:
http://freekindlebooks.org/KF8
where these files should "work" on all Kindles, and look particularly good on Kindle Fires and recent "software" Kindles.
Again, I don't think it's very hard to "tweak the HTML" to get most of it working pretty well on epub and mobi devices. Mind you, some books *are* basket cases -- typically very old books from the early days of HTML.

would it not be better to have some sort of manual that can be distributed?
Maybe, but I don't know how to get it actually read, and if it does get read there are people who won't believe it, even though if they tested what that manual says they could see the problems with their own eyes. And there is another class of people (particularly at DP) who are extremely hostile to the idea of small machines and "All I want to do is make beautiful HTML so leave me alone!" -- where "beautiful HTML" actually means "on my 20 inch monitor on my particular operating system running my particular choice of browser." [I could probably be arm-twisted into writing such a manual, but doing so is going to make *some* people at DP very unhappy, whereas others there *are* recognizing these problems and *are* honestly searching for solutions -- which is not trivial given the DP "groupthink" -- as opposed to the PG "groupthink" [where "groupthink" is clearly the wrong word for what is going on on the PG side ;-]
If you think about it, HTML really doesn't provide the capabilities to code these issues well. For example, on 4" wide machines the customer doesn't want to have body margins set *at all* -- no margin please. Plus the small machines all have margins built into the physical design of the machine, so if you write in HTML margins then you are putting margins inside of margins. Whereas on a 20" monitor you really do want margins, in part because your monitor is much wider than it is high -- and "everybody" agrees that a higher-than-wide format is better for reading. And because the typical web browser isn't designed for reading, it is designed for browsing, and it doesn't have margins designed into it anywhere, so your HTML code without margins sets glyphs smack flush with the edges of the surrounding window frame of the web browser.
Okay, so setting a % margin works, right? No it doesn't. 10% margins left and right on a 4" machine still "eat up" 0.8 inches of already-small real estate, which makes the user of the small machine really really unhappy -- in part because the 4" screen is so small that justification routines were just barely working to begin with, until you messed with it, and now justification has become really really ugly.
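A minimal sketch of the kind of stylesheet override being described here, expressed as a small Python helper that appends a narrow-screen @media block to an existing CSS file. The 30em breakpoint and the selectors are assumptions chosen for illustration, not a PG or DP convention.

    # The CSS below keeps whatever generous desktop margins a book already has,
    # but zeroes them out on narrow screens where the device frame is the margin.
    NARROW_SCREEN_OVERRIDE = """
    @media screen and (max-width: 30em) {
      body { margin: 0; padding: 0; }   /* no margins inside the device's margins */
      p    { text-align: left; }        /* justification gets ugly at ~4in widths */
    }
    """

    def add_narrow_screen_rules(css_text):
        """Append the narrow-screen override after the existing desktop rules."""
        return css_text.rstrip() + "\n" + NARROW_SCREEN_OVERRIDE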
Furthermore, it is nice that they look good on a Kindle, but would it not be better if they looked good on other readers as well?
In practice if one can get it to "work" on the Kindles then it will "work" on the other EPUB devices too. [I was just showing "Kindles" to try to demonstrate to people that "Kindles" are already becoming effectively "epub" machines with the introduction of Kindlegen V2. And "freekindlebooks" is a mobi site, not an epub site. [an early decision where I was pushing back against other sites who were pretending to be "all things to all people"]] With the possible exception of really weak "EPUB" readers such as some of the 3rd party epub apps one finds for android, and/or one computer vendor I can think of that has deliberately trashed their implementation of epub to make it really incompatible, to try to force authors to target that platform exclusively, and who is now creating tools for that platform whose EULA says "If you use this tool then we own exclusive distribution rights to anything you make with this tool [aka 'we own the copyright.']"
Again, when we are talking about "Kindles" nowadays there are really two classes of Kindles: There are "legacy" hardware Kindles, which are the ones we all know about, which cause great grief because they don't support floats and absolute positioning and a bunch of other stuff which really doesn't work on small machines anyway. And there are "modern" Kindles which support KF8, which really is pretty much just epub stuffed into a mobi wrapper. So the "modern" Kindles *are* effectively "epub" devices -- except they take a different file format wrapper. Now the only "modern" hardware Kindle at the moment is Kindle Fire -- but Amazon has been claiming "any day now" they will be releasing updated software for "recent generation" Kindles to make them KF8, i.e. "epub friendly." [The "software" Kindles such as Kindle for PC, Kindle for Android, etc. update automatically and probably have support for KF8 "epub" already, whether their owners know it or not.] [See mobi_unpack.py if you want to check out the claim that KF8 really is "just" "epub"]
The major heartache here, well there are two actually, is 1) legacy Kindles don't do floats, and DP implements almost all page numbers as floats, and sticks those floats in the middle of paragraphs, so, if the page number isn't allowed to "float" out of position, then the reader gets stuck with reading "trash" page numbers in the middle of their paragraphs exactly where the HTML author put them in the first place. Now, if you think about it, putting page numbers in the middle of paragraphs is a really really bad idea to begin with, and doesn't really follow the model of "HTML" to begin with, but, there you have it. And many people at DP and a few at PG are really wedded to the idea of page numbers (I think the "official" position at PG is that "we" don't like page numbers, if anyone looks up the directions to submitters.) [PS: "real" books put page numbers up in the upper right hand or left hand corners of the page where the reader doesn't have to read them.] [[PPS: epubmaker already throws away common implementations of page numbers when targeting Kindles.]]
And 2) the second major heartache is that Kindlegen V2 sticks "both" mobi7 and kf8 in one file for distribution. Now when such gets fed through the Amazon system, Amazon says they know the capabilities of the end recipient hardware and strip out the unused component.
But serving these from PG makes a big file even bigger -- unless PG were to turn on compression, which is an option for the mobi file format [epub always gets served with compression by definition, since epub is just a zip file format, with the result that PG epub files are much smaller than even the current PG mobi files].
And Kindlegen V2 now supports some @media pragmas which would allow PG/DP to fix Kindle-specific problems while maintaining one HTML source code. Given that it usually only takes a half dozen CSS changes [excepting page numbers] to get things "to work," this is maybe not too bad an option. Unfortunately I don't think there are any effective EPUB-specific @media pragmas out there. Some people say "@media handheld" but I *think* the epub hardware community refused that -- being afraid their devices would get stuck in the "mobile version" ghetto with the ancient crusty PDA devices such as Blackberries. PG *could* invent a "@media epub" pragma that epubmaker could chunk on [epubmaker already contains a bunch of these undocumented "pragmatic" hacks] which would allow PG/DP to continue to have one HTML source with a couple little pragmatic sections dedicated to "fixing" things that don't work on the smaller machines.
But it's not clear to me what things, if any, PG/DP should be trying to "fix" via epubmaker vs. asking submitters to think about these issues before submitting HTML code to PG. [epubmaker does do other "useful" things to HTML code, such as trying to autogen a TOC, put in a dummy "cover", splitting and repacking the HTML in epub-compatible format, etc. Also it does a lot of rewriting to try to get around the limitations of kindlegen V1 -- not clear to me how much of that rewriting is really necessary to target kindlegen v2 -- but Marcello knows these issues much better than I, having been the one bit.]
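To illustrate the kind of "strip page numbers before targeting legacy Kindles" pass mentioned above: a minimal Python sketch using BeautifulSoup. The span class names are assumptions (many DP files use something like <span class="pagenum">[Pg 123]</span>, but conventions vary), and epubmaker's real handling is more sophisticated than this.

    from bs4 import BeautifulSoup  # tolerant, tag-soup-friendly parser

    def strip_page_numbers(html, classes=("pagenum", "pageno")):
        """Remove floated page-number spans from an HTML string.

        Only intended as an illustration of the idea; the class names here
        are assumed, not a PG/DP standard.
        """
        soup = BeautifulSoup(html, "html.parser")
        for cls in classes:
            for span in soup.find_all("span", class_=cls):
                span.decompose()   # drop the node and its contents entirely
        return str(soup)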

If you have floats, you can use inset page numbers with spans and an appropriate stylesheet. But fundamentally what you're running into, and I don't know how you avoid it if you insist on XML/XHTML, is that books simply aren't well-formed in the way that XML defines and requires. You can't embed everything 100% in all its containers. Page numbers are a reflection of this problem, because conceptually they are boundaries between page elements. But page elements simply aren't well-formed because their tops and bottoms can cut right through paragraphs (and everything within which paragraphs are embedded). I think what happened to some devices is that they had to decide between supporting HTML and XHTML, and since writers can't be constrained to create well-formed XML documents (nor should they be), the devices had to choose HTML.

Example of inset page number. http://jsfiddle.net/Ap27K/2/ But of course only useful where there are floats.

Don> http://jsfiddle.net/Ap27K/2/ Not sure this makes it any better for me. Giving it a border just draws my eyes to it even more. The whole point is that I want to be able to read without being distracted by page numbers. Which is why traditionally books stick them way way up there at the top corners where your eyes don't see them unless you really really go looking for them. On my ebook reader the progress meter (think page numbers) automatically hides itself unless I explicitly ask it where I am in the book.

I imagine they are showing their own page number anyway, aren't they? I don't imagine the device wants yours. So leave it out for those devices that choose not to make it displayable other than in ways you find objectionable. You can't turn a pig into a pony by putting a saddle on it. On Fri, Feb 3, 2012 at 6:21 PM, James Adcock <jimad@msn.com> wrote:
Don> http://jsfiddle.net/Ap27K/2/
Not sure this makes it any better for me. Giving it a border just draws my eyes to it even more. The whole point is that I want to be able to read without being distracted by page numbers. Which is why traditionally books stick them way way up there at the top corners where your eyes don't see them unless you really really go looking for them. On my ebook reader the progress meter (think page numbers) automatically hides itself unless I explicitly ask it where I am in the book.

Josh, I think single-master-format is the only workable option. I'd add to that, though, that the texts should incorporate all reasonably collectable content during the original project, even if it's expected that it will be superfluous. Then strip it out for all devices; but it will still be there in the original. BTW, this is as much for simplicity as for comprehensiveness. It's just easier for everyone if the instructions are the same for all projects. This in turn implies that it may be a mistake to use an output format that we actually intend to use for some devices as the master format.

On Fri, February 3, 2012 2:14 pm, don kretz wrote:
This in turn implies that it may be a mistake to use an output format that we actually intend to use for some devices as the master format.
Yes, but I would stress /may/ be, not /will/ be. I think there is some value in having a master format that is usable on its own without preprocessing. I don't think that this feature is a "must have," but I do think it is a "should have."

But you sacrifice the elements your chosen format doesn't include. For instance, html doesn't include chapters. Or page numbers. If you fake them, you introduce ambiguity and inconsistency. On Fri, Feb 3, 2012 at 1:59 PM, Lee Passey <lee@novomail.net> wrote:
On Fri, February 3, 2012 2:14 pm, don kretz wrote:
This in turn implies that it may be a mistake to use an output format that we actually intend to use for some devices as the master format.
Yes, but I would stress /may/ be, not /will/ be. I think there is some value in having a master format that is usable on its own without preprocessing. I don't think that this feature is a "must have," but I do think it is a "should have."

I appreciate the elegance and economy of having one master format, but it does seem that there would be a great deal of benefit to having three or four.
Before the book is put through the DP process, or approved for an independent preparer, how about sorting it into one of several categories:
RST -- uncomplicated fiction and non-fiction. Everyone here seems to agree that RST works for most texts.
LaTeX -- math books. This is standard for math.
TEI -- complicated fiction and non-fiction.
Perhaps XHTML if someone can argue convincingly that it's necessary too. But I do fear that HTML tempts people into hand-tweaking for the nicest appearance on a particular ereader.
Different workflows for each format. A few post-processors and whitewashers could specialize in the more esoteric formats, and everyone else could work in RST. This seems more practical than demanding that everything be stored in a format that few know how to prepare OR that difficult texts be mutilated to remove everything that RST can't handle.
The effort to define the one true master format seems like the effort to define the one true DTD. If the one true DTD handles everything, it's too unwieldy to use. Better to have different DTDs (or schemas) for different tasks.
You'd have to have a different suite of tools for each format, for converting it into the various end-user formats, but that would be easier, in the long run, than forcing everything into the one true format.
-- Karen Lofstrom only a gamma geek, but practical

-----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Karen Lofstrom Sent: Friday, February 03, 2012 7:22 PM To: Project Gutenberg Volunteer Discussion Subject: Re: [gutvol-d] Goals and scope
I appreciate the elegance and economy of having one master format, but it does seem that there would be a great deal of benefit to having three or four.
Before the book is put through the DP process, or approved for an independent preparer, how about sorting it into one of several categories:
RST -- uncomplicated fiction and non-fiction. Everyone here seems to agree that RST works for most texts.
LaTeX - math books. This is standard for math.
TEI -- complicated fiction and non-fiction
Perhaps XHTML if someone can argue convincingly that it's necessary too. But I do fear that HTML tempts people into hand-tweaking for the nicest appearance on a particular ereader.
Different workflows for each format. A few post-processors and whitewashers could specialize in the more esoteric formats, and everyone else could work in RST. This seems more practical than demanding that everything be stored in a format that few know how to prepare OR that difficult texts be mutilated to remove everything that RST can't handle.
Text+(X)HTML submissions are not going to go away. You can't expect an independent producer, perhaps with limited text/HTML expertise/experience, to start submitting in RST. Al
The effort to define the one true master format seems like the effort to define the one true DTD. If the one true DTD handles everything, it's too unwieldy to use. Better to have different DTDs (or schemas) for different tasks.
You'd have to have a different suite of tools for each format, for converting it into the various end-user formats, but that would be easier, in the long run, than forcing everything into the one true format.
-- Karen Lofstrom only a gamma geek, but practical

On Fri, Feb 3, 2012 at 6:22 PM, Al Haines <ajhaines@shaw.ca> wrote:
Text+(X)HTML submissions are not going to go away. You can't expect an independent producer, perhaps with limited text/HTML expertise/experience, to start submitting in RST.
Then you add a step wherein someone who CAN manage RST converts the text or the HTML to RST. Surely a volunteer could be found to do that. The independent submissions can't be all that numerous. -- Karen Lofstrom

Karen>Then you add a step wherein someone who CAN manage RST converts the text or the HTML to RST. Surely a volunteer could be found to do that. I think the management over at DP will tell you as soon as you start "throwing away" the HTML formatting that DP submitters provide and "dumbify" their submission down to RST then you are chasing away volunteers in droves. Even being asked to think about condescending to support epub devices totally pisses them off.

-----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Karen Lofstrom Sent: Friday, February 03, 2012 8:26 PM To: Project Gutenberg Volunteer Discussion Subject: Re: [gutvol-d] Goals and scope
On Fri, Feb 3, 2012 at 6:22 PM, Al Haines <ajhaines@shaw.ca> wrote:
Text+(X)HTML submissions are not going to go away. You can't expect an independent producer, perhaps with limited text/HTML expertise/experience, to start submitting in RST.
Then you add a step wherein someone who CAN manage RST converts the text or the HTML to RST. Surely a volunteer could be found to do that.
And then what happens to their text/HTML files? Do they get dumped? I can think of at least four independents (one of whom is me) that would strongly resent, even flatly reject, that being done to their files.
The independent submissions can't be all that numerous.
-- Karen Lofstrom

On Fri, Feb 3, 2012 at 7:51 PM, Al Haines <ajhaines@shaw.ca> wrote:
And then what happens to their text/HTML files? Do they get dumped? I can think of at least four independents (one of whom is me) that would strongly resent, even flatly reject, that being done to their files.
Are you so wedded to having YOUR files published? Your files can't be the last step before the master format? I'm trying to get back into post-processing -- I haven't done it for years. Perhaps I'll feel the same sort of pride in my output and unwillingness to let anyone else meddle with it. But for the last five? six? years I've worked in P3, as one of the high-count proofers. I'm just one stage of a process, and I'm OK with that. It doesn't matter that my name isn't on the product. It's enough that there's a book to read. -- Karen Lofstrom

Karen>Are you so wedded to having YOUR files published? Your files can't be the last step before the master format? It saddens me when I see my efforts replaced with something that is clearly inferior by any rational measure of typographical success, by techno-nerds who are more interested in inventing and playing with their own little languages than in producing books which are fun to read. It's not just the people on this forum who do this. There are a ton of repackaging houses out there, including Amazon and Apple and the rest, who take the PG files, strip out the PG legalese, strip out the PG/DP acknowledgements, strip out the effort that has been put into making something look reasonably intelligent and reasonably responsive to the original, and replace it with some kind of gimcrack boilerplate which anyone with any background in typography, even just reading one book on typography, will tell you is just plain ugly and stupid and ignores the last 400+ years of agreement about how one designs an attractive and readable thing called "a book." And when they do so they inevitably go beyond this and step on the linguistic intentions of the original author. Because they simply don't understand, or are not willing to acknowledge, or really don't care, that formatting *also* has linguistic intent, and books typically *are not* simply "a string of undifferentiated words." And the reader doesn't realize how much has been lost in the reading experience by this constant dumbing-down process unless they go back and look at an actual copy of a first edition and see for themselves how much more sense that copy makes compared to today's techno-nerd dumbed-down version.

On 02/04/2012 06:51 AM, Al Haines wrote:
And then what happens to their text/HTML files? Do they get dumped? I can think of at least four independents (one of whom is me) that would strongly resent, even flatly reject, that being done to their files.
Why would you feel so? Lots of *authors* are happy with the house style of their publishing houses. I can't understand why a simple digitizer should feel so strongly. Why would you prefer your own formatting style over a PG house style? Think of the advantages of having a huge corpus formatted in the same style. Or a huge corpus that can be converted into any house style at the push of a button. -- Marcello Perathoner webmaster@gutenberg.org

Why would you prefer your own formatting style over a PG house style?
It's not *my* formatting style. It is an attempt to try to honestly record, in a reasonable amount of reasonably portable effort, the formatting style of the original author/publishing house. Because formatting has meaning. And what you are calling a "PG house style" is inevitably a "house lack of style," because the techno-nerds who foist these things on the world are inevitably the people who are the most tone-deaf to the issue of taste, because if they had any taste they wouldn't be doing this stuff in the first place. What you-all are ignoring is the PG rules against blind-formatting of HTML, because blind-formatting of HTML makes *no* contribution to the world, and that is what you are doing: you are foisting blind-formatting of HTML on the world via an intermediate dumb-down language, rather than taking a good honest look at, and trying to be responsive to, what the original author wrote and the original publisher published. And frankly feedbooks and the lot have been doing what you propose for many years already, and have been doing it much better.

"Marcello" == Marcello Perathoner <marcello@perathoner.de> writes:
Marcello> On 02/04/2012 06:51 AM, Al Haines wrote: >> And then what happens to their text/HTML files? Do they get >> dumped? I can think of at least four independents (one of whom >> is me) that would strongly resent, even flatly reject, that >> being done to their files. Marcello> Why would you feel so? Lots of *authors* are happy with Marcello> the house style of their publishing houses. I can't Marcello> understand why a simple digitizer should fell so strong. Marcello> Why would you prefer your own formatting style over a PG Marcello> house style? Think of the advantages of having a huge Marcello> corpus formatted in the same style. Or a huge corpus Marcello> that can be converted into any house style at the push Marcello> of a button. Authors that don't like a publisher's house style change publisher, if they have the choice. If a publisher forces an house style that users and authors don't like, they switch to other publishers. And apparently it is what is happening to PG. Carlo

On 02/04/2012 03:09 PM, Carlo Traverso wrote:
Authors that don't like a publisher's house style change publisher, if they have the choice. If a publisher forces an house style that users and authors don't like, they switch to other publishers.
I never heard of that. Can you provide some references? -- Marcello Perathoner webmaster@gutenberg.org

Carlo> Authors that don't like a publisher's house style change publisher, if
they have the choice. If a publisher forces an house style that users and authors don't like, they switch to other publishers.
Marcello> I never heard of that. Can you provide some references? feedbooks.com manybooks.net amazon.com bn.com apple.com freekindlebooks.org archive.org openlibrary.org etc. The *one* thing that sets PG apart from pretty much all these other sites (from the viewpoint of the end user) is that, up until now, most PG books (html, epub, mobi) made *some* "reasonable" effort to try to maintain the publishing style of the original book, rather than rendering them as if they were an auto-genned generic Python users' manual.

On 2/3/2012 10:51 PM, Al Haines wrote:
And then what happens to their text/HTML files? Do they get dumped? I can think of at least four independents (one of whom is me) that would strongly resent, even flatly reject, that being done to their files.
Just on a fluke, I went out to Project Gutenberg to look at our old friend Huck Finn (etext 76). When I checked the HTML version, it announced that "This file has been formatted [badly] for use with tablet readers." The original version produced by Mr. Widger has been relegated to the /old/orig76-h folder; the new version is dated this Feb 6. This is exactly what Mr. Adcock had asked that you do with /his/ version. It appears you do not have nearly as much control as you thought.

-----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Lee Passey Sent: Wednesday, February 22, 2012 6:05 PM To: gutvol-d@lists.pglaf.org Subject: Re: [gutvol-d] Goals and scope
On 2/3/2012 10:51 PM, Al Haines wrote:
And then what happens to their text/HTML files? Do they get dumped? I can think of at least four independents (one of whom is me) that would strongly resent, even flatly reject, that being done to their files.
David Widger produced *all* the HTML versions of #76. (Credit was given in the text versions to the original producers of those files.)
Just on a fluke, I went out to Project Gutenberg to look at our old friend Huck Finn (etext 76). When I checked the HTML version, it announced that "This file has been formatted [badly] for use with tablet readers." The original version produced by Mr. Widger has been relegated to the /old/orig76-h folder; the new version is dated this Feb 6.
This is exactly what Mr. Adcock had asked that you do with /his/ version.
It appears you do not have nearly as much control as you thought.

Lee>Just on a fluke, I went out to Project Gutenberg to look at our old friend Huck Finn (etext 76). When I checked the HTML version, it announced that "This file has been formatted [badly] for use with tablet readers." The original version produced by Mr. Widger has been relegated to the /old/orig76-h folder; the new version is dated this Feb 6.
This is exactly what Mr. Adcock had asked that you do with /his/ version.
Sorry, but people can and do write whatever they like. In practice if you open 76 on a Kindle device you will find that it is NOT particularly well formatted for a Kindle. So, whatever has been done to 76, don't blame it on Kindles or Kindle users. More particularly, no matter how 76 is formatted, it leaves aside the issue that 76 has literally about 1,500 errors in it. See by way of comparison 32325 on a Kindle or EPUB machine (which is from a later edition which has somewhat fewer images than 76 and has some differences in name-spelling).

Karen>But I do fear that HTML tempts people into hand-tweaking for the nicest appearance on a particular ereader. You will then have the problem that people say what PG makes is ugly and I can do a better job and they pop the top on the epub or tweak it in Sigil or they mobi-unpack.py or whatever and then they party hardy and put it up on a competitive website and say how stupid and ugly PG is. Not saying we can do anything about that. Just saying that consistency is the what of little minds? Seems to me you could go a long long way just by having "official" DP and/or PG external css style sheets and tell people "hey start with this we've worked a lot of the bugs out of it already" and "if you want to override any of these styles just make your own css file which changes the things you want changed and then load that personal css file after the dp/pg 'official' style sheet."

Hi Karen, It would be better to have one master format. It actually does not matter which, as basically all of them can be adapted to support the features needed; of course, in certain instances you will then not be conforming to the standard. But since we are just creating a master format, we will have tools for generating the different output formats from it. The same goes for editing or any other workflow. The formats you have mentioned can all be extended, so in the end it does not matter which is used. There seems to be some confusion about what a master format should be and what is involved. I will be going into this soon in a thread called "A New Approach". regards Keith. On 04.02.2012 at 04:21, Karen Lofstrom wrote:
I appreciate the elegance and economy of having one master format, but it does seem that there would be a great deal of benefit to having three or four.
Before the book is put through the DP process, or approved for an independent preparer, how about sorting it into one of several categories:
RST -- uncomplicated fiction and non-fiction. Everyone here seems to agree that RST works for most texts.
LaTeX - math books. This is standard for math.
TEI -- complicated fiction and non-fiction
Perhaps XHTML if someone can argue convincingly that it's necessary too. But I do fear that HTML tempts people into hand-tweaking for the nicest appearance on a particular ereader.
Different workflows for each format. A few post-processors and whitewashers could specialize in the more esoteric formats, and everyone else could work in RST. This seems more practical than demanding that everything be stored in a format that few know how to prepare OR that difficult texts be mutilated to remove everything that RST can't handle.
The effort to define the one true master format seems like the effort to define the one true DTD. If the one true DTD handles everything, it's too unwieldy to use. Better to have different DTDs (or schemas) for different tasks.
You'd have to have a different suite of tools for each format, for converting it into the various end-user formats, but that would be easier, in the long run, than forcing everything into the one true format.
-- Karen Lofstrom only a gamma geek, but practical

"Keith" == Keith J Schultz <schultzk@uni-trier.de> writes:
Keith> Hi Karen, It would be better to have one master format. it Keith> actually does not matter which as basically all can be Keith> adapted to support the features need. of course then in Keith> certain instances you will not be conforming to the Keith> standard. I rather think that we cannot use one master format, we need a few. At least, you cannot reasonably have a master format for mathematics different from LaTeX, and we don't want to exclude contributors and contributions that cannot or don't want to prepare a master file in a format that you arbitrarily like. Carlo

I'm seeing two tendencies here. Some people want a disciplined, organized collection, with one/a few master formats, better metadata and error correction. Some people want PG to follow Michael Hart's dream of big collection, no rules, everything is welcome, do your own thing. As a researcher and former academic, I subscribe wholeheartedly to the former vision. I can't but see the latter camp as something like many self-publishers: people who want THEIR book released THEIR way, with no gatekeepers barring entry. But perhaps PG could accommodate both camps by distinguishing between PG-standard texts (generated from a master format, corrected and re-generated as necessary) and PG-alternative texts (hand-tweaked for particular e-readers, less popular formats, older versions, etc.) Search results could generate standard texts on top, alternative texts in separate category. Everything to have release dates, so that users could see which were the earlier versions and how recently the text had been corrected or updated. Users would have a way to judge the reliability of the text. If volunteers who submit alternative texts want to do error correction themselves, keeping their version in sync with the standard text, fine. But I don't think that PG should be stuck with that role. It would be enough work just to keep the standard texts updated. There would have to be SOME rules for the alternative texts, but they could be much less stringent than the rules for the standard texts. Right now, many of PG's texts would only qualify as alternative texts. But that would be OK. They would be placeholders, so that users could read something while standard texts were being prepared. -- Karen Lofstrom

Karen>Some people want PG to follow Michael Hart's dream of big collection, no rules, everything is welcome, do your own thing. Michael's dream is and will happen anyway. It's just a question of *who* wants to host the party. If PG wants to be a party of just "academics" well then go for it. I don't think many of us "non-academics" who are in this simply *because we love reading books and want to share that love with other people* will hang around long. It's just too easy to occupy some other website where more can happen faster and with more fun and with fewer "academics" clogging the pipes.

On Sat, Feb 4, 2012 at 5:08 PM, Jim Adcock <jimad@msn.com> wrote:
Michael's dream is and will happen anyway. It's just a question of *who* wants to host the party. If PG wants to be a party of just "academics" well then go for it. I don't think many of us "non-academics" who are in this simply *because we love reading books and want to share that love with other people* will hang around long.
I don't know at all what you're arguing against here. One master format is most important for those who want to share books with other people, because academics will happily read HTML or PDF files on their 20" screens. It's all the people with their Kindles and cell phones and other random junk that need other formats.
It's just too easy to occupy some other website where more can happen faster and with more fun and with fewer "academics" clogging the pipes.
Then why don't you go there? I work with PG because I have reason to believe my work will still be around after a while, and because people can reuse PG's material without stressing so much about copyright. Ease of reuse would be a nice feature there. -- Kie ekzistas vivo, ekzistas espero.

Hoi Karen, As you most likely know, PG was originally intended to be a repository. Furthermore, it was believed that the best format for longevity was the "Plain Vanilla Text". This was the sole master format. Gradually, technological advances opened the door to new ways of reading books digitally, and it was noticed that the "Plain Vanilla Texts" were suboptimal, in that these new technologies could produce more appealing output. So DP was spun off. From that day on there has been much debate about master formats. PG policy was to allow contributors a free hand, but that is a very bad idea, as PG did not and does not accept everything.
If PG wanted to, it could stick by its original purpose, be an archival repository, and accept a wide range of formats. It would not be that hard to administer and offer access. But then you would have tons of files and texts for all kinds of devices, most likely texts strictly for just one device. That is not in line with the philosophy of PG: the texts are to be available to a large (if not all) portion of the public. So the idea of a master format that accommodates most was born. What was not done was to properly design such a format to fit the task it was (is) to have; the quick and dirty road was chosen. Any programmer knows that this approach creates quick and at first acceptable results. The problems crop up when things need to be changed. THAT IS THE PRESENT DAY SITUATION.
What is truly needed, as you mention, is some form of standardization. The problem at PG is that there are no true standards to adhere to that would support a master format. We do not even need to constrain contributors to the master format; all that is needed is that they constrain their formatting information to a standard. Then the contributions can easily be converted to the master format, and the master format can be used to distribute to the rest.
YES, I can hear all the hand crafters and artists. Yet, you gals and girls forget PG was never dedicated to creating artistic works. What PG is dedicated to is offering etexts in an acceptable quality for most. PG is not a publishing house, as some have come to think of it. PG is a repository and it is not PG's responsibility to offer the etexts therein in any particular format. PG is trying to facilitate a broad base of technologies, which is a lot of hard work. I believe PG is getting some of the best mileage out of their resources. Sure they could get more, but it is about time we try to help them do this by developing a markup standard for etexts and ebooks. STOP SCREAMING, PEOPLE. I am not talking about a particular implementation. More in an upcoming thread, "A New Approach". regards Keith. On 04.02.2012 at 21:57, Karen Lofstrom wrote:
I'm seeing two tendencies here. Some people want a disciplined, organized collection, with one/a few master formats, better metadata and error correction. Some people want PG to follow Michael Hart's dream of big collection, no rules, everything is welcome, do your own thing.
As a researcher and former academic, I subscribe wholeheartedly to the former vision. I can't but see the latter camp as something like many self-publishers: people who want THEIR book released THEIR way, with no gatekeepers barring entry.
But perhaps PG could accommodate both camps by distinguishing between PG-standard texts (generated from a master format, corrected and re-generated as necessary) and PG-alternative texts (hand-tweaked for particular e-readers, less popular formats, older versions, etc.) Search results could generate standard texts on top, alternative texts in separate category. Everything to have release dates, so that users could see which were the earlier versions and how recently the text had been corrected or updated. Users would have a way to judge the reliability of the text.
If volunteers who submit alternative texts want to do error correction themselves, keeping their version in sync with the standard text, fine. But I don't think that PG should be stuck with that role. It would be enough work just to keep the standard texts updated.
There would have to be SOME rules for the alternative texts, but they could be much less stringent than the rules for the standard texts.
Right now, many of PG's texts would only qualify as alternative texts. But that would be OK. They would be placeholders, so that users could read something while standard texts were being prepared.
-- Karen Lofstrom

Keith>PG is a repository and it is not PGs responsibility to offer the etexts therein in any particular format. Sorry, but what I always heard Michael talk about was books for people to share and read. Not books locked up behind closed doors in a repository. If you don't want people to read the books, then the archive.org photocopies of pages are a more than adequate solution. If you do want people to read the books, then you quickly realize that what archive.org is doing (or Google Books) really doesn't cut it. Which is why some of us volunteer to do the hard work to make something which can actually -- in practice -- be read on real machines by real people. *Does* PG offer books in acceptable format for reading? Yes, if you want to read HTML in an HTML browser on a desktop computer with a 20" monitor. Almost all PG texts are of "high quality" when read under that criteria. However, not many of the general public are interested in curling up in bed with a nice 20" monitor. Does then PG offer adequate quality for small personal reading devices? EPUB, Android, Mobi devices? Nope, most of the files display as "scrambled eggs" -- because a half dozen lines of their CSS doesn't make any sense, having been hard-wired to the assumption that this file will only be read on a 20" screen.

On 06.02.2012 at 05:54, Jim Adcock wrote:
Keith>PG is a repository and it is not PGs responsibility to offer the etexts therein in any particular format.
Sorry, but what I always heard Michael talk about was books for people to share and read. Not books locked up behind closed doors in a repository. TRUE! BUT, he also said plain vanilla text, for the repository! You have not refuted the fact that PG is not responsible for any particular format. PG does offer formats for reading, just like any good repository should.
If you don't want people to read the books, then the archive.org photocopies of pages is a more than adequate solution. If you do want people to read the books, then you quickly realize that what archive.org is doing (or Google books) really doesn't cut it. Which is why some of volunteer to do the hard work to make something which can actually -- in practice -- be read on real machines by real people.
*Does* PG offer books in acceptable format for reading? Yes, if you want to read HTML in an HTML browser on a desktop computer with a 20" monitor. Almost all PG texts are of "high quality" when read under that criteria. But the problem is that what you want is for PG to bow to your machine, or a particular one, or all reading machines on earth, or anything in between.
That is not the purpose of PG. PG is there to preserve books and offer them to those willing to read.
However, not many of the general public is interesting in curling up in bed with a nice 20" monitor.
Does then PG offer adequate quality for small personal reading devices? EPUB, Android, Mobi devices? Nope, most of the files display as "scrambled eggs" -- because a half dozen lines of their CSS doesn't make any sense, having been hard-wired to the assumption that this file will only be read on a 20" screen.
Is that the fault of PG or someone else? You could just refactor the CSS, and VOILA. At least that is what the standards say, and the formats for those devices should conform to them. Talk about the REAL WORLD. regards Keith.

Keith> You have not refuted the fact that PG is not responsible for any particular format. PG does offer formats for reading, just like any good repository should.
A good repository offers multiple formats well-formatted and representative of the original book for reading on the devices that real-world customers want to read on. A goal which PG and most other "repositories" fail at once they start seeing themselves as "repositories" and not as active sources of real books for real people to actually read.
Keith> That is not the purpose of PG. PG is there to preserve books and offer them to those willing to read.
"Willingness to read" depends on the flavor of the dog-food. When a non-profit loses a charismatic leader there is often an upheaval where the organization takes a look at itself and asks "what is our mission?" Now seems to be PG's turn.
Keith> Is that the fault of PG or someone else? You could just refactor the CSS, and VOILA.
Yes we could just refactor the CSS, and HTML5 and CSS3 provide the tools to do most of what we need to do cleanly and simply. Yet it is not happening at PG. Why not? Because the people who are in the position to in practice allow this to happen are instead blocking it from happening so that they can pursue their own agenda.

Hi Jim, On 06.02.2012 at 19:43, Jim Adcock wrote:
Keith> You have not refuted the fact that PG is not responsible for any particular format. PG does offer formats for reading, just like any good repository should.
A good repository offers multiple formats well-formatted and representative of the original book for reading on the devices that real-world customers want to read on. A goal which PG and most other "repositories" fail at once they start seeing themselves as "repositories" and not as active sources of real books for real people to actually read. From what I read so far from you there are no good repositories that fit your criteria!
Keith> That is not the purpose of PG. PG is there to preserve books and offer them to those willing to read.
"Willingness to read" depends on the flavor of the dog-food. When a non-profit loses a charismatic leader there is often an upheaval where the organization takes a look at itself and asks "what is our mission?" Now seems to be PG's turn.
Mr Hart fulfilled his dream. What he wanted was plain vanilla etexts. According to your definitions here, that is/was dog-food. As for reading the etexts/books from PG, I can say my first was "The Twin Cities" and the DEVICE was a Newton. I converted and transferred it myself. Since then I have taken these texts and loaded them into different text processors and created my own PDFs. As of late I have taken up interest in the Ereader formats. They are not that bad compared to the commercial books you can buy. That is, not books out of copyright, but freshly published books. I would say in many cases the PG style is better. Yes, PG is in transition. As in all transitions, things will become worse at first and there will be bumps in the road. Yet, in the end things get better.
Keith> Is that the fault of PG or someone else. You could just refactor the CSS. and VOILA.
Yes we could just refactor the CSS, and HTML5 and CSS3 provides the tools to do most of what we need to do cleanly and simply. Yet it is not happening at PG. Why not? Because the people who are in the position to in practice allow this to happen are instead blocking it from happening so that they can pursue their own agenda.
No! Because it is not the HTML and CSS that is the problem. It is the devices and their formats. The devices do a poor job of rendering because they do not even try to implement half of the HTML or CSS standards. PG simply cannot hand-tweak files for all of them! Who is to do the work? What PG can do, though, is create a master format that will allow the greatest flexibility and output EPUBs, MOBIs, and KF8s that are readable, at an absolutely unbeatable price. regards Keith.

Hi Carlo, You seem to think that it can be done. RST is extendable, so components could be added. Please do not get me wrong: I would hold LaTeX to be a far better master format, with two reservations. First, we would use the "language" for markup but not its full scope, as it would not otherwise map well; second, the use of custom output engines. regards Keith. On 04.02.2012 at 19:56, Carlo Traverso wrote:
"Keith" == Keith J Schultz <schultzk@uni-trier.de> writes:
Keith> Hi Karen, It would be better to have one master format. it Keith> actually does not matter which as basically all can be Keith> adapted to support the features need. of course then in Keith> certain instances you will not be conforming to the Keith> standard.
I rather think that we cannot use one master format, we need a few. At least, you cannot reasonably have a master format for mathematics different from LaTeX, and we don't want to exclude contributors and contributions that cannot or don't want to prepare a master file in a format that you arbitrarily like.
Carlo

As you point out, "Now, if you're trying to represent /pages/ as XML container objects then you do have a problem, because pages and paragraphs are probably not contiguous. But the document structure (paragraphs) and the manifestation structure (paged book) are totally different paradigms, and can't be represented as the same structural object."
Nonetheless, pages are in fact legitimate characteristics of a book. And they can't be accommodated in an XML structure. So instead we use the "pagenum" positional element as a substitute. If you want examples, every html page that is not xhtml-compliant is a legitimate document that hasn't fit the model. They can be forced into the model, but only by changing the document.

On Fri, February 3, 2012 3:16 pm, don kretz wrote:
If you want examples, every html page that is not xhtml-compliant is a legitimate document that hasn't fit the model. They can be forced into the model, but only by changing the document.
I don't know what you mean by this. I can take any HTML document, and by use of a "tag-soup" parser I can build a complete in-memory DOM. There is no part of the HTML model that I can't represent as XHTML. I can take the in-memory DOM and serialize it out as tag-soup HTML (in fact, that is what Tidy does when you don't select XHTML output). To my knowledge there is a complete one-to-one correspondence in data models between SGML/HTML and XHTML. The distinctions are syntactic only.
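Lee's tag-soup round trip is easy to sketch concretely. The example below uses lxml purely as an illustration of the idea; it is not the code Tidy or epubmaker actually uses.

    from lxml import etree, html

    def tag_soup_to_xhtml(raw_html):
        """Parse arbitrary tag-soup HTML into an in-memory tree, then
        serialize it back out as well-formed XML.  Unclosed tags, stray
        end tags, and uppercase element names are repaired by the parser;
        the data model survives, only the syntax changes."""
        doc = html.fromstring(raw_html)   # forgiving, tag-soup tolerant parser
        return etree.tostring(doc, method="xml", encoding="unicode")

    # Example: typical pre-XHTML markup with unclosed <p> and <i> tags.
    soup = "<html><body><p>One paragraph<p>Another, with an unclosed <i>italic</body></html>"
    xhtml = tag_soup_to_xhtml(soup)   # every element comes back properly closed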

How would you propose marking up a footnote that extends across two pages? On Fri, Feb 3, 2012 at 2:26 PM, Lee Passey <lee@novomail.net> wrote:

But serving these from PG makes a big file even bigger
A big file? What definition of big are we talking about? What's 500k between friends any more? Both Epub and Mobi support page numbers; why don't we translate page numbers from HTML to those formats? It might take a standardized format of page numbers in HTML, but at least you're supporting the people who want page numbers. RST doesn't support page numbers, it doesn't support sidenotes, it doesn't support math. And given the nature of the spec, whatever new book-thing comes up, it probably won't support. I don't particularly feel well-served by how TEI-Lite has been dealt with at PG, but I'd much rather have an incredibly rich format like TEI-Lite that has to be distilled down to fit HTML, than a format that limits me to the features that are convenient in Epub today. -- Kie ekzistas vivo, ekzistas espero.
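For what it's worth, EPUB 3 (newly published at the time of this thread) does define a page-break vocabulary and an optional page-list. A hedged sketch of that markup, with illustrative file and id names, checked for well-formedness with Python's lxml:

    # EPUB 3 page-number markup; "chapter03.xhtml" and "page54" are made up.
    from lxml import etree

    # A page boundary recorded inline in the content document:
    content_fragment = (
        '<p xmlns:epub="http://www.idpf.org/2007/ops">text ending page 53 '
        '<span epub:type="pagebreak" id="page54" title="54"/>'
        'text starting page 54</p>'
    )

    # The matching entry in the navigation document's optional page-list:
    page_list = (
        '<nav xmlns:epub="http://www.idpf.org/2007/ops" epub:type="page-list">'
        '<ol><li><a href="chapter03.xhtml#page54">54</a></li></ol></nav>'
    )

    for fragment in (content_fragment, page_list):
        etree.fromstring(fragment)  # raises XMLSyntaxError if malformed
    print("both fragments are well-formed")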

David>A big file? What definition of big are we talking about? What's 500k between friends any more? 500K between friends is a royal pain in the behind to a user of a small epub or kindle device which is being served books using 3G phone service but is not getting even that in practice because they live out in the boonies. Trust me, I've been there -- now all the epub and kindle devices I actually use are running wifi. And maybe this is why more downloads from PG are without images than with images? One guy I know who loves his kindle is 80+ years old and lives on a boat, and downloads books via 3G -- whenever he's in port. I tell him all the things I do and he looks at me like I'm stupid (and I guess I am 'cuz BB keeps telling me so) and says "I just like reading books."
Both Epub and Mobi support page numbers;
Sorry, where exactly in the epub and mobi standards does it say they support page numbers? Amazon at least I would think is the definition of mobi nowadays and they say in big bold letters "Don't Do Page Numbers!"

On Fri, February 3, 2012 3:03 pm, Joshua Hutchinson wrote:
My preference is for RST to win simply because of the lower entry bar, but ... *shrug*
In my mind, RST has a /higher/ entry bar than any other markup. Part of this is because it has a fair share of uniqueness in its markup that users are required to learn to use it effectively. Another bar to adoption is the lack of skill transferability. If I learn RST to contribute to PG, it will also help me if I become a Python programmer, but not much else; therefore, I am somewhat disincentivized to learn RST. Lastly, RST suffers from the ambiguity inherent in all "light" markup languages. While the markup is technically "unambiguous," it is still very difficult for a human being to remember and recognize the rules of RST. Does this line start with a space or a tab? Does ===== indicate a first level header or a second level header? Where was the first declaration so I can figure it out? The subtlety of the language /may/ be easier for the end user (reader) if the document has not been pre-processed (personally I find reading an RST document anything but "restful"), but the subtlety can make it difficult for an original creator and horrible for a maintainer who wasn't the original creator.
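To make the heading-level point concrete, here is a small sketch assuming Python with the docutils package: the two sources below use the same underline characters with opposite meanings, so only the order of first use decides what ===== marks.

    from docutils.core import publish_string

    doc_a = "Chapter\n=======\n\nSection\n-------\n"   # "=" is level 1 here
    doc_b = "Chapter\n-------\n\nSection\n=======\n"   # "=" is level 2 here

    for source in (doc_a, doc_b):
        print(publish_string(
            source=source,
            writer_name="pseudoxml",                   # debug view of the tree
            settings_overrides={"doctitle_xform": False},
        ).decode())
    # Both documents produce the same nested section structure, even though
    # the underline characters are assigned to opposite levels.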

Joshua>My preference is for RST to win simply because of the lower entry bar, but ... *shrug* One might think you-all would ask DPers what they think about all this. I think what they would tell you is that: “Most people write in HTML, and a few write in RST, and DPers really don’t like either epub nor kindles because that’s just poop in the party, and what kind of stupid smack are you guys talking about when it comes to inventing even *more* bizarre languages? Don’t you realize how hard we work to try to keep the PP’ers we already have!?” That said a number of DPers “get” EPUB as a “real book” source language and are pushing to make that happen. Just like the big boys. DP’ers who learn to EPUB might actually be able to get a job and do something real with their skill set. Even knowing Sigil is enough to get you somewhere nowadays.

On Fri, February 3, 2012 1:44 pm, don kretz wrote:
If you have floats, you can use inset page numbers with spans and an appropriate stylesheet.
If you have floats ...
But fundamentally what you're running into, and I don't know how you avoid it if you insist on XML/XHTML, is that books simply aren't well-formed in the way that XML defines and requires. You can't embed everything 100% in all its containers.
You've made this assertion before. I don't agree with it, and I've yet to see any examples or evidence that it's true. It's obvious that the eggheads who came up with TEI seem to think you can, and from what I've observed even though they're eggheads they're not techies; they're more like linguists and English professors. I believe in climate change. Not because this has been a particularly warm winter (it has been) but because virtually every climate scientist on the planet says it's happening. I believe in TEI as a text encoding standard. Not because I have fully tested or exercised it, but because some really smart and really educated people put it together. With all due respect, I don't think anyone here can come close to designing a system as good as what they developed; we don't have the expertise, and we haven't had the time. Now about 5 years ago (my reports are in the list archives in the 2006-2007 range) I did some testing about TEI and XHTML. The results of that testing demonstrated that I could do "round-trip" conversions between TEI and XHTML; that is, I took a TEI file and programmatically converted it to HTML (HTML that displayed well on a browser without CSS) and back to TEI. Thus, I can conclude that there is nothing in TEI that cannot be encoded in XHTML. While TEI is "best-of-breed" it is not ubiquitous. If a volunteer learns TEI to create texts for PG, that skill is not necessarily transferable; but if I know XHTML I can publish web pages, or blog, or do any other job for which XHTML is the standard. Thus, on the whole I think that appropriately constrained XHTML is the best practical choice, even if not the best technical choice.
Page numbers are a reflection of this problem, because conceptually they are boundaries between page elements. But page elements simply aren't well-formed because their tops and bottoms can cut right through paragraphs (and everything within which paragraphs are embedded.)
I fail to see how this example proves your point. For example, assume a paragraph which is split over a page. You've started your <p> container and start throwing your phrasing content into the container. Then, in the middle of your text you encounter some metadata; this metadata just happens to be an indication that the current physical manifestation is changing, that the nature of the metadata is that it's a page number, and that the actual metadata is "217." Drop a hidden metadata object into the phrasing content at the point where it exists (in this case, an <a> tag) and go on. Now, if you're trying to represent /pages/ as XML container objects then you do have a problem, because pages and paragraphs are probably not contiguous. But the document structure (paragraphs) and the manifestation structure (paged book) are totally different paradigms, and can't be represented as the same structural object.
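A minimal sketch of that approach, assuming Python with lxml; the class and id names ("pagenum", "page217") are conventions used in this thread, not part of any standard:

    # Dropping an empty <a> "pagenum" anchor into the middle of a paragraph.
    from lxml import etree, html

    paragraph = html.fragment_fromstring(
        "<p>text that ends page 216 and text that begins page 217</p>"
    )

    # The empty anchor that records where page 217 starts:
    anchor = etree.SubElement(paragraph, "a", id="page217")
    anchor.set("class", "pagenum")

    # Splice it into the phrasing content at the page boundary.
    paragraph.text = "text that ends page 216 "
    anchor.tail = "and text that begins page 217"

    print(etree.tostring(paragraph, method="xml").decode())
    # <p>text that ends page 216 <a id="page217" class="pagenum"/>and text ...</p>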
I think what happened to some devices is that they had to decide between supporting HTML and XHTML, and since writers can't be constrained to create well-formed XML documents (nor should they be), the devices had to choose HTML.
I believe that virtually all devices (and perhaps even absolutely all devices) require XHTML. This is certainly a requirement of ePub. If you try to load SGML/HTML onto Adobe's Digital Editions it will blow chunks, and refuse to display anything. Because the Kindle is based on the old HTML 3.2 spec it may not /require/ XHTML but it certainly /accepts/ XHTML. The KindleGen program /may/ require XHTML; I don't know, Mr. Adcock is in a much better position than I to evaluate that question. In any event, requiring XHTML as a master format will certainly have no adverse effects.

Lee>The KindleGen program /may/ require XHTML; I don't know, Mr. Adcock is in a much better position than I to evaluate that question. In any event, requiring XHTML as a master format will certainly have no adverse effects. Well, I can't talk to kindlegen except to say what I have seen it do with things I feed it. First of all, I think you guys are all still thinking "Kindlegen Version 1" whereas the rest of the "dev" world has already moved on to "Kindlegen Version 2". Kindlegen Version 2 outputs a "mobi" file which actually contains two, two, two formats in one. The first format is the old "mobi7" version which everybody knows and loves. The second format "kf8" is basically just epub but Amazon doesn't want to call it epub because they have stuck it inside their own mobi wrapper. But, as mobi_unpack.py demonstrates, one can pop that epub right back out of there. [This is all assuming we are talking about public domain unencrypted mobi content of course.] So, old crufty Kindles are still effectively "mobi7" devices with all that that means. New Kindles are effectively epub2 machines, with all that means. And Kindles of recent vintage are still running as mobi7 devices but are supposed to be automatically updated to become KF8 "epub2" machines sometime soon. But nobody seems to know which Kindles qualify for updates and which don't. But, in terms of Kindlegen Version 2, which is not what I think PG is running yet, I find I can send it HTML not XHTML and it screams like a banshee over and over again "I cannot take it Captain, she's gunna blow!" but, guess what, a working kf8 file always seems to squirt out the back end. Don't know what happens if you send it really buggy HTML. Kindlegen is supposed to like XHTML 5. And EPUB3 is supposed to like XHTML 5. So, XHTML5 is certainly what I would recommend to anyone who cares about the future. Which, by definition, no one here does.

Hi Don, I am very sorry, but you are completely wrong. XML is perfectly capable of handling the structures of a book. The problem is how you handle and define the structure, which is not predefined by the XML standard. As far as the output is concerned, that is a matter of parsing the tree and reacting to the semantics of the entities defined therein. Please do not forget that a paragraph is an entity of a text, whereas a page is an entity of a book! On another note: HTML actually has no concept of a page, except maybe the rendering of the file itself, meaning that I would have to have a file per page. Then you have the same problem. Naturally, that is not how it is done. You can simulate the same semantics for page breaks in XML as in HTML, because the page break is just simulated. regards Keith. Am 03.02.2012 um 21:44 schrieb don kretz:

Hi Keith, Of course XML can handle a book if you constrain your definition of your book to only include things that can be described by XML. What if I choose to include footnotes that extend across multiple pages, including both where they are referenced in the text flow and on what physical pages they are found? You can't just declare the possibility to be invalid because it can't be described in a strict hierarchy. On Fri, Feb 3, 2012 at 4:22 PM, Keith J. Schultz <schultzk@uni-trier.de> wrote:

By the way, David, my markup already includes page numbers, sidenotes and math via LaTeX. Of course, since it's only conceptual markup it's up to the exporter and the display device to instantiate the realization. But that is exactly the way eb.tbicl.org stores, transforms and displays them now. Look up the Fourier article I referenced earlier (should be eb.tbicl.org/fouriers-series, I think) for an extensive LaTeX example. All the articles that had page numbers marked somehow in the PG texts have been transformed upon loading to this markup and are functional. Same for sidenotes. It's just a stylesheet change, as you would expect, to have offset or inset page numbers. There's a lot of unsimplified HTML still left from the PG HTML files. Tables are pretty much untouched. Even most of the illustrations are not easily converted yet. But what's used for the article sources is substantially simpler yet more malleable than what PG provides, given software that understands the markup - none of which exists other than what I've got in the app. On Fri, Feb 3, 2012 at 6:08 PM, don kretz <dakretz@gmail.com> wrote:

Hi Don, You err very badly, and do not understand XML. XML is not a language of containers. It can be done. I do not have the time to teach what XML really is and how it can be used. On the other hand, HTML is no less capable of what you are describing! So, as you define structures, neither HTML nor XHTML nor XML would be viable candidates for a master format. RST, I believe, is out too. So basically, you want TEI or languages along the lines of TeX, where the pages are not marked up, but the text itself is, and an engine turns out the pages. But you forget you can do that with XML. regards Keith. Am 04.02.2012 um 03:08 schrieb don kretz:
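A hedged sketch of the milestone approach being pointed at here, using TEI-flavoured element names (pb, ref, note) purely for illustration, and Python's lxml to show the fragment is well-formed; the same pattern works in XHTML with empty anchors:

    # A footnote that starts on page 217 and finishes on page 218, marked up
    # with empty <pb/> milestones instead of page containers.
    from lxml import etree

    sample = """
    <body>
      <pb n="217"/>
      <p>Running text with a footnote reference<ref target="#fn12">[12]</ref>
         that continues <pb n="218"/> onto the next page.</p>
      <note xml:id="fn12" place="foot">The note begins on page 217
         <pb n="218"/> and finishes on page 218.</note>
    </body>
    """

    tree = etree.fromstring(sample)
    # Milestones can sit inside a paragraph or a note without breaking the
    # hierarchy, so no strict page container is ever required.
    print([pb.get("n") for pb in tree.iter("pb")])   # ['217', '218', '218']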

Don>Page numbers are a reflection of this problem, because conceptually they are boundaries between page elements. But page elements simply aren't well-formed because their tops and bottoms can cut right through paragraphs (and everything within which paragraphs are embedded.) The resulting location of the float is ill-defined by HTML anyway, in which case one might as well move the page number OUT of the middle of the paragraph.

To where? The point isn't to visualize exactly where the page breaks in the text, it is to give the user some idea which page he's on. If I wanted to display exactly the page break (which isn't very useful to the user in most cases) I'd use a vertical colored bar or something. In the case of EB I use it to give the user something to click on to view the TIA page image. On Fri, Feb 3, 2012 at 5:44 PM, James Adcock <jimad@msn.com> wrote:

To where?
Well, I would move the page tags to before the start of the next paragraph, so that a paragraph is understood to belong to the page that it started on. If you move the page tags out of the middle of the paragraph then you have more attractive, less intrusive options for placement, which "work" on more devices.
The point isn't to visualize exactly where the page breaks in the text, it is to give the user some idea which page he's on.
Why would the user care? The argument I've heard is that college students need to be able to ref the page when they write a "book review" for their profs. Not that the PG/DP page numbers are currently accurate enough to allow that in any case.
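A sketch of the relocation described above, assuming Python with lxml; the markup and the "pagenum" class are illustrative only:

    # Move each "pagenum" anchor out of mid-paragraph and re-insert it just
    # before the following paragraph.
    from lxml import etree, html

    doc = html.fromstring(
        '<div><p>First paragraph ends here '
        '<a class="pagenum" id="page54"></a>and spills onto page 54.</p>'
        '<p>Second paragraph.</p></div>'
    )

    for anchor in doc.findall('.//a[@class="pagenum"]'):
        paragraph = anchor.getparent()
        # Re-attach the text that followed the anchor to the paragraph ...
        if anchor.tail:
            previous = anchor.getprevious()
            if previous is not None:
                previous.tail = (previous.tail or "") + anchor.tail
            else:
                paragraph.text = (paragraph.text or "") + anchor.tail
            anchor.tail = None
        # ... then move the anchor so it sits between the two paragraphs.
        paragraph.addnext(anchor)

    print(etree.tostring(doc, method="xml").decode())
    # The anchor now precedes the second paragraph instead of splitting the
    # first one, which is the placement suggested above.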

No, it's so if I tell you that the passage I'm reading is on page 54, you can go to page 54 and find it. Or if you switch from your phone on the train to the paper copy at home, you can tell where to pick up. Or if you go to the library to look at the real illustrations, or maps, or drawings, you'll be able to find them. On Fri, Feb 3, 2012 at 8:38 PM, James Adcock <jimad@msn.com> wrote:

we're also not so good at replacing indexes with equivalent lists of links to anchors at the identical places in the text (and search is no substitute, even if you did happen to download the entire book instead of the normal current chapter and the adjacent ones.) Or cross-references - we don't do those either.

No, it's so if I tell you that the passage I'm reading is on page 54, you can go to page 54 and find it.
I've already got that technology and better built into my reader device.
Or if you switch from your phone on the train to the paper copy at home, you can tell where to pick up.
I've already got that technology and better built into my reader device except if you really mean "paper" in which case the answer is "I don't kill trees any more than I wear fur coats."
Or if you go to the library to look at the real illustrations, or maps, or drawings, you'll be able to find them.
My library burnt those books, and was happy to do so, just as soon as Google offered to digitize them.

My mistake. I thought you were asking why other people needed page numbers. On Fri, Feb 3, 2012 at 9:09 PM, James Adcock <jimad@msn.com> wrote:

Don>My mistake. I thought you were asking why other people needed page numbers. Well, it's just a question of whether PG/DP are addressing the right needs for today, or responding to perceived needs of the past. I'm not against having the page numbers documented in the source file somewhere, as long as it doesn't get in the way of the primary need: people have got to have something they can actually read "like a book" without being constantly interrupted by the techno-desires of the techno-nerds to stuff their ugly complexifications in everywhere -- whether the readers want it or not. The hardware designers seem to be beginning to understand this better, and are doing the "auto-hide" stuff, with unblemished "full screen reading" until the reader actively indicates they actually want to be exposed to the techno-nerd complexifications again. For the same reason, if I want to watch a movie on a computer (and I do not) I like having the option of "Full Screen Mode" -- so that I can actually *watch* the movie, rather than being constantly exposed to all the widgets that some techno-nerd has decided someone might have fun playing with -- instead of watching the movie.

Hi Don, You are not really serious here. You have a paper copy of all the books you read on your phone??? C'mon. This is ridiculous. Just like the remarks that academics need to cite page numbers, so page numbers have to be in the etext. Please do not get me wrong. I believe original page numbers have their value in the master format and are useful for some formats and display devices/agents, yet not all and not for all reasons. We simply cannot serve everybody's pet whims. Furthermore, as others have stated, the display of page numbers on e-reader agents is problematic and varies from device to device. PG tries to serve as many devices as possible, yet it has to make compromises in order not to have a format for every device in the wild. Agreed, the texts will not look perfect on every device, yet they look good enough on most devices. I refer you to an upcoming thread by me, "A New Approach". regards Keith. Am 04.02.2012 um 05:43 schrieb don kretz:
No, it's so if I tell you that the passage I'm reading is on page 54, you can go to page 54 and find it. Or if you switch from your phone on the train to the paper copy at home, you can tell where to pick up. Or if you go to the library to look at the real illustrations, or maps, or drawings, you'll be able to find them.

Somehow this has assumed the form of a discussion about whether or not to use a VCS. It isn't (or never was to me, anyway.) At least I'm not "afraid" of either RST or VCS. I am second only to bowerbird in using the term ReStructured Text in this forum. I'm pretty intimate with several VCS systems. My suggestion has been from the start that we take the specific technology out of the discussion until we have clearly identified the problem we intend to solve, and then we can discuss alternative ways to solve it. One of which may be a VCS. The title of the thread is "Goals and Scope", which seems to have turned into "we're installing a vcs because we need a vcs". On Thu, Feb 2, 2012 at 7:23 AM, Jim Adcock <jimad@msn.com> wrote:
Everybody who wants to participate and is not afraid of RST and a VCS can send me their RSA public keys so I can give them SSH access to the repos.
Everybody who thinks that RST and/or a VCS are bad ideas, is encouraged to implement their own (possibly better) ideas as an individual or a group. Just stop the fruitless complaining and get your act together.
Seems like a bit of a silly comment, when you are literally holding the only keys to the door, and are allowing only your own ideas in.
Open up the VCS to other languages and other approaches, so that the competing ideas can be compared next to your own.

"Jim" == Jim Adcock <jimad@msn.com> writes:
>> Everybody who wants to participate and is not afraid of RST and a VCS can send me their RSA public keys so I can give them SSH access to the repos.
>> Everybody who thinks that RST and/or a VCS are bad ideas, is encouraged to implement their own (possibly better) ideas as an individual or a group. Just stop the fruitless complaining and get your act together.
Jim> Seems like a bit of a silly comment, when you are literally holding the only keys to the door, and are allowing only your own ideas in.
Jim> Open up the VCS to other languages and other approaches, so that the competing ideas can be compared next to your own.
"Lee" == Lee Passey <lee@novomail.net> writes:
Lee> On Thu, February 2, 2012 8:23 am, Jim Adcock wrote:
>> Seems like a bit of a silly comment, when you are literally holding the only keys to the door, and are allowing only your own ideas in.
Lee> This seems a bit harsh. I'm sure that if you asked nicely Mr. Perathoner would give you a shell account onto pglaf.org with enough rights to install any software you would like, including web-facing software.
Remark however that the gatekeeper of pglaf is Greg, not Marcello, and his initial post says
"Greg" == Greg Newby <gbnewby@pglaf.org> writes:
Greg> I can think of several major details, and many minor ones. Concerns about copyright, spamming, whether anonymous edits are permitted, a review/revision/recision cycle, character sets, forking, searching, etc., etc. I would love to see multiple "master" files, created lovingly by hand in any or all of RST, LaTeX, or yfm (Your Favorite Markup) -- then allow users to select which master to use to generate their, say, EPUB. And, which tools to use for the conversion. Of course, people who wanted to lovingly craft an EPUB would be able to upload that, too.
So to me it seems that the gatekeeper promises to open the gates, including the possibility of installing other toolchains besides Gnutenberg Press (PGTEI) and epubmaker (PGRST). In particular, I plan to work on a LaTeX master towards the production of good HTML and epub. Carlo

Hi Carlo, what do you think about LuaTeX? I had personally suggested using LaTeX for PG already in the early days. regards Keith. Am 02.02.2012 um 18:41 schrieb Carlo Traverso:

"Keith" == Keith J Schultz <schultzk@uni-trier.de> writes:
Keith> Hi Carlo, what do you think about LuaTeX?
I have never tried it. I am looking at hevea (for LaTeX->HTML conversion).
Keith> I had personally suggested using LaTeX for PG already in the early days.
Me too. I have some experience with a slim macro package that, with a very light markup (e.g. _italic markup_), allows direct PDF creation via LaTeX. Carlo

Are you familiar with jqmath? <http://mathscribe.com/author/jqmath.html> hevea appears to be similar. It's also a hassle because everything else in the PG file I can accept as-is. On Thu, Feb 2, 2012 at 2:53 PM, Carlo Traverso <traverso@posso.dm.unipi.it> wrote:
"Keith" == Keith J Schultz <schultzk@uni-trier.de> writes:
Keith> Hi Carlo,
Keith> what do you think about LuaTeX?
I have never tried it. I am looking at hevea (for LaTeX->HTML conversion).
Keith> I had personally suggested using LaTeX for PG already in Keith> the early days.
Me too. I have some experience on a slim macro package that with a very light markup (e.g. _italic markup_) allows direct pdf creation via LaTeX.
Carlo
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d

"don" == don kretz <dakretz@gmail.com> writes:
don> Are you familiar with jqmath? <http://mathscribe.com/author/jqmath.html> hevea appears to be similar.
No, but my focus is to handle texts without math. LaTeX is used as a master suitable for producing high-quality PDF, and possibly good HTML, while PDF derived from RST or HTML is often of inferior typographical quality unless manually tweaked. Carlo
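A minimal sketch of that dual-output pipeline, assuming Python, a placeholder book.tex, and pdflatex and hevea installed on the PATH:

    # Same LaTeX master, two derived outputs.
    import subprocess

    master = "book.tex"   # hypothetical LaTeX master file

    subprocess.run(["pdflatex", master], check=True)   # high-quality book.pdf
    subprocess.run(["hevea", master], check=True)      # book.html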

On 02/02/2012 04:23 PM, Jim Adcock wrote:
Seems like a bit of a silly comment, when you are literally holding the only keys to the door, and are allowing only your own ideas in.
Greg will be happy to give you an account on pglaf.org. When you have installed your tools there and convinced the WWers to use them, I can copy them over to gutenberg.org. -- Marcello Perathoner webmaster@gutenberg.org

Marcello>Greg will be happy to give you an account on pglaf.org. When you have installed your tools there and convinced the WWers to use them, I can copy them over to gutenberg.org. Again, that's not what I think I heard Greg saying. "The tools" are the current PG sausagemaker chain plus the VCS to store the "tweak" changes to the HTML. The whole point of Greg's crowdsourcing idea was to keep the WWers out of the day-to-day -- until such time as the WWers believe they see something that is worth their time and effort to back-propagate. You're still saying "my way -- my choice of language -- or the highway." ???

Don>...learn a new programming language, with little or no documentation.... As a point of reference I just looked up the official W3C documentation on one particular HTML feature -- a page which happened to have a page counter on it: 1300 people had looked up the same piece of documentation -- for HTML, something that probably has literally billions of people using it? Not saying HTML is great stuff, just saying having a critical mass of users is critically important, especially when it comes to documentation.

Hi Don, I have not looked at RST that closely, but the master format is not the problem. The problem is the lack of guidelines on what should be inside the RST or what it should handle. RST is just the back-end. What is needed is an editor that is simple to use. The user does not need to know RST. Or do you know what is inside .doc and .docx files and understand it? regards Keith. Am 01.02.2012 um 23:56 schrieb don kretz:
I suggest you consider putting a stop to this RST experiment and step back and come up with some kind of plan that can possibly succeed.
From what I can tell, you're asking non-technical people to essentially learn a new programming language, with little or no documentation (certainly not current), no debugger, no IDE, no or crappy error messages, and all this while the language is still being designed and implemented.
And telling them "Just trust us".
I doubt the developers involved here would agree to work under the same conditions.

All those who are comma fucking and mosquito sifting right now - what does all that mean to the general public? please enlighten? On 31 January 2012 19:32, Lee Passey <lee@novomail.net> wrote:
(I have a hunch I'm going to be quoting this message a lot in the future...)
On Tue, January 24, 2012 3:08 pm, Joshua Hutchinson wrote:
I'd love to see the PG corpus redone as a "master format" system (and the current filesystem supports "old" format files in a subdirectory, so if someone wanted to get the old original hand-made files, they could). I'm not particularly wedded to any master format. Hell, if someone came up with a sufficiently constrained HTML vocabulary that could be easily used to "generate" the additional formats necessary, I'm good with that.
But before anyone will start doing this work, there needs to be a consensus from PG (I'm looking at you, Greg!) that the work will be acceptable. A half-assed "master format" system is no master format system at all.
On Tue, January 31, 2012 1:22 am, Greg Newby wrote:
The need I'm trying to address is reformatting or editing eBooks, not proofreading them.
Okay, we're on the same page so far...
What I'd like is (as someone else nicely put it) a continual improvement opportunity, provided to essentially anyone, for eBooks in the PG collection.
Still good...
This boils down to a handful of critical activities. It's mainly the third one (III) that involves crowdsourcing and new tools.
This is where we start to diverge...
I. making changes to the master file(s) [let's imagine that we retain the practice of every PG eBook having a small number of master files, in a small number of master formats]. The short list of master formats includes RST, HTML, TeX/TEI, and plain text (perhaps with light markup). Maybe this list will grow in the future; maybe it will shrink.
No, according to Mr. Hutchinson's proposal there can be only one...
The main feature here is that typos or fixes or additional master formats can be contributed.
The main feature here is that a single fix to the master file will automatically propagate to all derived formats; syncing between "masters" will not be required.
[little snip]
II. from those master files, various other file formats can be [and are, currently] derived automatically.
Mister Hutchinson's vision, which I am trying to follow, is that /all/ other file formats will be derived automatically from the /one/ master version. Caching is certainly advisable, but on-demand creation would be the first-step.
Many challenges are technical, such as increased sophistication in dealing with text and HTML as master formats.
The primary technical challenge is in developing a tool chain which can produce quality instances of all derived formats, and in adopting/developing a master format with the richness necessary to support that tool chain.
Others need to be addressed by policy or social means, such as the ongoing tendency to use HTML for layout that is difficult to automatically convert.
Policy means include deciding on a master format, developing rules for the use of that format, wide-spread publication of those rules and, to the extent possible, automated means to detect violations of those rules. Social means primarily include getting buy-in from participants to the established rules, and attracting volunteers who are willing to work with them.
III. from those master files, various other file formats that are created/contributed by individuals.
At this point we're not only not on the same page, we're not even in the same book. This suggestion is completely at odds with what Mr. Hutchinson proposed, and which I support.
[bigger snip]
If we accept that anyone could contribute such a new file (or set of files) for an existing PG eBook, then the main challenges I see are (a) how to help readers select among them, and (b) dealing with the fact that, over time, master formats will be fixed, but not these hand-crafted derivatives.
I'm not saying you shouldn't pursue this vision; I'm simply saying it's not mine, and I'm completely uninterested in pursuing it with you. My vision is to develop a system where existing PG works can be reworked into a single master format, from which all other formats can be automatically derived.
Proof-reading and upgrading the master files is certainly a desirable part of that vision, but it is secondary to the main goal. I'm beginning to think that Mr. Hutchinson's earlier question remains unresolved:
there needs to be a consensus from PG (I'm looking at you, Greg!) that the work will be acceptable. A half-assed "master format" system is no master format system at all.
So Mr. Newby, can we expect some support in building a repository of master format reworkings of existing PG works? Infrastructure support would be nice, but moral support is what is most needed.
[big snip]
I hope this helps clarify my original suggestion a little better. There has been some great discussion on this and related topics.
Ditto.
Cheers, Lee
-- Marc FreeLiterature.org <http://www.freeliterature.org>

On 01/31/2012 09:22 AM, Greg Newby wrote:
I. making changes to the master file(s) [let's imagine that we retain the practice of every PG eBook having a small number of master files, in a small number of master formats]. The short list of master formats includes RST, HTML, TeX/TEI, and plain text (perhaps with light markup). Maybe this list will grow in the future; maybe it will shrink.
Having more than one master format per book does not make sense. Decide which format is best for that book and stick to it. Every typo should have exactly one location that needs fixing.
II. from those master files, various other file formats can be [and are, currently] derived automatically. These include EPUB, Kindle variants, variations on HTML or text (especially if they were not previously provided), RTF, and a few others. Again, maybe this list will grow, maybe it will shrink. I do hope to offer conversion on-demand, which will let people select conversion options, and maybe even different conversion programs, for their purposes.
Conversion on demand will not be possible with the cycles available at ibiblio. If you want that, you'll have to organize some very beefy servers that do nothing but crunch books.
III. from those master files, various other file formats that are created/contributed by individuals. I get offered these (via help@) practically every day. Usually EPUB, but also RTF/DOC, PDF. Often with typos applied. These are what I called "lovingly prepared," though of course some are better than others.
These can be better than automatically-generated versions in various ways. They might have advantages over master files (for example, improved HTML). The main feature is that these would, in many cases, provide an improved reading experience (at least for some people, on some devices).
If we accept that anyone could contribute such a new file (or set of files) for an existing PG eBook, then the main challenges I see are (a) how to help readers select among them, and (b) dealing with the fact that, over time, master formats will be fixed, but not these hand-crafted derivatives.
The whole idea seems to me very ill-conceived. We just don't have the resources to handle that kind of workload. It will just divert our resources away from posting more master files to posting lots of nearly identical vanity editions.
Every user contribution will have to be checked for external site links or other SEO optimizations, malicious text edits, etc., or PG will turn very quickly into a link farm for spammers or an exchange point for `corrected´ editions of the Origin of Species. Checking some proprietary formats could be expensive. Some formats could even be impossible to check except via eyeball grep. Every `just one small typo fixed´ version will have to be checked completely anew.
Typos will not be first reported to errata any more, but a new edition will be sent in. Every fixed edition will have a slightly different set of typos fixed. People reporting typos will not state which edition they have. Vanity editions will invariably fall out of sync with the master format.
Discontent will ensue about the ranking of multiple vanity editions. Same about the extent of allowed customizations ("It's a book about cats and I have added just a dozen pics of my cat ..."). Users will be confused about which edition to download.
I would redirect vanity editions to MobileRead or any other web site that already posts them. We could even link to them if they gave us landing pages. -- Marcello Perathoner webmaster@gutenberg.org

It's not clear to me how a reader would even be able to distinguish which version they were reading (and wanted to improve.) I can't conceive of orchestrating simultaneous corrections of one error to multiple custom targets where the linkage (shared content) among them isn't explicitly identified. If a desired format isn't available for a given device or style (probably what is called in CMS-speak a theme), then the work needs to be directed to improving the generalization of the master format and the exporting software that converts that master format to the target format. One warning about HTML. Remember that its heritage comes from SGML, whose intent was to encode all syntax and display information; HTML's purpose was to remove the syntax part and provide just the display part. So it's not accidental that we end up needing to implant syntax. Also remember that XHTML requires "well-formedness", and that books are often not syntactically "well-formed" in that sense. (One reason XHTML is losing favor: it's designed for software developers, not creative writers, whose organizational instincts are more fluid.)

On Tue, Jan 31, 2012 at 08:32:15PM +0100, Marcello Perathoner wrote:
On 01/31/2012 09:22 AM, Greg Newby wrote:
I. making changes to the master file(s) [let's imagine that we retain the practice of every PG eBook having a small number of master files, in a small number of master formats]. The short list of master formats includes RST, HTML, TeX/TEI, and plain text (perhaps with light markup). Maybe this list will grow in the future; maybe it will shrink.
Having more than one master format per book does not make sense. Decide which format is best for that book and stick to it. Every typo should have exactly one location that needs fixing.
In principle I agree. In practice, we often have 2 (HTML + text). I don't think it is very burdensome to edit 2 files rather than 1.
II. from those master files, various other file formats can be [and are, currently] derived automatically. These include EPUB, Kindle variants, variations on HTML or text (especially if they were not previously provided), RTF, and a few others. Again, maybe this list will grow, maybe it will shrink. I do hope to offer conversion on-demand, which will let people select conversion options, and maybe even different conversion programs, for their purposes.
Conversion on demand will not be possible with the cycles available at ibiblio. If you want that, you'll have to organize some very beefy servers that do nothing but crunch books.
I guess we need to find someone who runs a supercomputing center or something... seriously, I do not see this as an impediment, though you are right to point out that even a modest bump in consumption at ibiblio might make us unsustainable with their current server farm. -- Greg

On 01/31/2012 10:36 PM, Greg Newby wrote:
Having more than one master format per book does not make sense. Decide which format is best for that book and stick to it. Every typo should have exactly one location that needs fixing.
In principle I agree. In practice, we often have 2 (HTML + text). I don't think it is very burdensome to edit 2 files rather than 1.
The way forward should be to make it simpler on the WWers than before. If we had multiple masters per book, a maintainer would need to learn all master formats, and in the end we'd probably lose sync between them. -- Marcello Perathoner webmaster@gutenberg.org

Anyone who has dealt with this stuff knows it's a geometric relationship. Two formats is 4X as much work as one. Three is 9X. Etc. I don't understand the overhead of on-demand. As long as you have proper dependencies and caching set up, you only run the generation exercise once, and then everyone gets a cached copy until one of the dependencies changes. It's less overhead than building every format whenever a master changes, where you build formats that may never get requested. If you have a common dependency, like a common CSS stylesheet, then do you currently trigger the rebuild of every project that uses it, all at once? On Tue, Jan 31, 2012 at 2:26 PM, Marcello Perathoner <marcello@perathoner.de> wrote:
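A hedged sketch of the dependency-and-cache idea, in Python; the converter command and file names are placeholders, not PG's actual toolchain:

    # Rebuild a derived file only when it is older than any of its
    # dependencies; otherwise serve the cached copy.
    import os
    import subprocess

    def ensure_derived(master, derived, converter, shared_deps=()):
        deps = [master, converter, *shared_deps]
        if os.path.exists(derived):
            derived_mtime = os.path.getmtime(derived)
            if all(os.path.getmtime(dep) <= derived_mtime for dep in deps):
                return derived                 # cache hit: nothing to rebuild
        # Cache miss: run the converter once; later requests reuse the result.
        subprocess.run([converter, master, derived], check=True)
        return derived

    # Only the first request after a master (or shared stylesheet) change pays
    # the conversion cost; every later request is a plain file read.
    # ensure_derived("12345.rst", "12345.epub", "./convert-to-epub", ["pg.css"])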

Below...
-----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Marcello Perathoner Sent: Tuesday, January 31, 2012 2:27 PM To: gbnewby@pglaf.org; Project Gutenberg Volunteer Discussion Subject: Re: [gutvol-d] Goals and scope (Re: Version control systems)
On 01/31/2012 10:36 PM, Greg Newby wrote:
Having more than one master format per book does not make sense. Decide which format is best for that book and stick to it. Every typo should have exactly one location that needs fixing.
In principle I agree. In practice, we often have 2 (HTML + text). I don't think it is very burdensome to edit 2 files rather than 1.
Based on my experience with the Errata system, this depends on the number of errors being reported. Correcting a handful of errors in 2 or 3 files (2 text, 1 HTML) is one thing, but a report of hundreds of errors is something else. As an extreme example, I've got an errata report on my hands that's 3400 lines long, that I haven't had the courage to plow through yet. The reporter lists something wrong on almost every one of the book's nearly 400 pages. On top of that, he'd like an HTML version created with the footnotes cross-linked and from what I can tell, the page numbers inserted because there are internal references to them. The reported-on text is one volume of a series, so fixing/reposting it will take it out of sync with the rest. Probably simpler to figure out the source edition and run the whole series through DP, to replace the current files. (Question: where's an Errata Team when you want one? Answer: as happened a year or so ago, they find out what they're up against, and vanish. <g>) Al

Al, I may be able to help you with this. If the text includes the page numbers and footnotes using one of the standard DP markup schemes, I can convert that automatically into better page numbers and footnotes, and we may be able to handle cross-references the same way I do with Encyclopedia Britannica. The text errors still need someone with an editor, of course.
As an extreme example, I've got an errata report on my hands that's 3400 lines long, that I haven't had the courage to plow through yet. The reporter lists something wrong on almost every one of the book's nearly 400 pages. On top of that, he'd like an HTML version created with the footnotes cross-linked and from what I can tell, the page numbers inserted because there are internal references to them. The reported-on text is one volume of a series, so fixing/reposting it will take it out of sync with the rest. Probably simpler to figure out the source edition and run the whole series through DP, to replace the current files.

The etext in question is #3441. The underlying files are located in etext02, and named 71001107.txt (7-bit) and 71001108.txt (8-bit), plus their zip files. There's no HTML version, hence no illustrations. Footnotes are indicated by [FN#1], [FN#2], etc. The footnotes themselves (all 460 of them) are collected at file-end. If page numbers are indicated in the file, I can't figure out where or how, so they're probably not there. I mentioned that this was one of a multi-volume set. There are 16 volumes, of which this is volume 7. The entire set's etext numbers are 3435-3450, the file names are from 11001107.* to g1001107.*, all files in etext02. The last number of the file name is 7 for a 7-bit file or 8 for an 8-bit file, plus a zip file for each, i.e. 4 files for each volume. The 7-bit files can be ignored, since the corrected 8-bit text file (and a new HTML file, if one is prepared) would be processed by PG's posting software, which will generate a new 7-bit text file. As I mentioned in another thread, the existing credits will be transferred to any new version, with correctors' names added. Gutvol-d doesn't handle attachments, so I'm prepared to send the list of proposed corrections off-list as an attachment, or I can post it as part of this thread (which is probably better, since anyone interested can get it, and I won't be bothered with sending it to bunches of people). Al -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of don kretz Sent: Tuesday, January 31, 2012 3:51 PM To: Project Gutenberg Volunteer Discussion Subject: Re: [gutvol-d] Goals and scope (Re: Version control systems)
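A hedged sketch (not an existing DP/PG tool) of the kind of automatic conversion offered above, in Python: in-text [FN#n] markers become links to the notes collected at file-end, which are assumed here to start with the same marker.

    import re

    def link_fn_markers(body_text):
        """In-text [FN#12] -> a link down to the matching note."""
        return re.sub(
            r"\[FN#(\d+)\]",
            r'<a id="fnref\1" href="#fn\1">[\1]</a>',
            body_text,
        )

    def anchor_fn_bodies(notes_text):
        """[FN#12] at the start of a note line -> anchor plus backlink."""
        return re.sub(
            r"^\[FN#(\d+)\]",
            r'<a id="fn\1" href="#fnref\1">[\1]</a>',
            notes_text,
            flags=re.MULTILINE,
        )

    print(link_fn_markers("a statement[FN#7] needing support"))
    # a statement<a id="fnref7" href="#fn7">[7]</a> needing support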
participants (14)
- Al Haines
- David Starner
- don kretz
- Greg Newby
- James Adcock
- Jana Srna
- Jim Adcock
- Joshua Hutchinson
- Karen Lofstrom
- Keith J. Schultz
- Lee Passey
- Marc D'Hooghe
- Marcello Perathoner
- traverso@posso.dm.unipi.it