New subject: Crowdsourcing (Re: Producing epub ready HTML)

27 Jan 2012

      Whew.  I counted 100 messages in 2 days.  Thanks for the lively
discussion.  I changed the subject for the theme that Joshua and
others mentioned (below).  The idea I am very interested in fostering
is capable online tools to let essentially anyone make edits, add
formats, or prepare derivative files, from any PG eBook.  Then, to
easily add those changes back into (an area of) the PG collection.

My view is that we could very easily have an additional major category
of file, for a given work.  Currently, we have two major categories:
first are files that go through WWers to get online, and second are
those that are automatically generated from the first type.  (While
this is a gross simplification, in fact at www.gutenberg.org it's
really easy to tell which is which -- they are in a different set
of subdirectories, with a different file naming scheme.)

A third (new) type would be those files that are, in some way,
modified, derived, or produced by other people and their tools.  Not
necessarily WWers or the original producers/submitters.  In a word,
crowdsourcing.  Or community editing.  Or version control.  Or
whatever you want to call it: the point would be that ANYONE with
desire and some basic capability could make changes to existing files,
or provide derivative files.

I can think of several major details, and many minor ones.  Concerns
about copyright, spamming, whether anonymous edits are permitted, a
review/revision/recision cycle, character sets, forking, searching,
etc., etc.  I would love to see multiple "master" files, created
lovingly by hand in any or all of RST, LaTeX, or yfm (Your Favorite
Markup) -- then allow users to select which master to use to generate
their, say, EPUB.  And, which tools to use for the conversion.  Of
course, people who wanted to lovingly craft an EPUB would be able to
upload that, too.

A capable crowdsourcing tool - preferably one that already exists, is
well-maintained, is free, and will require relatively few
modifications - is the starting point I'd most like to see.  Whether
we start with one book or 100 or 38000 doesn't matter to me, though it
matters a whole lot that the solution is scalable to the full
collection.  

As for the questions about whether this would be allowed, or would
pollute the essence of whatever, or piss off whomever: no, PG does't
work that way, and never did.  The answer is, and has been, "yes, go
for it.  It's all good, and on-mission."  I am sensitive to not
removing or undoing others' work, but my view is that current files of
the first type, above, would remain, and be easy to find, and that for
the main collection at www.gutenberg.org, the WWer process (perhaps as
modified, thanks to the new tools that have been under discussion)
would still apply.

Last year, I tried to deploy TRAC for group editing and version
control of PG eBooks.  It couldn't handle the directory count, and
never finished, though I'm ready to try again.  Or, a different tool.

As many subscribers have heard before, I have some hefty servers
that can be used for experimentation and proof of concept.  That's
not the hard part.

If people are aware of good tools we could base this on, please speak
up.  I can elaborate on why I prefer to start with an existing tool,
but in a nutshell it is because (as many have pointed out), the
fundamentals of crowdsourcing and file revisions are already covered
by a bunch of excellent tools.  Let's not reinvent the parts that
others are doing well since, after all, there are plenty of challenges
that are unique to Project Gutenberg or to eBooks in general.

  -- Greg

On Wed, Jan 25, 2012 at 01:01:08AM +0100, Marcello Perathoner wrote:
...
On 01/24/2012 11:08 PM, Joshua Hutchinson wrote:
...
So, if someone were to start "refactoring" old PG texts into TEI or RST and
working with a WWer to repost them ... is this a workable idea?
More than a technical challenge it would be a political one. I can
convert a novel the size of Pride and Prejudice into RST in about an
hour. More if there is formatting or images to recover. But I'd
prefer to avoid the riot that will ensue if we start to reformat DP
texts.
We could start redoing the top 100 list excluding everything that is
too hard and everything made by DP.
...
Maybe we start this process on a semi-private mirror of the PG corpus and only
when it reaches a critical mass of some sort it gets moved over. But an official
notice that this project has some backing is necessary or we'll just keep seeing
everything running around in ten different directions and nothing ever getting done.
A semi-official branch would be a good occasion to ditch the old
WWer workflow in favor of a source repository (git or mercurial)
that holds all the masters.
Should we reserve a range of ebook nos. or shadow the existing ones?
-- 
Marcello Perathoner
webmaster@gutenberg.org
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/mailman/listinfo/gutvol-d

Crowdsourcing (Re: Producing epub ready HTML)

Keith J. Schultz

tags

participants (12)