Re: [gutvol-d] Crowdsourcing (Re: Producing epub ready HTML)

30 Jan 2012

      Hi Marcello,

Basically, I can agree.

Yet, below you have written:
"We have a strict original to follow and strict rules to apply."

Could you direct me to the strict rules that apply?

If they do not yet exist, that is O.K. If they are not that strict or
concise, that is fine, too.

I will be starting a series that I hope will lead to a concise and consistent
system, one that gives guidance to those wishing to submit etexts/ebooks
to PG as well as guidance for developing tools or a tool chain.

regards
	Keith.

On 30.01.2012 at 14:03, Marcello Perathoner wrote:
...
On 01/30/2012 03:40 AM, don kretz wrote:
...
For one, maintaining text is not the same as maintaining source code. And
in particular, the workflow for software development is not the same as
ours.
But it is close enough.
All revision control systems store text files and work in a line-oriented fashion. I don't see any difference between program source files and text files. (Here I'm thinking about assembled books, not single pages. Keep the line endings where they are, and we are already there.)
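To illustrate the line-oriented point, here is a minimal sketch using Python's standard difflib module. It is not how git or hg compute diffs internally. The sample text and the helper names line_diff and removed_line_count are made up for this example. It compares a one-word typo fix that leaves the line breaks alone with the same fix applied after rewrapping the paragraph.

```python
import difflib


def line_diff(old_text, new_text, name="book.txt"):
    """Return a unified diff (list of lines) between two versions of a text."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return list(difflib.unified_diff(
        old_lines, new_lines,
        fromfile=name + " (old)", tofile=name + " (new)"))


def removed_line_count(diff):
    """Count old lines the diff removes, skipping the two file-header lines."""
    return sum(1 for line in diff[2:] if line.startswith("-"))


# Hypothetical sample text, not taken from any PG file.
old = ("It was the best of times,\n"
       "it was the wurst of times,\n"
       "it was the age of wisdom,\n")

# Same line breaks, one typo fixed.
fixed = old.replace("wurst", "worst")

# Same fix, but the paragraph was also rewrapped.
rewrapped = ("It was the best of times, it was the\n"
             "worst of times, it was the age of wisdom,\n")

typo_diff = line_diff(old, fixed)
rewrap_diff = line_diff(old, rewrapped)
print("".join(typo_diff))
```

With the line breaks left in place, the typo fix shows up as a single changed line. After rewrapping, a line-based diff reports every line of the paragraph as changed, which is why the line endings should stay where they are.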
I've considered alternatives, but the best-suited VCSs seem to be either git or mercurial (hg), with a slight advantage for mercurial.
git is blindingly fast, and because it only transmits compressed diffs, an edit to a multi-megabyte book can be transferred in seconds if you already have the book checked out. Very interesting if you are on a GPRS link.
But the main thing git lacks is a way to check out parts of a project, which is of paramount importance for us. You don't want to check out the whole archive to fix one typo in one book. hg does have this. (From reading the docs, not from actual testing.) So with hg you can check out one book.
Another advantage is that hg is written in Python (the PG conversion software and web application server are written in Python) and has a very good Python interface.
On the downside, hg is a bit slower than git, though not by much, and it is not as widely deployed.
...
I realize we had different needs - here we're talking about at least
mostly-completed projects, not page-or-less components. But I don't see how
we easily avoid at least some extension in the proofing direction if we
really want to do continuous semi-open-access improvement of texts. I think
it requires administrative resources we'll never have to do it any other
way.
I'm very much against this "crowdsourcing" of text improvement. We'll end up with the few good volunteers we have patrolling and reverting the edits of hundreds of clueless or malign individuals.
Our task is much more similar to software development than writing articles for wikipedia. We have a strict original to follow and strict rules to apply.
What we could implement is a system to flag potential text errors for revision. This system should ideally be integrated into the text itself (JavaScript). If any text location accumulates enough error reports, it will be presented to the errata team. But the first thing we'll need is page images for every book, linked to the text and publicly available.
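The accumulate-then-escalate idea can be sketched as follows. This is only an illustration of the counting logic, written in Python and imagined as the server side that would receive reports from the in-text JavaScript. The class name ErrorFlags, the threshold of 3, the use of (book id, line number) as the "text location", and the rule that each reporter counts once per location are all assumptions, not anything PG has specified.

```python
from collections import defaultdict

# Hypothetical threshold: how many distinct reporters make a location
# worth showing to the errata team.
REPORT_THRESHOLD = 3


class ErrorFlags:
    """Collect reader error reports per text location (assumed data model)."""

    def __init__(self, threshold=REPORT_THRESHOLD):
        self.threshold = threshold
        # (book_id, line_no) -> set of reporter ids, so that repeated
        # reports from the same reporter are counted only once.
        self._reports = defaultdict(set)

    def report(self, book_id, line_no, reporter):
        """Record that `reporter` flagged a possible error at this location."""
        self._reports[(book_id, line_no)].add(reporter)

    def for_errata_team(self):
        """Return the locations whose distinct-reporter count reached the threshold."""
        return sorted(
            location
            for location, reporters in self._reports.items()
            if len(reporters) >= self.threshold
        )


flags = ErrorFlags(threshold=2)
flags.report(1342, 57, "reader-a")
flags.report(1342, 57, "reader-a")  # duplicate, does not count twice
flags.report(1342, 57, "reader-b")
flags.report(1342, 90, "reader-c")  # only one report so far
print(flags.for_errata_team())
```

Nothing here edits the text. Readers only raise flags, and the errata team sees a location once enough independent reports agree, so the decision to change anything stays with them.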
-- 
Marcello Perathoner
webmaster@gutenberg.org
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/mailman/listinfo/gutvol-d

Keith J. Schultz