I expect some portions of the changes will be 100% automatable, and will be
100% beneficial in 90% of projects. Stuff like taking images/captions out of
fixed-size tables and putting them into %-sized divs. With EB I write regexes
that get those right almost all the time.
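
As a rough sketch of that kind of rewrite (not the actual EB regexes): the
table layout matched here, the 800px nominal page width, and the "figcenter"
and "caption" class names are all assumptions made up for illustration.

```python
import re

# Assumed input shape: a fixed-width table with exactly two rows,
# one holding an <img> and one holding the caption text.
FIGURE_TABLE = re.compile(
    r'<table\b[^>]*\bwidth="(?P<px>\d+)"[^>]*>\s*'
    r'<tr[^>]*>\s*<td[^>]*>\s*(?P<img><img\b[^>]*>)\s*</td>\s*</tr>\s*'
    r'<tr[^>]*>\s*<td[^>]*>(?P<caption>(?:(?!</?(?:table|tr|td)\b).)*)</td>\s*</tr>\s*'
    r'</table>',
    re.IGNORECASE | re.DOTALL,
)


def tables_to_divs(html, page_width_px=800):
    """Replace matching figure tables with %-sized divs.

    Returns (new_html, number_of_tables_replaced). page_width_px is an
    assumed nominal page width used to turn pixels into a percentage.
    """
    def to_div(match):
        percent = min(100, round(100 * int(match.group("px")) / page_width_px))
        # Drop the fixed pixel size from the image and let it fill the div.
        img = re.sub(r'\s+(?:width|height)="\d+"', "", match.group("img"),
                     flags=re.IGNORECASE)
        img = re.sub(r'\s*/?>$', ' style="width: 100%;">', img)
        caption = match.group("caption").strip()
        return (
            f'<div class="figcenter" style="width: {percent}%;">\n'
            f'  {img}\n'
            f'  <p class="caption">{caption}</p>\n'
            f'</div>'
        )

    return FIGURE_TABLE.subn(to_div, html)
```

Tables that don't fit the assumed two-row shape are left untouched, which is
the safe failure mode for an unattended pass.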

Other likely candidates are footnotes, chapter headings, and page numbers. I
only have EB and three or four others to work from. But we could automate that
pretty quickly, run it against a sample of the corpus, and check over the
results thoroughly. The key is to make no tweaks other than the automated ones,
so we find out how close we can come.
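
One possible shape for that trial run, as a sketch only: the function name,
the sample size, and the idea of reviewing unified diffs are assumptions, and
`transform` stands for whatever automated rewrite is being tested.

```python
import difflib
import random


def trial_run(texts, transform, sample_size=50, seed=0):
    """Apply `transform` to a random sample of documents and collect diffs.

    texts: dict mapping a file name to its HTML text (hypothetical input).
    Returns a dict of file name -> unified diff, for changed files only.
    Nothing is edited by hand, so the diffs show exactly what the
    automation did and nothing else.
    """
    names = sorted(texts)
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    sample = rng.sample(names, min(sample_size, len(names)))

    report = {}
    for name in sample:
        before = texts[name]
        after = transform(before)
        if after != before:
            diff = difflib.unified_diff(
                before.splitlines(),
                after.splitlines(),
                fromfile=name,
                tofile=f"{name} (automated)",
                lineterm="",
            )
            report[name] = "\n".join(diff)
    return report
```

Reading each sampled file into the `texts` dict and going through the
resulting diffs one by one would be the "check over the results thoroughly"
step.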

Then we may know enough to plan the next step.

I'd like to hear what Lee and the others think first, though. They're better judges
than I am.

On Thu, Feb 2, 2012 at 11:15 PM, Greg Newby <gbnewby@pglaf.org> wrote:
On Thu, Feb 02, 2012 at 06:46:56PM -0800, don kretz wrote:
> ...
> It will take some work and cooperation. The critical question still
> remains: will PG allow existing projects to be altered this way? Under what
> conditions? With what verification requirements?

I already answered that in this thread, and the answer is that we do
have a procedure to get fixed files back (i.e., the errata process,
with a WWer in the loop).

A theme that is not well-handled by the errata process is, what
if only the HTML is tweaked, to make the file more epub (etc.)
friendly?  That is, when the "fix" is not typos/scannos/missing
pages, etc., etc., but simply formatting or markup?

The short answer is a rephrasing of the starting point from a few days
ago: I'd like to go ahead and make a way to get these back into the
collection, replacing the originals, *en masse*.  (Actually, we keep
the originals, in an 'old' subdirectory.)  I don't anticipate
opposition to this idea, assuming we're tweaking, not redoing the look
and feel crafted by the submitter.  How to tell which is which?

One thing we've done with a few people who were very active
in posting/reposting/augmenting is give them direct access to
upload.  This is something we do AFTER the procedure is very clear.
It's easy to screw things up, trust me....

My emphasis in this discussion has been to look at ways to make this
type of process more efficient and scalable.  We don't want to have a
lot of back and forth discussion for every file, if we want to
eventually re-do thousands.  This interest is at least partially
selfish, since I'd rather not be part of a decision process for
every such fixed eBook that comes along, and I'm pretty sure the
current WWers have similar feelings.

 -- Greg


_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/mailman/listinfo/gutvol-d