Re: [gutvol-d] A new viewpoint (was: Re: gee)

6 Feb 2012

"don" == don kretz <dakretz@gmail.com> writes:

    don> Why would text files not be derived from the master like
    don> other formats?  I have expected that derivation can only be
    don> automated from greater information density to lower.

    don> I think however that testing ideas early against real data as
    don> you suggest is important.

"Jim" == Jim Adcock <jimad@msn.com> writes:

    Carlo> I also assume that most errata, the important ones, report
    Carlo> a correction to the text, mostly fixing a typo. I suppose that
    Carlo> errata for markup are relatively unusual,

    Jim> This is where we disagree.  What I see *overwhelmingly* is
    Jim> that the errors in the files of PG are *overwhelmingly*
    Jim> massively errors of formatting.  I would be hard-put to find
    Jim> a half dozen scannos in a given PG file.  I can often find
    Jim> 100s if not 1000s of formattos in the same PG file.

    Jim> It's just that somehow the people at PG have become blind and
    Jim> tone-deaf to issues of formatting.  Again, things tend to
    Jim> "work" in HTML.  It's just the other formats that fall-down
    Jim> so badly.

I see that I expressed myself poorly, since I have been
misunderstood (though not by Greg, I believe). Let me try again.

First, what is a master format? It is not a format for distribution;
it is the format from which all other formats are derived. Hence it
implies a toolchain to derive those formats, and it should be defined
so that future formats can be derived from it as well.

Master formats are important because a fix to the master can be
propagated to all the distributed formats.  Moreover, when epub4
and Zoox formats (based on HTML6) are released, it will be easy to
provide good epub4 and Zoox files for all the books that have a good,
rich master file, taking advantage of the cool new features of HTML6
and epub4, just by adding the new formats to the toolchain.
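
To make the idea of a master plus a toolchain concrete, here is a
minimal sketch in Python. Everything in it is an assumption made for
illustration: the master representation (paragraphs made of text runs
with an emphasis flag), the names MASTER, TOOLCHAIN and build_all, and
the underscore convention for emphasis in plain text. It is not an
existing PG format or tool.

```python
import html

# Hypothetical master: a list of paragraphs, each a list of
# (text, is_emphasized) runs. Purely illustrative, not a PG format.
MASTER = [
    [("You can take ", False), ("off", True), (" your gas mask.", False)],
]

def to_html(master):
    """Derive HTML: emphasis survives as <i>...</i>."""
    paras = []
    for runs in master:
        body = "".join(
            f"<i>{html.escape(t)}</i>" if em else html.escape(t)
            for t, em in runs
        )
        paras.append(f"<p>{body}</p>")
    return "\n".join(paras)

def to_txt(master):
    """Derive plain text: emphasis is reduced to _underscores_."""
    return "\n\n".join(
        "".join(f"_{t}_" if em else t for t, em in runs)
        for runs in master
    )

# The toolchain is a table of derivations; supporting a future format
# means adding one more entry here, not touching the books.
TOOLCHAIN = {"html": to_html, "txt": to_txt}

def build_all(master):
    """Regenerate every distributed format from the single master."""
    return {name: render(master) for name, render in TOOLCHAIN.items()}
```

A fix made once in MASTER reaches every output the next time
build_all runs, which is the property the legacy hand-crafted files
lack.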

But PG has some 40000 books that don't have a good master format, and
fixing a typo in a book that has the standard 4 hand-crafted formats
(HTML, txt-UTF-8, txt-8 and txt-7 (ASCII)) requires fixing 4 files and
regenerating the other ones. And the problem will become worse if we
allow hand-crafted epub and kindle files.

My proposal is a way to simplify maintenance of the legacy formats
when handling the requests sent to errata-MMX@pglaf.org, i.e. errata
like this one (many are much less clearly stated):

===========================
Title: Astounding Stories of Super-Science, October, 1930

Author: Various

Release Date: September 1, 2009 [EBook #29882]

Language: English

Page 7:
 "then low whirring noise" should be "then a low whirring noise".
 "You can take of your gas mask" should be "You can take off your gas
 mask".

Page 103:
 Should "The fumes might attract prowlers" be "The flames might
 attract prowlers"? The image is very unclear.

Page 118:
 "subterranean action shock the electron" should be "subterranean
 action shook the electron".

Page 123:
 "with what the knew already" should be "with what she knew already".

Page 139:
 "To the left Is the better path" should be "To the left is the better
 path" - wrong case.
=========================

I have access to errata now, and I will be in a position to tell how
many formattos are submitted as errata, and to suggest modifications
to the errata procedures that would allow automatic correction of all
the formats by correcting only the UTF-8 txt.
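
As a rough sketch of what such an automatic correction could look
like (in Python, with invented names apply_fix and apply_to_formats,
and assuming each format is available as a string), a fix is applied
to a format only when the wrong phrase occurs exactly once there.
Anything else is reported for manual handling.

```python
import re

def apply_fix(text, wrong, right):
    """Replace `wrong` with `right` only if it occurs exactly once.

    Returns (new_text, status), where status is "fixed", "not-found"
    or "ambiguous". The text is returned unchanged unless "fixed".
    """
    # Let any whitespace run (including a line wrap) separate the words.
    pattern = re.compile(r"\s+".join(re.escape(w) for w in wrong.split()))
    matches = list(pattern.finditer(text))
    if not matches:
        return text, "not-found"
    if len(matches) > 1:
        return text, "ambiguous"
    m = matches[0]
    return text[:m.start()] + right + text[m.end():], "fixed"

def apply_to_formats(formats, wrong, right):
    """formats maps a format name to its text.

    Returns (updated texts, per-format status report).
    """
    updated, report = {}, {}
    for name, text in formats.items():
        updated[name], report[name] = apply_fix(text, wrong, right)
    return updated, report
```

In this sketch a line wrap inside the replaced phrase is dropped, so
the txt files might need rewrapping afterwards. A phrase interrupted
by markup in the HTML, or spelled differently in the ASCII file, comes
back as "not-found" rather than being guessed at.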

Of course this does not address the "formattos" (nor, for example, the
splitting of a paragraph in two), but if, as I suspect, errata receives
mainly typos, this might substantially reduce the workload of the
errata team. It might also allow PG to accept e.g. handcrafted epubs,
and to replace some "bad" autogenerated epubs with your "good" fixed
epubs.

Carlo

PS: several posts have come while I was composing this one. I
especially agree with David's last post.

traverso＠posso.dm.unipi.it