[gutvol-d] Re: DP output is technically obsolete

19 Apr 2010

      Carlo Traverso wrote:
...
"Greg" == Greg Newby <gbnewby@pglaf.org> writes:
Greg> On Sun, Apr 18, 2010 at 05:05:09PM +0200, Carlo Traverso
    Greg> wrote:
    >> Is PG ready to accept Epub as submission format? (i.e. one
    >> submits a valid epub from which the other formats are derived)?
    >> If so, one can target Epub, otherwise at best one is forced to
    >> submit HTML or txt that converts not-too-badly with current PG
    >> tools, and this might be extremely challenging.
    >> 
    >> Carlo
Greg> I don't think we're ready for this except in rare cases
    Greg> where ePub is the best format for display for a particular
    Greg> item (we just released a book where PDF was the best format,
    Greg> believe it or not).
Greg> The challenge is that when books are fixed, someone
    Greg> (typically the whitewasher, seldom the original submitter)
    Greg> needs to regenerate all the files from that book.
Greg> Since there is not yet any standard processing stream to
    Greg> generate static ePub files, this makes it hard for fixes (to
    Greg> HTML & text) to be applied to ePubs.
Greg> I would, of course, love to see something become our
    Greg> "standard" conversion tool, usable by anyone.  Right now,
    Greg> the closest for PG is Marcello's software to build the
    Greg> cached ePub files.  It's wonderful and functional, but is it
    Greg> ready for all envisioned purposes?  I think not, due at
    Greg> least in part to shortcomings of the input HTML.
That's the whole point of my proposal. Starting with hand-crafted HTML
we are likely to end with poor ePub, since the inference of metadata
might be wrong, and many features of HTML need to be tuned to ePub and
might not turn out correct;
And what about users who download the HTML to view on a mobile device? You
must produce better HTML not for the sake of ePub but for the sake of
universal usability.

The metadata come directly from the PG database and are updated whenever 
the PG database changes. That makes our metadata far more consistent 
than your proposal would make them.
...
While obtaining reasonable HTML from ePub
is just unzipping and discarding metadata.
ePub HTML is often split into chapters, which may leave you with 50+ 
files after unzipping which you have to merge manually.
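
To make the merging problem concrete, here is a rough sketch (an
illustration only, not part of any PG or DP tool) of how one might recover
the chapter files in reading order instead of merging them by hand. It
assumes a standard ePub layout: META-INF/container.xml names the OPF
package file, and the OPF spine lists the manifest items in order. The
function name chapters_in_order is made up for this example, and URL-encoded
hrefs, encodings other than UTF-8 and non-linear spine items are ignored.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by the ePub container file and the OPF package file.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def chapters_in_order(epub_file):
    """Return (zip member name, text) pairs following the OPF spine order.

    Hypothetical helper: epub_file is a path or a binary file-like object.
    """
    with zipfile.ZipFile(epub_file) as epub:
        # container.xml tells us where the OPF package file lives.
        container = ET.fromstring(epub.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")

        opf = ET.fromstring(epub.read(opf_path))
        base = posixpath.dirname(opf_path)

        # The manifest maps item ids to hrefs relative to the OPF file.
        hrefs = {
            item.get("id"): item.get("href")
            for item in opf.findall("opf:manifest/opf:item", OPF_NS)
        }

        # The spine gives the reading order as a list of manifest ids.
        chapters = []
        for ref in opf.findall("opf:spine/opf:itemref", OPF_NS):
            name = posixpath.normpath(
                posixpath.join(base, hrefs[ref.get("idref")])
            )
            chapters.append((name, epub.read(name).decode("utf-8")))
        return chapters
```

Even with the order recovered this way, each chapter is still a complete
document with its own head and body, so joining them into one HTML file
takes further work.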
...
This is on my side an offer to work towards the production of a
toolchain along these lines, if it is not discarded a priori.
Before that can happen, a major "paradigm shift" has to happen at DP.

At DP the PPers enjoy pushing their pet preferences down the readers' 
throats: "What *I* See Is What You Get." And most PP time is spent 
weaving those personal preferences deep into the markup, which makes the 
markup pretty useless for anything but desktop devices with lots of 
screen, lots of cycles and lots of RAM.

What the PPers should do is to produce light semantic markup that lets 
the user choose the presentation and device: "Get It The Way You Want."

The PPers will have to relinquish their power of God -- or have it 
wrested from their hands -- and very strict guidelines will have to be 
put into place as to what markup is accepted.

-- 
Marcello Perathoner
webmaster@gutenberg.org