[gutvol-d] Re: DP output is technically obsolete

19 Apr 2010

      On 4/19/2010 12:25 PM, Jim Adcock wrote:
...
...
In other words, an ".epub" file is just a ".zip" file with a few
additional metadata files added. Software that purports to "convert"
HTML to .epub should not do /anything/ to the source file, except
perhaps to ensure that it is valid XHTML (for older HTML files). There
is no need to validate an .epub conversion, as no conversion should have
occurred. If a rendered .epub document does not look exactly like the
same collection of files rendered by a browser from the file system, it
is the fault of the .epub rendering software, not the "conversion."
You make an interesting thesis, which, rare in the case of DP/PG arguments,
is eminently testable.  I have done so, and you clearly have not.  Take a PG
HTML zip file, say "76" for the sake of completeness. Download it, and
unpack it on your computer.  Take a PG epub "zip" file, say pg76.epub for
concreteness.  Download it, and unpack it on your computer.
Now, look at the contents.
Do they have the same HTML files?
Yes, they do. The file names have been altered, but the content is 
virtually the same.

[snip]
...
Do they have the same number of HTML files?
Yes, they do. Each has eight parts plus the godawful and legally
unnecessary PG header (Apple is doing the world a favor by stripping it
away).

[snip]
...
Are the contents of the HTML files identical?
No they are not.
No, they are not. Mr. Perathoner's files 1.) have been converted from
ISO-8859 to Unicode/UTF-8; 2.) have had their internal style sheets
extracted into external style sheets; 3.) have had links added to a
"center contents pages" stylesheet and a generic "pgepub" stylesheet;
4.) have had "id" attributes added for use by .epub user agents for
navigation; and 5.) have had all the internal links changed to match
the file paths inside his archive.

All of these steps, except #3, are harmless and do not affect the
presentation of the content. Indeed, with the exception of centering the
tables of contents, they are probably all desirable things to do.
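
For anyone who wants to repeat the comparison without eyeballing the files, here is a minimal Python sketch of the check. It is not the procedure either of us actually used; the function names are my own invention, and the fallback to ISO-8859-1 is an assumption about how the older PG files are encoded. It counts the HTML members of each archive and compares two files as text, so that a pure re-encoding or a change of line endings does not count as a difference.

```python
import zipfile


def html_members(source):
    """Return {name: raw bytes} for every HTML/XHTML member of an archive.

    `source` may be a file path or a file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        return {
            name: zf.read(name)
            for name in zf.namelist()
            if name.lower().endswith((".htm", ".html", ".xhtml"))
        }


def decode(raw):
    """Decode as UTF-8, falling back to ISO-8859-1 (an assumption)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


def same_text(raw_a, raw_b):
    """Compare two files as text, ignoring encoding and line endings."""
    return decode(raw_a).splitlines() == decode(raw_b).splitlines()
```

Calling html_members() on each downloaded archive and comparing the lengths of the two results answers the "same number of files" question; same_text() on a matching pair answers the "identical contents" question. On the real files it should report a difference, because of the added links and "id" attributes.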
...
For the sake of completeness, open the first HTML file of each.  Do the
files RENDER the same on your browser when you actually TRY them to see if
your thesis is correct?
No they do not RENDER the same.
First of all, it is your thesis not mine. I rarely, if ever, download 
files from PG; instead I get them from some other source where the 
quality of the files has more importance.

But you are correct, with an unaltered archive they do /not/ render the 
same. However, if you delete the "pgepub.css" file, or delete its 
contents, they /do/ render the same with the exception of the centered 
tables of contents. If you delete all the odd-numbered .css files, then
they /do/ render identically.

This is, of course, exactly why embedding style information inside an 
HTML file is a bad thing (you can't change the presentation without 
editing the HTML) and including a link to a generic stylesheet is a good 
thing (just find the stylesheet you like, copy it over the top of the 
generic one, and voilà, your book, your way). All of this can be
accomplished with a visual zip tool, without ever having to edit
a file (other than the zip archive itself).
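
The same swap can be scripted. The following is only a sketch in Python: replace_member() is a hypothetical helper of my own, and the member name you pass must be whatever path the stylesheet really has inside the archive. Since the zipfile module cannot overwrite a member in place, the sketch copies the archive and substitutes new bytes for one member, leaving the order and compression of everything else alone.

```python
import zipfile


def replace_member(src, dst, member, new_data):
    """Copy archive `src` to `dst`, substituting `new_data` for one member.

    `src` and `dst` may be file paths or file-like objects.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            data = new_data if info.filename == member else zin.read(info)
            # Reusing the original ZipInfo keeps each member's name,
            # position and compression type (so "mimetype" stays stored).
            zout.writestr(info, data)
```

Passing empty bytes as new_data corresponds to "delete its contents" above; passing the bytes of a stylesheet you prefer corresponds to copying it over the top of the generic one.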

Although we definitely need to talk Mr. Perathoner out of adding a link 
to a "center me" style sheet.
...
It is an interesting thesis that PG epub files are "just" a zipped version
of the PG HTML files -- but it is an easily demonstrably false thesis.
I never said that /PG/ .epub files are just a zipped version of /PG/ 
HTML files; I said that technically conforming .epub files are just 
zipped versions of their source HTML files. It is certainly possible to 
take an HTML file, alter it, and make an .epub file from the newly 
altered file. Personally, I would view that as a flaw in the conversion 
software, though, and independent of the issue of .epub encapsulation.
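
To make "just a zip file with a few additional metadata files" concrete, here is a rough Python sketch of a packer that writes the source documents into the archive byte for byte and generates only the metadata around them. The function names, the "content.opf" file name, the "bookid" id and the hard-coded language are my own assumptions. The sketch also omits the NCX navigation file that EPUB 2 expects, so I do not claim the result passes a validator.

```python
import zipfile
from xml.sax.saxutils import escape, quoteattr

CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_opf(title, identifier, html_names):
    """Build a minimal package file listing the documents in reading order."""
    items = "\n".join(
        f'    <item id="doc{i}" href={quoteattr(name)} media-type="application/xhtml+xml"/>'
        for i, name in enumerate(html_names)
    )
    refs = "\n".join(f'    <itemref idref="doc{i}"/>' for i in range(len(html_names)))
    return f"""<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{escape(title)}</dc:title>
    <dc:identifier id="bookid">{escape(identifier)}</dc:identifier>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
{items}
  </manifest>
  <spine>
{refs}
  </spine>
</package>
"""


def pack_epub(dest, title, identifier, documents):
    """Write `documents` ({archive name: XHTML bytes}) into `dest` unchanged."""
    with zipfile.ZipFile(dest, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry goes first and is stored without compression.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("content.opf", build_opf(title, identifier, list(documents)))
        for name, data in documents.items():
            zf.writestr(name, data)
```

Nothing in pack_epub() touches the document bytes, which is the whole point: if the result renders differently from the loose files, the packer cannot be the cause.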
...
Marcello's epub software does more than "just" pack the HTML files into an
epub package.  Ask him for a copy of his converter software, and see what
the conversion actually entails.  And/or ask Marcello what conversions he
actually does to move from the HTML version to the epub version.
True. Apparently, Mr. Perathoner's software extracts embedded CSS 
information and moves it to an external style sheet (as it should), 
creates a "<div class='c1'>" around the tables of contents and 
illustrations, with a corresponding style sheet that centers the 
contents (which it should not), and adds a link to a generic "pgepub" 
style sheet (as it should), in addition to altering names for navigation 
purposes.
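
For illustration only, the stylesheet extraction step might be sketched in Python as below. This is not Mr. Perathoner's code, which I have not seen; externalize_css() and the regular expression are my own, and a real converter would more likely use a proper HTML parser than a regex.

```python
import re

# Matches a whole <style>...</style> element, capturing its contents.
STYLE_RE = re.compile(r"<style\b[^>]*>(.*?)</style\s*>", re.IGNORECASE | re.DOTALL)


def externalize_css(html, css_href):
    """Return (new_html, css): <style> blocks replaced by one <link>."""
    blocks = STYLE_RE.findall(html)
    if not blocks:
        return html, ""
    link = f'<link rel="stylesheet" type="text/css" href="{css_href}" />'
    # Put the link where the first <style> block was, then drop the rest.
    new_html = STYLE_RE.sub(lambda m: link, html, count=1)
    new_html = STYLE_RE.sub("", new_html)
    css = "\n".join(block.strip() for block in blocks)
    return new_html, css
```

The returned css string would be written to the file named by css_href. The document body is left untouched, which is why this step by itself should not change how the file renders.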

Now apparently, your complaint is not that PG HTML does not make good 
.epub files, or that including a generic stylesheet "breaks" the 
".epub", but that you don't like the .epub generator that Mr. Perathoner 
wrote. That complaint, with which I sympathize, needs to be directed to 
him individually; it cannot, however, be generalized to /all/ .epub 
files, only those created by his software.
...
Thus again, I suggest that it would be a good idea to have a portable
version of Marcello's epub conversion software that we could use for testing
on our local machines.  Given a portable version of the epub conversion
software, going to mobi is easy using the same Amazon/Mobipocket-provided
epub->mobi conversion software that Marcello is already using.

Lee Passey