
On 4/19/2010 12:25 PM, Jim Adcock wrote:
In other words, an ".epub" file is just a ".zip" file with a few additional metadata files added. Software that purports to "convert" HTML to .epub should not do /anything/ to the source file, except perhaps to ensure that it is valid XHTML (for older HTML files). There is no need to validate an .epub conversion, as no conversion should have occurred. If a rendered .epub document does not look exactly like the same collection of files rendered by a browser from the file system, it is the fault of the .epub rendering software, not the "conversion."
You make an interesting thesis, which, rare in the case of DP/PG arguments, is eminently testable. I have done so, and you clearly have not. Take a PG HTML zip file, say "76" for the sake of concreteness. Download it, and unpack it on your computer. Take a PG epub "zip" file, say pg76.epub for concreteness. Download it, and unpack it on your computer.
Now, look at the contents.
Do they have the same HTML files?
Yes, they do. The file names have been altered, but the content is virtually the same. [snip]
Do they have the same number of HTML files?
Yes, they do. Each has eight parts plus the godawful and legally unnecessary PG header (Apple is doing the world a favor by stripping it away). [snip]
Are the contents of the HTML files identical?
No they are not.
No, they are not. Mr. Perathoner's files 1.) have been converted from ISO-8859 to Unicode/UTF-8; 2.) have had their internal style sheets extracted into external style sheets; 3.) have gained links to a "center contents pages" style sheet and a generic "pgepub" style sheet; 4.) have gained "id" attributes for use by .epub user agents for navigation; and 5.) have had all their internal links changed to match the file paths inside his archive. All of these steps, except #3, are harmless and do not affect the presentation of the content. Indeed, with the exception of centering the tables, they are probably all desirable things to do.
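To illustrate why step 1 is harmless, here is a minimal sketch (mine, not Mr. Perathoner's code) that re-encodes ISO-8859-1 bytes as UTF-8. The function name and the sample word are assumptions for illustration only. The bytes on disk change, but the decoded text does not; a real conversion would also need to update any charset declaration inside the HTML file.

```python
def reencode_latin1_to_utf8(raw: bytes) -> bytes:
    """Decode ISO-8859-1 bytes and re-encode the same text as UTF-8."""
    return raw.decode("iso-8859-1").encode("utf-8")


# Hypothetical sample text containing one non-ASCII character (U+00E0).
original = "voil\u00e0".encode("iso-8859-1")
converted = reencode_latin1_to_utf8(original)

# The byte sequences differ, yet both decode to the same characters.
same_text = converted.decode("utf-8") == original.decode("iso-8859-1")
```

A byte-for-byte comparison of the two archives will therefore report differences even where a browser would show the same characters.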
For the sake of completeness, open the first HTML file of each. Do the files RENDER the same on your browser when you actually TRY them to see if your thesis is correct?
No they do not RENDER the same.
First of all, it is your thesis not mine. I rarely, if ever, download files from PG; instead I get them from some other source where the quality of the files has more importance. But you are correct, with an unaltered archive they do /not/ render the same. However, if you delete the "pgepub.css" file, or delete its contents, they /do/ render the same with the exception of the centered tables of contents. If you delete all the odd-numbered .css files, then they /do/ render identically. This is, of course, exactly why embedding style information inside an HTML file is a bad thing (you can't change the presentation without editing the HTML) and including a link to a generic stylesheet is a good thing (just find the stylesheet you like, copy it over the top of the generic one, and voilà, your book, your way). All of this can be accomplished by using a visual zip tool, and without ever having to edit a file (other than your zipper). Although we definitely need to talk Mr. Perathoner out of adding a link to a "center me" style sheet.
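For those who would rather script the stylesheet swap than use a visual zip tool, the following is a rough sketch using Python's standard zipfile module. The function name is my own, and the member path you pass in (for example wherever "pgepub.css" actually sits inside the archive) is an assumption you would need to check by listing the archive first. Since zipfile cannot overwrite a member in place, the sketch copies everything into a new archive and substitutes the one file on the way.

```python
import zipfile


def replace_member(src_path, dst_path, member_name, new_bytes):
    """Copy a zip archive to dst_path, substituting new_bytes for one member.

    Member order and each member's compression method are preserved, so a
    'mimetype' entry that was first and uncompressed stays that way.
    """
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        for info in src.infolist():
            if info.filename == member_name:
                data = new_bytes
            else:
                data = src.read(info.filename)
            dst.writestr(info.filename, data, compress_type=info.compress_type)


# Hypothetical usage; the file names and member path are illustrative only:
# replace_member("pg76.epub", "pg76-mine.epub", "pgepub.css", b"")
```

Passing empty bytes corresponds to "delete its contents" above; passing the bytes of your preferred stylesheet corresponds to copying it over the top of the generic one.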
It is an interesting thesis that PG epub files are "just" a zipped version of the PG HTML files -- but it is an easily demonstrably false thesis.
I never said that /PG/ .epub files are just a zipped version of /PG/ HTML files; I said that technically conforming .epub files are just zipped versions of their source HTML files. It is certainly possible to take an HTML file, alter it, and make an .epub file from the newly altered file. Personally, I would view that as a flaw in the conversion software, though, and independent of the issue of .epub encapsulation.
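Whether a given .epub really is "just" its zipped source can be checked mechanically. The sketch below is my own illustration, not anyone's existing tool: it hashes the HTML-like members of two archives and reports which ones are byte-identical, ignoring file names (since a packager may legitimately rename them). The function names and the list of suffixes are assumptions.

```python
import hashlib
import zipfile


def member_digests(zip_path, suffixes=(".html", ".htm", ".xhtml")):
    """Map SHA-256 digest -> member name for the HTML-like members of a zip."""
    digests = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(suffixes):
                digest = hashlib.sha256(archive.read(name)).hexdigest()
                digests[digest] = name
    return digests


def unchanged_members(source_zip, epub_path):
    """Return sorted (source name, epub name) pairs whose bytes are identical."""
    src = member_digests(source_zip)
    pub = member_digests(epub_path)
    return sorted((src[d], pub[d]) for d in src.keys() & pub.keys())
```

An empty result means every HTML file was altered on the way into the package, which says something about the packaging software rather than about the .epub container itself.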
Marcello's epub software does more than "just" pack the HTML files into an epub package. Ask him for a copy of his converter software, and see what the conversion actually entails. And/or ask Marcello what conversions he actually does to move from the HTML version to the epub version.
True. Apparently, Mr. Perathoner's software extracts embedded CSS information and moves it to an external style sheet (as it should), creates a "<div class='c1'>" around the tables of contents and illustrations, with a corresponding style sheet that centers the contents (which it should not), and adds a link to a generic "pgepub" style sheet (as it should), in addition to altering names for navigation purposes. Now apparently, your complaint is not that PG HTML does not make good .epub files, or that including a generic stylesheet "breaks" the ".epub", but that you don't like the .epub generator that Mr. Perathoner wrote. That complaint, with which I sympathize, needs to be directed to him individually; it cannot, however, be generalized to /all/ .epub files, only those created by his software.
Thus again, I suggest that it would be a good idea to have a portable version of Marcello's epub conversion software that we could use for testing on our local machines. Given a portable version of the epub conversion software, going to mobi is easy using the same Amazon/Mobipocket-provided epub->mobi conversion software that Marcello is already using.