
On 4/19/2010 1:18 PM, Bowerbird@aol.com wrote: [snip]
what neither one seems to know is that marcello's converter doesn't always use the .html; sometimes it uses the .txt file. i don't know the particulars, but it probably has something to do with the nature of the specifics within the .html file...
bb is correct when he suggests that sometimes the .epub file is two generations removed from the Impoverished Text file (ITF). If there is no hand-crafted HTML file, there is an option to download a computer-generated HTML file. If you were to download an .epub file for one of the texts for which only the ITF is stored (I used _War of the Worlds_, etext 35), you would see that the internal HTML differs from the computer-generated HTML in only two ways. First, the computer-generated HTML carries the metadata in <meta> elements, whereas the .epub carries it in the content.opf file. Second, the .epub contains a link to "pgepub.css", whereas the computer-generated HTML does not. (Why not? What harm would it do? For that matter, why not leave the metadata in the HTML file as well?)

Presumably .epub generation is a chained process in which the ITF is converted to HTML, which is then encapsulated in the OCF container. Because native HTML is relatively uncommon at PG, I would guess that most .epub files start the process as ITF.
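For anyone who wants to repeat the comparison without unzipping by hand, here is a rough sketch of how I would look inside an .epub. It relies only on the fact that an OCF container is a ZIP archive whose META-INF/container.xml points at the OPF file. The function name inspect_epub and the file name "pg35.epub" are my own inventions for illustration, and the regular expression is a crude check for stylesheet links, not a real HTML parser.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespaces defined by the OCF and Dublin Core specifications.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
DC_NS = "http://purl.org/dc/elements/1.1/"


def inspect_epub(source):
    """Report where an .epub keeps its metadata and stylesheet links.

    `source` is a path or a binary file-like object (hypothetical helper,
    not part of any PG tool). Returns (opf_path, metadata, stylesheets).
    """
    with zipfile.ZipFile(source) as z:
        # container.xml names the package (OPF) file.
        container = ET.fromstring(z.read("META-INF/container.xml"))
        rootfile = container.find(".//c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")

        # Collect the Dublin Core elements (dc:title, dc:creator, ...).
        opf = ET.fromstring(z.read(opf_path))
        metadata = {}
        for el in opf.iter():
            if el.tag.startswith("{" + DC_NS + "}"):
                name = el.tag.split("}", 1)[1]
                metadata.setdefault(name, []).append((el.text or "").strip())

        # For each HTML file, list the .css files it links to.
        stylesheets = {}
        for name in z.namelist():
            if name.lower().endswith((".html", ".htm", ".xhtml")):
                text = z.read(name).decode("utf-8", errors="replace")
                stylesheets[name] = re.findall(
                    r'<link\b[^>]*href="([^"]+\.css)"', text, flags=re.I
                )
    return opf_path, metadata, stylesheets


if __name__ == "__main__":
    # "pg35.epub" is an assumed local file name; substitute your download.
    opf_path, metadata, stylesheets = inspect_epub("pg35.epub")
    print(opf_path)
    print(metadata)
    print(stylesheets)
```

Running the same <link> search over the computer-generated HTML download should come back empty if my observation above holds, which makes the two differences easy to confirm for other etexts.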