the blind men and the .epub file-format

it's sad y'all know so much, yet so little, all at the same time. lee is correct. but he doesn't know what he's talking about. jim is correct. but he doesn't know what he's talking about. as lee says, an .epub file is just some (x)html files zipped up. as jim says, the .epub files at p.g. often differ from the .html. what neither one seems to know is that marcello's converter doesn't always use the .html; sometimes it uses the .txt file. i don't know the particulars, but it probably has something to do with the nature of the specifics within the .html file... -bowerbird

what neither one seems to know is that marcello's converter doesn't always use the .html; sometimes it uses the .txt file. i don't know the particulars, but it probably has something to do with the nature of the specifics within the .html file...
Not sure what part of the elephant you've grabbed hold of, but if you looked at the example in question it would be obvious that your answer isn't.

I, myself, am taking a dim view of some of these conversations, in re: that I am perhaps taking them more seriously than deserved. So, unless there are some requests for further comments, I intend my future comments to be more limited in seriousness and scope. mh

On 4/19/2010 1:18 PM, Bowerbird@aol.com wrote: [snip]
what neither one seems to know is that marcello's converter doesn't always use the .html; sometimes it uses the .txt file. i don't know the particulars, but it probably has something to do with the nature of the specifics within the .html file...
bb is correct when he suggests that sometimes the .epub file is 2 generations removed from the Impoverished Text file. If there is no hand-crafted HTML file, there is an option to download a computer-generated HTML file. If you were to download an .epub file for one of these texts for which only ITF is stored (I used _War of the Worlds_, etext 35) you would see that the internal HTML differs from the computer-generated HTML in only two ways: the computer-generated HTML contains the metadata in <meta> elements whereas the .epub contains the metadata in the content.opf file, and the .epub file contains a link to "pgepub.css" whereas the computer-generated HTML does not (why not? what harm would it do? For that matter, why not leave the metadata in the HTML file as well?). Presumably .epub generation is a linked process whereby ITF is converted to HTML which is then encapsulated in the OCF. Because native HTML is relatively uncommon at PG, I would guess that most .epub files start the process as ITF.
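
For anyone who wants to check this kind of thing rather than take it on faith, here is a rough sketch of how one might look inside an .epub, since it is only a zip archive. It lists the (X)HTML members, pulls the Dublin Core metadata out of any .opf file, and lists the stylesheets. The names inspect_epub and make_toy_epub are hypothetical, and the toy archive is an assumed stand-in for a real download (its file names are illustrative, not PG's actual layout, and it is not a fully valid .epub).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used for Dublin Core elements such as <dc:title> in an OPF file.
DC = "{http://purl.org/dc/elements/1.1/}"


def inspect_epub(source):
    """Return the (X)HTML members, Dublin Core metadata, and CSS members of an .epub.

    `source` may be a file path or a file-like object; an .epub is a zip archive.
    """
    with zipfile.ZipFile(source) as z:
        names = z.namelist()
        html = [n for n in names if n.lower().endswith((".html", ".htm", ".xhtml"))]
        css = [n for n in names if n.lower().endswith(".css")]
        metadata = {}
        for name in names:
            if not name.lower().endswith(".opf"):
                continue
            root = ET.fromstring(z.read(name))
            for el in root.iter():
                # Keep the first value seen for each Dublin Core element.
                if el.tag.startswith(DC) and el.text:
                    metadata.setdefault(el.tag[len(DC):], el.text.strip())
    return {"html": html, "metadata": metadata, "css": css}


def make_toy_epub():
    """Build a tiny in-memory archive shaped like an .epub (illustrative only)."""
    opf = (
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0">'
        '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
        "<dc:title>The War of the Worlds</dc:title>"
        "</metadata></package>"
    )
    page = (
        '<html><head><link rel="stylesheet" href="pgepub.css"/></head>'
        "<body><p>No one would have believed...</p></body></html>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("mimetype", "application/epub+zip")
        z.writestr("content.opf", opf)
        z.writestr("35.html", page)
        z.writestr("pgepub.css", "p { margin: 0 }")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(inspect_epub(make_toy_epub()))
```

Pointing inspect_epub at a downloaded .epub and comparing the HTML members against the separately downloadable HTML would show whether the two really differ only in the metadata and the stylesheet link.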

Because native HTML is relatively uncommon at PG, I would guess that most .epub files start the process as ITF.
Please don't guess, but rather check it out. For example, of the books posted in the last 24 hours, 15 out of 17 came with native HTML. Playing around with Advanced Search, it reports 21786 books in HTML native format versus 20828 in text format. I.e., going back over the entire history of PG, about 2/3rds of the books have HTML native format.

participants (5)
- Bowerbird@aol.com
- James Adcock
- Jim Adcock
- Lee Passey
- Michael S. Hart