
On 1/18/2012 8:15 PM, Jim Adcock wrote:
EPUB:
The ngx or whatever its called seems to be way overpopulated.
NCX. Navigation Control file for XML applications. Designed by the DAISY Consortium to support Digital Talking Books. Rammed through the IDPF with little discussion by Adobe, which is the IDPF's 800-lb gorilla now that Microsoft has stopped caring. Adobe preferred the NCX to the existing "tours" element, probably as a result of NIH syndrome. NCX is a rather chatty format, due in large part to the fact that it was designed as a navigation aid for audio books for the blind rather than as a simple table of contents. Fortunately, end users never need to look at it, and it can easily be built using automated tools rather than by hand.
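To give a feel for both the chattiness and how easily it can be automated, here is a minimal sketch in Python that emits a flat <navMap> from a list of titles and link targets. The function name build_nav_map and the shape of its input are my own assumptions for illustration, not part of any existing tool. A real NCX file also needs the surrounding <ncx> root, <head>, and <docTitle> elements, and nested <navPoint> elements for sub-headings, none of which is handled here.

```python
from xml.sax.saxutils import escape, quoteattr


def build_nav_map(entries):
    """Return a flat NCX <navMap> fragment as a string.

    entries is a list of (title, href) pairs in reading order.
    This shape is an assumption made for the sketch.
    """
    lines = ["<navMap>"]
    for order, (title, href) in enumerate(entries, start=1):
        # Every entry costs four lines of markup: this is the "chatty" part.
        lines.append(f'  <navPoint id="navpoint-{order}" playOrder="{order}">')
        lines.append(f"    <navLabel><text>{escape(title)}</text></navLabel>")
        lines.append(f"    <content src={quoteattr(href)}/>")
        lines.append("  </navPoint>")
    lines.append("</navMap>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_nav_map([
        ("Chapter I", "book.html#ch1"),
        ("Notes & Errata", "book.html#notes"),
    ]))
```

Two table-of-contents entries turn into ten lines of XML, which is why nobody wants to write this by hand.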
Page numbers now have magically jumped from the left margin to the right margin and what used to read "[PG 014]" now just reads a small grey "14" -- how do they do that?
Magic. Seriously, I think one needs to be cautious when trying to judge the quality of an ePub file by how it looks in any particular User Agent. Due to the influence of Adobe on the ePub specification, I believe that Adobe Digital Editions (ADE) is probably the most compliant of the ePub readers available, but even that software is not fully compliant with the specification. If you find problems with the way an ePub looks in any particular software, the problem is as likely to lie in the software as in the file. I'm guessing from your description that ADE is what you were using to look at the ePub. The developers of ADE seem to be committed to making ePubs as much like PDF as possible. Thus, ADE automatically puts a page number in the right-hand margin of the display. If page numbers are not actually included in the ePub (more on this later), ADE will make some up. The only relation these made-up numbers have to the underlying text is that they are sequential, i.e. 2 is guaranteed to come after 1. BTW, ADE is not the only User Agent that carries on this charade. I know that Aldiko does it, and it's possible that the Nook does it as well. In Aldiko I've learned how to turn it off; so far, I haven't been able to figure out how to do that in ADE. [snip]
Hm, except now I see a line that says:
[pg 002][pg 013] ... 7
So now I am confused: Is this page 2, or page 13, or page 7 ???
The NCX file that is part of ePub is used to create the only Table of Contents that ADE recognizes (remember, NCX was Adobe's idea in the first place). But that is not its only purpose. In addition to the <navMap> section, which defines what is approximately a Table of Contents, an NCX file can also contain a <pageList> section, which is intended to hold a list of page anchors that point to the beginning of each page in the document (because, after all, PDF is page oriented). In ADE, if this page list is present in the NCX file, it is used instead of the made-up numbers. I think that what you're seeing is page numbers which were ill-advisedly placed into the source HTML file and not suppressed by your User Agent (many ePub User Agents don't do a good job with the { display:none } style), plus the page markers auto-generated either from the NCX file or made up just because ADE (or equivalent) seemed to think you would want them (I don't, but YMMV). The HTML ePub maker used by PG does a pretty good job of constructing an NCX file from the structure of an HTML file, based upon the ordering and levels of <h?> headers. It could probably be enhanced to build a page list in the NCX file as well, by finding the embedded page anchors in the HTML, adding references to those anchors to the NCX, and removing any visible component of the anchor before packaging the ePub. Anyone who is evaluating the quality of the ePubs produced by the PG tool would be well advised to learn the ePub and XHTML specifications, unzip the subject ePub, and examine the markup by hand. There's simply too much variation among User Agents to be able to come to any valid conclusion any other way.
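To make the suggested enhancement concrete, here is a rough sketch in Python of the three steps: find the page anchors, record references to them, and strip the visible text. It is not how the PG tool works. I am assuming the anchors look like <a id="Page_14">[pg 014]</a>, which is a hypothetical shape (real PG HTML varies a good deal), and the names extract_pages and build_page_list are invented for this example. A regular expression is used only to keep the sketch short; a real tool would want a proper XHTML parser. The playOrder values are simply numbered from a starting offset, whereas a real NCX has to coordinate them with the <navMap> entries.

```python
import re
from xml.sax.saxutils import quoteattr

# Hypothetical anchor shape: <a id="Page_14">[pg 014]</a>
PAGE_ANCHOR = re.compile(
    r'<a\s+id="(?P<id>Page_(?P<num>\d+))"\s*>(?P<label>[^<]*)</a>'
)


def extract_pages(xhtml, src):
    """Return (cleaned_xhtml, pages).

    pages is a list of (page_number, target) pairs in document order,
    where target is src plus the anchor's fragment identifier.
    """
    pages = []

    def strip_label(match):
        anchor_id = match.group("id")
        pages.append((int(match.group("num")), f"{src}#{anchor_id}"))
        # Keep the anchor as a link target but drop its visible text.
        return f'<a id="{anchor_id}"></a>'

    cleaned = PAGE_ANCHOR.sub(strip_label, xhtml)
    return cleaned, pages


def build_page_list(pages, first_play_order=1):
    """Return an NCX <pageList> fragment for the given pages."""
    lines = ["<pageList>"]
    for offset, (number, target) in enumerate(pages):
        lines.append(
            f'  <pageTarget id="page{number}" type="normal" '
            f'value="{number}" playOrder="{first_play_order + offset}">'
        )
        lines.append(f"    <navLabel><text>{number}</text></navLabel>")
        lines.append(f"    <content src={quoteattr(target)}/>")
        lines.append("  </pageTarget>")
    lines.append("</pageList>")
    return "\n".join(lines)


if __name__ == "__main__":
    html = '<p><a id="Page_13">[pg 013]</a>Some text.</p>'
    cleaned, pages = extract_pages(html, "book.html")
    print(cleaned)
    print(build_page_list(pages))
```

Removing the visible label outright, rather than hiding it with { display:none }, sidesteps the User Agents that ignore that style.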