
In fact, the biggest problem with HTML as a master format is that it is (mis-)used for layout, not just structure. This is actively discouraged. At http://upload.pglaf.org, we request that the uploaded master convert well to other formats using epubmaker. While this is intended to discourage layout choices that don't map well to other formats, compliance is not extensively verified, by human or machine.
Although HTML was originally intended to be a "semantic" mark-up language, it is often grossly abused, grew as if on steroids during the browser wars, and should be considered a first attempt only. As a semantic mark-up language it has many deficiencies, which are addressed only to a very minimal extent in HTML5. Trying to automatically extract any reasonable structural knowledge from HTML without very strictly enforced HTML coding standards is close to impossible. But that is exactly what you are trying to do here.
In my post-processing chain, I work with TEI, and can just as easily generate both the (monolithic) HTML and the ePub version from the same source, using the much better encoded knowledge of the high-level structure of a text that TEI offers me. Unfortunately, my better ePubs are not accepted, and instead users get the inferior versions epubmaker spits out. This is frustrating, and I am actually toying with the idea of hosting my better ePubs on my own site instead.
There are long discussions on the PGDP boards about post-processing for ePub, but basically, I refuse to dumb down my HTML for the sake of limited ePub readers when it is very easy to have the best of all worlds: high-quality HTML for desktop readers, dumbed-down ePub for low-end ebook readers, and neat ePub (or ePub3) for advanced readers. I already produce HTML that is as semantic as is reasonable within that format, and I am perfectly willing and able to produce "enriched" HTML, which can include micro-format hints for PG tools, if only I were told how. If my HTML files, which are not dumbed down for the lowest-common-denominator readers, were no longer accepted, I would no longer submit anything to PG, but set up my own ebook site with like-minded people. You can only ask so much from volunteers who work hard to produce nice things. I've submitted over 500 books so far.
Jeroen.