
Lee>I don't believe there /is/ any requirement "to submit separate hand-tooled lightweight markup." What there /is/ is a requirement that whatever you submit must have an impoverished text version as well as any other.

Your beliefs are unfortunately in error, as the WWers will inform you after you violate the "unwritten rules that everyone knows about" -- that is, if you submit a txt70 file which doesn't exactly match their unwritten rules, which include using the vertical whitespace markup correctly and using the "_" and "*" markup correctly. And BB will go nonlinear, because he recognizes that these rules amount to a firm markup requirement.

Lee>Given the (relatively) new openness at PG to accept HTML files, the need to fool whitewashers is gone. If I were preparing a new book to be submitted to PG I'd do everything in HTML and make sure /that/ version was canonical. I'd then use some kind of automated conversion tool to strip the markup from the HTML, and submit /that/ as the text version along with the canonical HTML version. And because fixing errors in PG texts is so difficult, it's unlikely that the two files will ever get out of sync.

I haven't found a flawless tool that will strip HTML, even easy HTML, reduce it without wiping out some of the UTF-8, and wrap it to anything close to the ridiculous txt70 wrap standards that are expected. So this all ends up being a day or two of needless grunt work tacked onto the end of what had previously been a long but otherwise fruitful grind. And god forbid you make a mistake while grunting your HTML back down to txt70: you will be crucified!
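To make concrete what "stripping the HTML back down to txt70" involves, here is a minimal Python sketch. It is not a tool PG uses or endorses, and the class and function names are hypothetical. It assumes that `<i>`/`<em>` map to `_..._` and `<b>`/`<strong>` map to `*...*`, that paragraphs are separated by one blank line, and that lines wrap at 70 columns. It deliberately ignores the finer vertical-whitespace rules (such as spacing around chapter headings) and line-broken verse, and `textwrap` counts code points rather than display width. Those are exactly the kinds of gaps that turn this step into grunt work.

```python
from html.parser import HTMLParser
import textwrap

# ASSUMED mapping from inline HTML tags to plain-text emphasis markers.
EMPHASIS = {"i": "_", "em": "_", "b": "*", "strong": "*"}
# Hypothetical, deliberately minimal set of paragraph-level tags.
BLOCKS = {"p", "h1", "h2", "h3", "div", "blockquote"}


class Txt70Converter(HTMLParser):
    """Collect paragraphs from simple HTML, marking emphasis with _ and *."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; etc. arrive decoded
        self.paragraphs = []
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in EMPHASIS:
            self.current.append(EMPHASIS[tag])
        elif tag == "br":
            self.current.append(" ")  # verse line breaks are lost here
        elif tag in BLOCKS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in EMPHASIS:
            self.current.append(EMPHASIS[tag])
        elif tag in BLOCKS:
            self._flush()

    def handle_data(self, data):
        self.current.append(data)

    def _flush(self):
        # Collapse whitespace runs, as an HTML renderer would.
        text = " ".join("".join(self.current).split())
        if text:
            self.paragraphs.append(text)
        self.current = []


def html_to_txt70(html, width=70):
    """Return plain text: emphasis marked, paragraphs wrapped at `width`."""
    parser = Txt70Converter()
    parser.feed(html)
    parser.close()
    parser._flush()  # pick up any trailing text outside a block tag
    wrapped = [textwrap.fill(p, width=width) for p in parser.paragraphs]
    return "\n\n".join(wrapped) + "\n"
```

For example, `html_to_txt70("<p>He said <i>never</i> again.</p>")` yields `He said _never_ again.` followed by a newline. Non-ASCII text such as "Café" passes through unchanged because Python strings are Unicode throughout.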
It would be a ton less effort to write a style guide for HTML, and/or
Lee>Yes, and several of those style guides have been written in the past: see, e.g., http://www.hwg.org/opcenter/gutenberg/ and the Gutenberg wiki at http://www.gutenberg.org/wiki/Gutenberg:HTML_FAQ#H.4._What_are_the_PG_rules_for_HTML_texts.3F. Marcello Perathoner has also written such a guide, but I can't put my finger on a reference to it quickly.

Strange, whenever *I* cannot find one of the vaunted guides I am told I am an idiot. [Granted, but the guide writers' point being, exactly?] That these guides totally miss the mark can be ascertained by comparing the rendering of PG HTML, EPUB, and MOBI files to see just how successful these style guides have been in practice.
Lee>The problem with all these style guides is not that they are ineffective or technically incorrect (with some exceptions), but rather that the people who wrote them, and those who consult them, tend to approach the question of markup with religious fervor.
While missing the mark: you can mark up according to these guides and what PG ends up publishing will still look like crap. Again, please take off the rose-colored glasses and look at, say, the last 10 books published in all of HTML, EPUB, and MOBI. Do they LOOK the same? Do they LOOK equally good? Are "real world customers" going to have an equally happy experience reading them on any of the three classes of devices? Why not? It is certainly possible, if not relatively easy, to make an ebook that looks equally good on HTML, EPUB, and MOBI devices. PG just isn't doing it. Other people take PG books, fix them, and republish them on other forums in a form that real-world customers can actually read. Why can't PG do this with their own books? Why shouldn't PG be doing this with their own books? What is the point in publishing stuff which looks like crap?