
Lee Passey wrote:
> On Tue, December 13, 2011 3:05 am, Paul Flo Williams wrote:
>> There is a fourth way: pre-process your (X)HTML to downgrade it to an HTML 3.2 tag soup + Kindle attributes so that the kindlegen step does nothing other than wrapping up your HTML into a MOBI file.
> "Tag soup" refers to formatted markup which does not consist of correct HTML
[snip]
> That said, it is true that Kindle relies on certain proprietary elements and attributes, which I guess qualifies it generally in the "tag soup" category. But you definitely want to keep the file as well-formed XML.
That's an extraordinarily verbose way of agreeing with me :-)
Kindlegen does a simple job badly, so bypass the bits that you don't need.
> What you are suggesting is essentially to re-write Kindlegen to do things better. This is not a bad idea, but I think it may be a bit more complex than you think (or maybe I'm being unfair in suggesting that you don't grasp the complexity).
No, I'm suggesting doing some pre-processing to the HTML that you give to Kindlegen, converting constructs that it will convert badly into ones that fly straight through from your input to its mobi output.

Here's a concrete example. Let's say I've got a poem with some long lines that I wish to mark up. I don't know what font size or screen width the reader is using, so I want to make the display as flexible as possible. In the book that I'm copying, the poem already has wrapped lines:

    The first line of my poem
      The second line of my poem
    A longer line comes next, and goes on a bit
        but still does not rhyme
      The last line ends with a flourish!

I've decided that a flexible way of marking this up in HTML is this:

    <html>
    <head>
    <style>
    .poem { margin-left: 2em }
    .line, .iline { display: block; text-indent: -2em }
    .line { }
    .iline { margin-left: 1em }
    </style>
    </head>
    <body>
    <div class="poem">
    <p class="verse"><span class="line">The first line of my poem</span>
    <span class="iline">The second line of my poem</span>
    <span class="line">A longer line comes next, and goes on a bit, but still does not rhyme</span>
    <span class="iline">The last line ends with a flourish!</span></p>
    <p class="verse"><span class="line">The first line of my poem</span>
    <span class="iline">The second line of my poem</span>
    <span class="line">A longer line comes next, and goes on a bit, but still does not rhyme</span>
    <span class="iline">The last line ends with a flourish!</span></p>
    </div>
    </body>
    </html>

You'll note that this needs CSS to show up as I intended, but the display in a modern browser works well. However, kindlegen does an awful job at converting this. So I decide to preprocess the HTML that I feed to kindlegen.
I can strip the CSS entirely and use the classes to do some substitutions, so that I feed kindlegen this:

    <html>
    <head>
    </head>
    <body>
    <p height="1em">The first line of my poem</p>
    <p height="0" width="-2em"> The second line of my poem</p>
    <p height="0">A longer line comes next, and goes on a bit, but still does not rhyme</p>
    <p height="0" width="-2em"> The last line ends with a flourish!</p>
    <p height="1em">The first line of my poem</p>
    <p height="0" width="-2em"> The second line of my poem</p>
    <p height="0">A longer line comes next, and goes on a bit, but still does not rhyme</p>
    <p height="0" width="-2em"> The last line ends with a flourish!</p>
    </body>
    </html>

(I haven't got a copy of Tallent handy, so I might have got the indentation trick wrong.)

In this case, I selected elements to process by their class attribute, and performed some attribute and textual substitutions. I don't bother touching all the parts of the document that already convert well, so I don't have to fully process the CSS. This part of the document goes through kindlegen verbatim because there isn't anything it needs to convert.

Of course, this is part of my toolchain for my books because I have made certain choices about the vocabulary I mark up with, but the principle is simple enough. I don't need to understand the structure of mobi files or throw away kindlegen wholesale.
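
For anyone who wants to try the class-driven substitution described above, here is a minimal sketch in Python using only the standard library. It is not the actual toolchain, just one way the step could look. It assumes the input is well-formed XML (as recommended earlier in the thread), and the names preprocess, poem_to_paragraphs, LINE_ATTRS and SAMPLE are made up for illustration. The height and width values are copied from the example; any padding text inside the indented lines is left out, and inline markup inside a line is flattened to plain text.

```python
import xml.etree.ElementTree as ET

# Extra attributes per line class, copied from the example above.
LINE_ATTRS = {"line": {}, "iline": {"width": "-2em"}}


def poem_to_paragraphs(poem):
    """Turn the span-per-line markup inside one div.poem into flat <p> elements."""
    paragraphs = []
    for verse in poem.iter("p"):
        if verse.get("class") != "verse":
            continue
        first_in_verse = True
        for span in verse.iter("span"):
            extra = LINE_ATTRS.get(span.get("class"))
            if extra is None:
                continue  # not one of the poem line classes
            # A verse's first line gets vertical space above it; the rest get none.
            attrs = {"height": "1em" if first_in_verse else "0"}
            attrs.update(extra)
            para = ET.Element("p", attrs)
            # Simplification: inline markup inside a line is flattened to text.
            para.text = " ".join("".join(span.itertext()).split())
            para.tail = "\n"
            paragraphs.append(para)
            first_in_verse = False
    return paragraphs


def preprocess(root):
    """Strip <style> and replace every div.poem in place; leave the rest alone."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "style":
                parent.remove(child)
            elif child.tag == "div" and child.get("class") == "poem":
                position = list(parent).index(child)
                parent.remove(child)
                for offset, para in enumerate(poem_to_paragraphs(child)):
                    parent.insert(position + offset, para)


SAMPLE = """<html>
<head>
<style>.poem { margin-left: 2em }</style>
</head>
<body>
<div class="poem">
<p class="verse"><span class="line">The first line of my poem</span>
<span class="iline">The second line of my poem</span></p>
</div>
</body>
</html>"""

root = ET.fromstring(SAMPLE)
preprocess(root)
result = ET.tostring(root, encoding="unicode")
print(result)
```

Everything outside div.poem and style is written back untouched, which matches the point made above: only the constructs that convert badly need handling, and the rest can go to kindlegen as it is.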