Re: [gutvol-d] The problems with paragraph formatting at PG

13 Dec 2011

      Jim Adcock wrote:
...
In MOBI land -- read Kindle and kindlegen (which is pretty much how everyone
including PG is forced to make MOBI files) there are two big problems:
1) Top and bottom margins are NOT merged.  When a paragraph follows a
paragraph then the vertical whitespace between those two paragraphs is added
to each other.
2) Top and bottom margins are ROUNDED to the closest 1em.
[snip]
...
How then does one work around these problems?  Once one recognizes that
there really is a problem, then three (partial) work-around solutions come
to mind:
1) Do not even specify paragraph formatting, but rather allow the built-in
paragraph formatting in each HTML, EPUB, or MOBI device to do its job.
2) Specify (say) a 1.0em top margin and a 0.0em bottom margin.
3) *Almost* "Split the Baby" by specifying a 0.51 top margin and a 0.49
bottom margin.

There is a fourth way: pre-process your (X)HTML to downgrade it to an HTML
3.2 tag soup + Kindle attributes so that the kindlegen step does nothing
other than wrapping up your HTML into a MOBI file.

Granted, this won't help you pushing a nice MOBI through PG, but let's see
how it might work for your own delivery.

Let's say you've marked up your book as XHTML with CSS, and that you're
pretty much using this as your delivered EPUB, as well as for browser
consumption.

You find that kindlegen does a half-hearted job of converting your styles
into kindle-specific paragraph attributes. You've used mobiunpack to
examine just how poor the situation is.

You know from reading Joshua Tallent's book that using width and height
attributes on paragraphs, and liberal sprinklings of NBSP will help you
lay out your poetry in a way that works for the different font sizes that
a user might pick.

So, use xsltproc or Perl plus one of the XML modules to do what kindlegen
does, except applying some of your own conversions. You can use classes in
your XHTML source to target the elements that need some conversion help,
which will help you apply funky conversions to just poetry or just
footnotes or just the first paragraph after a heading, without needing to
understand XPath.
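
As a rough illustration of targeting by class, here is a minimal Python
sketch using the standard library's xml.etree.ElementTree. The class names
("first", "stanza") and the attribute values in KINDLE_ATTRS are made up
for the example; only the idea of putting width and height attributes on
paragraphs comes from the discussion above. It assumes the source is
well-formed XHTML without named entities such as &nbsp;, which this parser
rejects, and it leaves the downgrade to HTML 3.2 tag soup as a separate
step.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical mapping: class name -> attributes to set on the paragraph.
# The names and values are placeholders, not a tested Kindle recipe.
KINDLE_ATTRS = {
    "first": {"width": "0"},
    "stanza": {"height": "1em", "width": "0"},
}


def apply_class_conversions(xhtml_text):
    """Return the XHTML with extra attributes on paragraphs of known classes."""
    # Keep the XHTML namespace as the default one when serialising.
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xhtml_text)
    for p in root.iter("{%s}p" % XHTML_NS):
        # A paragraph may carry several space-separated classes.
        for cls in p.get("class", "").split():
            for name, value in KINDLE_ATTRS.get(cls, {}).items():
                p.set(name, value)
    return ET.tostring(root, encoding="unicode")
```

Paragraphs without one of the listed classes pass through unchanged, so
the table can grow one class at a time as each book needs it.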

Any non-trivial book is already going to involve some divergence between
your "ideal" source and the one you chuck through kindlegen for it to
up-chuck (er, I mean convert), so perhaps making kindlegen do *less* for
you is the solution.

Two concrete examples from the books I'm marking up at the moment:

1. The book uses unspaced em dashes, and I like them, so I'm marking them
up as U+200B U+2014 U+200B (i.e. zero-width space, em dash, zero-width
space), which allows lines to wrap either side of the em dash, exactly as
the original book does. That's helpful because the original book is 150
years old and uses those long chapter headings that TEI calls "arguments".
However, the Kindle doesn't understand ZWSP, but will do the right thing
if I change them to ZWNJ, which is the wrong character but it works.

2. I have some genealogical data in the back which is a set of
increasingly indented paragraphs, some of which are numbered. It looks
rather like ordered lists except that the first son and first daughter
both get numbered "1", and some other oddities, so I've just used
paragraphs. The Kindle destroys the indentation, so again I have to
pre-process the paragraphs in the fashion that Tallent demonstrates.
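
For the indentation case, one possible pre-processing step is to prefix
each paragraph with NBSPs according to its depth. This is only a sketch:
the class names "gen1", "gen2", ... and the figure of four NBSPs per level
are invented for the example, and whether NBSPs or a width attribute suits
a given book better is something to test on a device.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
NBSP = "\u00a0"
NBSP_PER_LEVEL = 4  # arbitrary choice for the example


def indent_with_nbsp(xhtml_text):
    """Prefix paragraphs of class genN (hypothetical) with N levels of NBSPs."""
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xhtml_text)
    for p in root.iter("{%s}p" % XHTML_NS):
        for cls in p.get("class", "").split():
            match = re.fullmatch(r"gen(\d+)", cls)
            if match:
                depth = int(match.group(1))
                # p.text is None when the paragraph starts with a child element.
                p.text = NBSP * (NBSP_PER_LEVEL * depth) + (p.text or "")
                break
    return ET.tostring(root, encoding="unicode")
```

The numbering stays in the text exactly as typed, so two children can both
be "1" without any list machinery getting in the way.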

OK, so number (2) is rather specific to the needs of a single book, but
the first conversion has to be done a lot as I'm working through the book,
so it's scripted, along with any other help that kindlegen needs.
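
The em dash conversion is small enough to show in full. This is a sketch
of the idea rather than the actual script; the function name is made up.

```python
ZWSP = "\u200b"     # zero-width space, used in the master source
ZWNJ = "\u200c"     # zero-width non-joiner, the Kindle-only substitute
EM_DASH = "\u2014"


def kindle_dashes(text):
    """Swap ZWSP for ZWNJ, but only where both flank an em dash."""
    return text.replace(ZWSP + EM_DASH + ZWSP, ZWNJ + EM_DASH + ZWNJ)
```

Run it only on the copy that goes to kindlegen, so the master source keeps
the correct ZWSP characters for EPUB and browsers. Any ZWSP that is not
next to an em dash is left alone.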

Kindlegen does a simple job badly, so bypass the bits that you don't need.

Paul Flo Williams