
On Mon, December 12, 2011 8:02 am, Jim Adcock wrote:
One of the things that make books ugly at PG is the problem of simple paragraph formatting, which sounds like a simple issue under HTML, but which in practice becomes a mess particularly in EPUB and even more so in MOBI (Kindle).
[snip]
How then does one work around these problems? Once one recognizes that there really is a problem, then three (partial) work-around solutions come to mind:
1) Do not even specify paragraph formatting, but rather allow the built-in paragraph formatting in each HTML, EPUB, or MOBI device to do its job.
[snip]
IE: use the <p> tag to specify paragraphs, not things which are not paragraphs.
Excellent advice, but advice which is regularly ignored, probably because most automated tools assume /every/ division of text is a paragraph (which is probably a better assumption than assuming that every block of text is /not/ a paragraph, and an automated tool has to go one way or the other).
Almost all HTML tags carry "semantic overload": when you use a particular tag, it is assumed that the enclosed text carries that tag's semantics. This overload is most apparent in the <p> tag. Most of us can recognize what is, and is not, a paragraph, and most of us know that paragraphs are typically rendered in one of two ways: 1. as a division of text beginning and ending on a new line, where the first line is indented a perceptible amount and there is no space between paragraphs; or 2. as a division of text beginning and ending on a new line without indentation, but with one blank line between paragraphs. When you mark a block of text as a <p>aragraph, ask yourself, "What will this look like if the user switches from presentation 1 to presentation 2, or vice versa?" If it makes a difference, you probably don't have a paragraph.
When using automated tools which want to make everything a paragraph, I like to add a style rule at the beginning of my file:
    p {text-indent: 50%}
This produces a paragraph indentation that is blatantly excessive, but now I can scroll through the text and quickly identify those divisions that are not really paragraphs.
There are two tags in HTML that are specifically free from semantic overload: <div> and <span>. Any time you encounter a division of text which has been marked with a <p>, but which obviously isn't a paragraph by the foregoing test, replace the <p> with a <div>. You can go back later and figure out the semantics of the <div>, but in the short term you will get the result you want.
<p> is not the only tag which is often used counter to its semantics.
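To illustrate the <p>-versus-<div> check described above, here is a minimal sketch. The sample text and the class name "dateline" are made up for illustration, and the style rule is the temporary diagnostic one, meant to be removed once the file has been checked.

```html
<style>
  /* Temporary diagnostic rule: indent every <p> by half the line width
     so that anything which is not a real paragraph stands out. */
  p { text-indent: 50%; }
</style>

<!-- A real paragraph: running prose that reads correctly in either
     presentation (indented first line, or blank line between paragraphs). -->
<p>It was a bright cold day, and the road out of town was empty.</p>

<!-- Hypothetical non-paragraph (the dateline of a letter). Marked as <p>
     it would pick up the exaggerated indent and look wrong, so it is
     changed to <div>; its proper semantics can be decided later. -->
<div class="dateline">London, 12 December 1811</div>
```

With the diagnostic rule in place, only the <p> is pushed halfway across the line; the <div> is unaffected, which is the short-term result the replacement is meant to give.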
Another example is the <h4> tag, which is intended to be used as a 4th-level header or title and is typically rendered as a left-justified, bold division of text. Because of this, I have seen some producers use <h4> for table of contents items, e.g.:
    <div class="toc">
    <h4><a href="ch01.html">Chapter One</a></h4>
    <h4><a href="ch02.html">Chapter Two</a></h4>
    <h4><a href="ch03.html">Chapter Three</a></h4>
    ...
Frequently, people expect titles to be centered on a page. If you were to build a TOC like this, you should ask yourself, "What will this look like if the user switches to a centered presentation for titles?" This block obviously has the semantics of a list, and should be marked that way. At the very least, the list items should be changed to <div>, as <div> is free of semantic overload. In ePubEditor I have a TOC builder which builds its list according to header tags. You can imagine what this kind of structure does to my TOC builder.
Generally, if you have added styling to /any/ HTML element other than <div> or <span>, you are probably using the wrong element. And if you have added styling to a <div> or a <span>, you should ask yourself whether there is some HTML element which already possesses the semantics of the styled element. Frequently there won't be, but asking the question helps.
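For comparison, here is one way the same table of contents might be marked up as a list. This is only a sketch: the choice of <ul> rather than <ol> is an assumption, and the file names are simply the ones from the example above.

```html
<div class="toc">
  <!-- Each entry is a list item rather than a header, so a tool that
       collects header tags will not mistake TOC lines for titles. -->
  <ul>
    <li><a href="ch01.html">Chapter One</a></li>
    <li><a href="ch02.html">Chapter Two</a></li>
    <li><a href="ch03.html">Chapter Three</a></li>
  </ul>
</div>
```

Marked this way, the entries are unaffected if the user switches to a centered presentation for titles, because none of them is a title.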
PS: I have verified that the problems discussed are not problems introduced by epubmaker per say.
I apologize for being pedantic, but it's /per se/, "by itself", from the Latin /per/ ("by, through") and /se/ ("self, itself, himself", etc.).