Re: [gutvol-d] The problems with paragraph formatting at PG

12 Dec 2011

      On Mon, December 12, 2011 8:02 am, Jim Adcock wrote:
...
One of the things that make books ugly at PG is the problem of simple
paragraph formatting, which sounds like a simple issue under HTML, but which
in practice becomes a mess particularly in EPUB and even more so in MOBI
(Kindle).
[snip]
...
How then does one work around these problems?  Once one recognizes that
there really is a problem, then three (partial) work-around solutions come
to mind:
1) Do not even specify paragraph formatting, but rather allow the built-in
paragraph formatting in each HTML, EPUB, or MOBI device to do its job.
[snip]
...
I.e., use the <p> tag to mark paragraphs, and not things which are not paragraphs.
Excellent advice, but advice which is regularly ignored, probably because most
automated tools assume /every/ division of text is a paragraph (which is
probably a better assumption than assuming that every block of text is /not/ a
paragraph, and an automated tool has to go one way or the other).

Almost all HTML tags have "semantic overload": when you use a particular tag
it is assumed that the enclosed text carries that tag's semantics. This is
most apparent in the <p> tag. Most of us can recognize what is, and is not, a
paragraph, and most of us know that paragraphs are typically rendered as
either 1. a division of text beginning and ending on a new line, where the
first line is indented a perceptible amount and there is no space between
paragraphs, or 2. a division of text beginning and ending on a new line
without indentation, but with one blank line between paragraphs.
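
As a rough sketch, the two presentations might be written in CSS something
like this. The class names "indented" and "spaced" are invented for
illustration, and the measurements are only assumptions; real devices pick
their own values.

```html
<style>
  /* Presentation 1: first line indented, no space between paragraphs */
  body.indented p { text-indent: 1.5em; margin: 0; }

  /* Presentation 2: no indentation, one blank line between paragraphs */
  body.spaced p { text-indent: 0; margin: 1em 0; }
</style>
```

Changing the class on <body> from one to the other should alter the look of a
true paragraph without damaging its meaning.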

When you mark a block of text as a <p>aragraph, ask yourself, "what will this
look like if the user switches from presentation 1. to presentation 2., or vice
versa?" If it makes a difference you probably don't have a paragraph. When
using automated tools which want to make everything a paragraph, I like to add
a style rule at the beginning of my file:

p {text-indent: 50%}

This produces a paragraph indentation that is blatantly excessive; but now I
can scroll through the text and quickly identify those divisions that are not
really paragraphs.

There are two tags in HTML that are specifically free from semantic overload:
<div> and <span>. Any time you encounter a division of text which has been
marked with a <p>, but which by the foregoing test obviously isn't a
paragraph, replace the <p> with a <div>. You can go back later and figure out
the semantics of the <div>, but in the short term you will get the result you
want.
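
For illustration only, here is what that replacement might look like for the
closing of a letter. The text is a made-up example, not taken from any PG
book.

```html
<!-- Before: an automated tool has marked each line as a paragraph -->
<p>Yours sincerely,</p>
<p>J. Smith</p>

<!-- After: the same text, no longer claiming to be paragraphs -->
<div>Yours sincerely,</div>
<div>J. Smith</div>
```

With the <div> version, neither a first-line indent nor extra paragraph
spacing chosen by the user should be applied to these lines.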

<p> is not the only tag which is often used counter to its semantics. Another
example is the <h4> tag, which is intended to be used as a 4th level header or
title and typically is rendered as a left-justified, bold textual division.
Because of this, I have seen some producers use <h4> as table of contents
items, e.g.:

<div class="toc">
  <h4><a href="ch01.html">Chapter One</a></h4>
  <h4><a href="ch02.html">Chapter Two</a></h4>
  <h4><a href="ch03.html">Chapter Three</a></h4>
...

Frequently, people expect titles to be centered on a page. If you were to
build a TOC like this, you should ask yourself, "what will this look like if
the user switches to a centered presentation for titles?" This block obviously
has the semantics of a list, and should be marked that way. At the very least,
the list items should be changed to <div>, as <div> is free of semantic
overload.
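
Marked up as a list, the fragment above might look something like the
following sketch. I have used <ul> here, though <ol> would serve as well since
chapters are ordered; only the three entries shown above are included.

```html
<ul class="toc">
  <li><a href="ch01.html">Chapter One</a></li>
  <li><a href="ch02.html">Chapter Two</a></li>
  <li><a href="ch03.html">Chapter Three</a></li>
</ul>
```

Because no header tags are involved, a tool that collects headers should no
longer mistake these entries for titles.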

In ePubEditor I have a TOC builder which builds its list according to header
tags. You can imagine what this kind of structure does to my TOC builder.

Generally, if you have added styling to /any/ HTML element other than <div> or
<span>, you are probably using the wrong element. And if you have added
styling to a <div> or a <span> you should ask yourself if there is not some
HTML element which possesses the semantics of the styled element. Frequently
there won't be, but asking the question helps.
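
As one hypothetical case of asking that question: a <div> styled to set off a
quoted passage probably wants to be a <blockquote>. The class name "quote" and
the placeholder text below are invented for the example.

```html
<!-- A styled <div> standing in for a quotation -->
<div class="quote">Quoted passage goes here.</div>

<!-- The same content in the element that carries those semantics -->
<blockquote>
  <p>Quoted passage goes here.</p>
</blockquote>
```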
...
PS: I have verified that the problems discusses are not problems introduced
by epubmaker per say.
I apologize for being pedantic, but it's /per se/ "by itself", from the Latin
/per/ ("by, through") and /se/ ("self, itself, himself", etc.).


Lee Passey