Re: [gutvol-d] Producing epub ready HTML

25 Jan 2012

      On Tue, January 24, 2012 6:33 pm, James Adcock wrote:
...
Don>Starting at the most basic level, is there any good reason not to use
utf-8 as the basic encoding standard for everything including plain-text?
No.
...
BOM or no BOM?
Depends on the file. XML files (XHTML, TEI, etc.) can declare their encoding
in the XML declaration on the first line, which is plain ASCII, so no BOM is
necessary (and would probably confuse some tools). Lightweight markup languages
like reStructuredText, which have no prolog, need some mechanism to indicate
that they are UTF-8 encoded (as opposed to Latin-1 or MacRoman), so they may
need a BOM.
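
As a rough illustration of the BOM point, here is a minimal Python sketch for
checking, adding, and stripping a UTF-8 BOM. The function names are made up for
this example; only codecs.BOM_UTF8 and the "utf-8-sig" codec come from the
Python standard library.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def add_utf8_bom(text: str) -> bytes:
    """Encode text as UTF-8 with a leading BOM (hypothetical helper for
    prolog-less formats such as plain text or reStructuredText)."""
    return codecs.BOM_UTF8 + text.encode("utf-8")


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present.

    The 'utf-8-sig' codec strips the BOM when it is there and behaves
    like plain 'utf-8' when it is not.
    """
    return data.decode("utf-8-sig")
```

A tool that reads both kinds of file could call decode_utf8() on everything
and only call add_utf8_bom() when writing formats that have no prolog.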
...
Unix or Windows style line breaks?
Don't know that it matters, but my preference would be Unix.
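
If Unix line breaks were chosen, converting existing files is simple. A
minimal Python sketch, assuming the file is handled as raw bytes (the function
name is hypothetical):

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Convert Windows (CRLF) and old Mac (CR) line breaks to Unix (LF)."""
    # Replace CRLF first so that each pair becomes a single LF,
    # then convert any remaining lone CR characters.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```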
...
Line breaks meaning paragraph separations or line breaks meaning, well,
whatever it is that PG means by line breaks?
All lines will wrap when displayed, so a mechanism is needed to indicate "this
is not just whitespace, it really is a new line!" All markups have a mechanism
for this purpose.

For ease of proofreading, I recommend that text lines be broken with
insignificant newline characters at the same point as in the original text to
the extent possible (hyphenated lines cannot follow this rule, and should be
broken at the next, or previous, available whitespace).
...
"utf-8" meaning that "we" use the interpretation of the code points as
defined by Unicode, or meaning "we" invent our own meanings for those code
points?
Unicode without composition.

Most have argued that UTF-8 requires Unicode. Technically you can UTF-8 encode
any set of code points, but for this project it would serve no purpose.
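
"Unicode without composition" can be read more than one way. Assuming it means
"no base letter plus combining mark sequences where a single precomposed
character exists" (that reading is a guess, not something stated above), a
check could be sketched in Python like this. The function names are made up;
unicodedata is the standard library module.

```python
import unicodedata


def has_combining_marks(text: str) -> bool:
    """Return True if any character has a non-zero canonical combining class."""
    return any(unicodedata.combining(ch) for ch in text)


def to_precomposed(text: str) -> str:
    """Normalize to NFC, which merges base + combining mark sequences into
    precomposed characters wherever Unicode defines one."""
    return unicodedata.normalize("NFC", text)
```

Under the opposite reading (always decompose), the same normalize() call with
"NFD" would apply instead.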

Lee Passey