[gutvol-d] Re: restructured-text -- the good, the bad, and the ugly

28 Dec 2010

> http://www.gutenberg.org/ebooks/34654
> http://www.gutenberg.org/ebooks/34605

If one actually looks at the quality of the EPUB and Mobi files generated
from the above by this approach, one sees why RST and other txt-based
approaches make many of us book submitters so unhappy!

HTML, and therefore EPUB, sucks as a method for coding books -- but even so,
the results end up looking better than this RST!

I would suggest that instead of "standardizing" on RST, PG "standardize" on
EPUB as the input submission format, and move to EPUB3 when that comes out.
Txt70 and HTML formats can be "easily" downconverted from EPUB, whereas
moving up from Txt70 to EPUB and Mobi means trying to guess info that isn't
there.  EPUB would ideally be extended by PG conventions to cover issues
that come up frequently when trying to encode books to fairly represent the
Author's and/or Publisher's intent.
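
To illustrate why the downward direction is the easy one, here is a minimal
Python sketch that turns one chapter of EPUB-style XHTML into text wrapped at
70 columns.  The names (BlockTextExtractor, xhtml_to_txt70, BLOCK_TAGS) are
made up for this example and are not part of any PG tool; it assumes it is
handed the body markup of a single content document, and it does not unpack
the EPUB container or handle tables, images, or footnotes.

```python
from html.parser import HTMLParser
import textwrap

# Hypothetical list of tags treated as paragraph-level blocks in this sketch.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "blockquote", "div"}


class BlockTextExtractor(HTMLParser):
    """Collect the text of each block-level element, discarding inline markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []   # finished blocks, one string each
        self._buf = []     # text pieces of the block being read

    def _flush(self):
        # Collapse runs of whitespace and store the block if it is not empty.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buf.append(data)


def xhtml_to_txt70(xhtml, width=70):
    """Return the block text of `xhtml`, wrapped to `width` columns."""
    parser = BlockTextExtractor()
    parser.feed(xhtml)
    parser.close()
    parser._flush()  # keep any trailing text outside a block element
    return "\n\n".join(textwrap.fill(b, width=width) for b in parser.blocks)
```

Note what gets thrown away: italics, heading levels, and every other bit of
markup vanish on the way down.  Going the other way, a converter would have
to guess all of that back from the bare text.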
...
[quote: re "just encoding the words of an author"]
I can't find that email again, but, "just encoding the words of an author"
works great if the author's book "just" consists of a string of words.  I'm
not sure I've ever seen such a book, but I assume some exist -- representing
an author "just" encoding a purely oral tradition, presumably. I was
thinking that Rudyard Kipling's "The Jungle Book" might be such a book
"encoding a purely oral tradition", but now that I've looked it up, the
answer is NO: Not even "The Jungle Book" is simply an encoding of a "string
of words." The story "The Blue Hotel" from Stephen Crane's "The Monster"
comes close to being simply an aural encoding -- but even there the author
cannot help but include some visual representations in his book that do not
have an aural equivalent -- it's NOT just "a string of words."

If I might be so bold as to try to more correctly state the job of a
"modern" contributor to PG:

To encode, as simply but as accurately as possible, the intent of the
original author and/or publisher, in a way that can be represented as
correctly as possible on as many as possible of the display devices used by
actual people who want to read PG texts, and, to the extent possible, to
predict the future so that future customers can enjoy PG texts as well.  And
to do this while minimizing the size of the resulting downloadable file, so
that customers can actually store and read the book on their reader devices.

In practice, how much of the submitter's job is "just encoding the words of
an author"?  In my experience one is lucky if "just encoding the words of an
author" represents even half the total amount of time and effort one puts
into a PG book submission.

The more files a submission requires -- and PG requires at least three per
book -- the more chances there are for things to get screwed up -- and they
DO get screwed up! When the encoding language doesn't match the job of
representing the things one commonly finds in real-world books, then things
WILL surely get screwed up!  Txt70 and RST are a case in point: they are too
weak.  HTML, and therefore EPUB, is both too weak AND too rich [too
permissive, whilst at the same time lacking the elements needed to encode
those things one commonly finds in real books].

And further, in the real world PG contributors need submission formats that
have ACTUAL not THEORETICAL good "authoring" tools AND good rendering tools,
so that they can see in advance what their efforts are going to look like on
real world customers' reader displays.

Jim Adcock