Over the last 24 hours I have attempted to do what I suggested others can
and should do: namely, look honestly and objectively, as if examining not PG
but some other publishing house, critiquing not one's own efforts but
someone else's, and ask the simple question: If PG is in the business of
publishing free books for real people to read, how well is PG succeeding at
that job? Which parts of the publishing job is PG succeeding at? At which
parts is it failing? Etc.
It goes without saying that within the PG community many will consider even
asking these questions high treason. These same people, however, have
absolutely no difficulty asking, or answering to their own satisfaction,
these very same questions of other publishing houses, or of the efforts of
one PG volunteer or another, or of DP, for example.
To perform this 24-hour gut-check I looked at the four most common file
formats from the latest 10 submissions. I examined each file against 9
simple, more-or-less analytical measures, asking: "Does this file implement
this common feature of e-books more-or-less correctly, i.e. more-or-less
attractively and in a more-or-less readable way? Would the PG community
accept this from some other book publisher, or would we complain about that
publisher's efforts?" I scored each measure simply Y or N (Pass or Fail),
because most of the time the answer is pretty obvious. In a few cases I
scored "0" (Does Not Apply).
Are these measurements somewhat subjective? Absolutely. If you don't like
my measures, go make your own; you can do so in about a day's work. But
please take off your PG-centric rose-colored glasses before you make these
measures!
I will state my summary conclusions first (some of which surprised even me)
and then follow up with the raw data:
SUMMARY CONCLUSIONS
1) PG uniformly fails on cover design: not one of the HTML, EPUB, or MOBI
files examined includes a usable cover. Cover design is becoming ever more
important, because a growing number of ebook readers need a cover for an
e-book to be even minimally usable.
2) Volunteers almost universally succeed at producing high-quality HTML
files in the form that PG requests: their results are almost always
attractive, "correct," and readable as HTML. The problem comes later, when
these HTML files are run through opaque tools and unwritten rules in an
attempt to create EPUB and/or MOBI, springing an after-the-fact "gotcha" on
the volunteers' efforts. Why not just read the books as HTML, then? Because
web browsers uniformly stink when used as ebook readers. One clear area of
weakness is the way the TOC is encoded in the HTML.
3) Volunteers almost universally succeed at producing high-quality TXT70
files in the form that PG requests. UTF-8 has been a huge step forward in
this regard. Volunteers do what PG asks of them, whether or not they
personally agree with those directives. What use, if any, these files have
I leave to BB and others to decide.
4) HTML comes close to universally succeeding as an e-book format. The
problem is that there are no readily available high-quality ebook readers
that accept HTML as input and present it as a pleasant reading experience.
HTML universally fails on cover images, fails on Publisher Pages (PG's plus
the original publisher's) more than half the time (6 of 10), and,
surprisingly, fails on TOC half the time (5 of 10).
5) PG nowadays almost universally succeeds at character encoding. Hurray!
The old problem of code pages, with characters displaying wrong, has almost
completely gone away, helped greatly, I suspect, by the increasing use of
UTF-8. I did see one EPUB file with code-page issues.
6) TOC handling is poor across the board at PG: it succeeds at best about
half the time (5 of 10 books in HTML, 4 in MOBI, 3 in EPUB, none in TXT70),
and a given book may pass TOC in one format and fail it in another.
7) Punctuation almost always succeeds in all file formats. However, if one
counts prosodic markings as part of punctuation (as I did in these
measures), then TXT70 fails on prosodic markings.
8) Sentence structure succeeds universally except in TXT70, where it
uniformly fails.
9) Paragraph structure succeeds almost universally except in MOBI, where it
fails about half the time (4 of 10).
10) Chapter structure mostly succeeds, failing occasionally in MOBI (2
books) and in half the books in TXT70.
11) Images almost always succeed (in the formats that support them) but
occasionally fail in one format or another for reasons that are not
obvious.
12) "Publisher Pages" (PG's plus the original publisher's "title pages")
fail in most files in every format except MOBI, where, surprisingly, they
succeed most of the time. The major problem is PG's added boilerplate, "PG
Publisher Pages and Legalese," which is attached without caring whether
those additions succeed or fail, or whether those pages represent PG in a
positive or negative light. Publisher pages have been recognized for
hundreds of years as "free advertising" for publishers, so this neglect is
surprising: it works against the PG mission.
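To make the code-page problem in item 5 concrete: when UTF-8 bytes are
decoded as if they were Windows-1252 (cp1252), each accented character turns
into two or three junk characters. The Python sketch below shows this, plus
a hypothetical round-trip heuristic, `looks_like_mojibake` (my own name, not
a PG tool), that flags text whose cp1252 bytes happen to form valid,
different UTF-8. It is a rough check that can miss cases, not the method
used for this survey.

```python
# Hypothetical heuristic, not a PG tool: detect UTF-8 text that was decoded
# through the Windows-1252 code page by mistake ("mojibake").

def looks_like_mojibake(text: str) -> bool:
    """Return True if re-encoding as cp1252 yields valid, different UTF-8."""
    try:
        repaired = text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either not representable in cp1252, or not valid UTF-8 afterwards.
        return False
    return repaired != text


# The classic failure: the UTF-8 bytes for "é" (C3 A9) read as cp1252.
garbled = "café".encode("utf-8").decode("cp1252")
print(garbled)                        # cafÃ©
print(looks_like_mojibake(garbled))   # True
print(looks_like_mojibake("café"))    # False
```

Correctly encoded text such as "café" fails the round trip (its cp1252 byte
0xE9 is not valid UTF-8 on its own), which is why it is not flagged.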
The Raw Survey Results:
These cover books 38234-38243, reported in the order 38238-38243 followed
by 38234-38237.
Y means "Generally Successful at this measure"
N means "Generally Fails at this measure"
0 means "Measure doesn't apply to this book" (e.g. the book has no images)
Definitions of the measures:
Charset?: Does this book handle character encoding correctly, or are there
missing glyphs, mis-encoded code points, or hack substitutions for
characters where the "real" character was available?
Sentence Structure?: Does this book display sentences in a complete and
sensible manner that would be considered consistent with established paper
book standards, and/or current ebook standards?
Paragraph?: Does this book display paragraph indentation and/or vertical
whitespace between paragraphs in a way which would be considered
more-or-less consistent with the last couple hundred years of paper book
publishing and/or current ebook standards?
Chapter?: Does this book display chapter breaks in a way which would be
considered more-or-less consistent with the last couple hundred years of
paper book publishing and/or current ebook standards?
TOC?: Does this book have a useful, reasonably complete TOC that is
reasonably compatible with the published standards of that particular ebook
file format?
Images?: If the book has images, do they display in a reasonably correct and
useful manner?
Pubpages?: Do the publisher pages display in a reasonably correct, readable
manner that does reasonable credit to the integrity of the brand of the
original publisher and of PG as the republisher?
Cover?: Is a useful cover image included that functions as the standards of
the given file format expect, so that a PG book can be selected by its cover
in ebook readers that use cover information as a selection mechanism?
Punct+Prosody?: Are punctuation and prosodic markers (bold and italic, for
example) presented in a way that is consistent with the last couple hundred
years of paper book publishing and/or current standards of ebook publishing?
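The Cover? and TOC? measures were judged by eye, but for EPUB their minimal
precondition can be checked mechanically: the package (OPF) file either
declares a cover and a TOC or it does not. The Python sketch below uses only
the standard library and assumes the common conventions: `META-INF/container.xml`
names the OPF file; the widely used EPUB 2 convention marks the cover with
`<meta name="cover" content="ID"/>` and the NCX TOC with `<spine toc="ID">`;
EPUB 3 uses the manifest properties `cover-image` and `nav`. The function
name is hypothetical, and it only tests that the declarations exist, not
that the cover or TOC is actually useful.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Clark-notation namespace prefixes used by ElementTree.
CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"
OPF_NS = "{http://www.idpf.org/2007/opf}"


def check_epub_cover_and_toc(data: bytes) -> dict:
    """Report whether an EPUB *declares* a cover image and a TOC.

    Hypothetical helper: it checks declarations only, not whether the
    cover or TOC is actually useful to a reader.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as epub:
        # META-INF/container.xml names the OPF package document.
        container = ET.fromstring(epub.read("META-INF/container.xml"))
        rootfile = next(container.iter(f"{CONTAINER_NS}rootfile"))
        opf = ET.fromstring(epub.read(rootfile.get("full-path")))

    # Manifest items keyed by id (items without an id are ignored).
    items = {it.get("id"): it for it in opf.iter(f"{OPF_NS}item") if it.get("id")}

    def has_property(item, name):
        return name in (item.get("properties") or "").split()

    # EPUB 2 convention: <meta name="cover" content="manifest-item-id"/>.
    cover_ids = {m.get("content") for m in opf.iter(f"{OPF_NS}meta")
                 if m.get("name") == "cover"}
    # EPUB 3: a manifest item carrying properties="cover-image".
    has_cover = any(i in items for i in cover_ids) or any(
        has_property(it, "cover-image") for it in items.values())

    # EPUB 2: <spine toc="ncx-item-id">; EPUB 3: an item with properties="nav".
    spine = opf.find(f"{OPF_NS}spine")
    ncx_id = spine.get("toc") if spine is not None else None
    has_toc = ncx_id in items or any(has_property(it, "nav") for it in items.values())

    return {"cover": has_cover, "toc": has_toc}
```

Usage would be along the lines of
`check_epub_cover_and_toc(open("book.epub", "rb").read())`, where
`book.epub` is a placeholder path.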
Raw scores by format (one character per book, in the order given above):
HTML
charset? YYYYYYYYY
sentence? YYYYYYYYYY
paragraph? YYYYYYYYYY
chapter? YYYYYYYYYY
TOC? YNYNNYYYNN
images? YYYYY00YY0
pubpages? YNYYYNNNNN
punct+prosody? YYYYYYYYYY
cover? NNNNNNNNNN
EPUB
charset? YYNYYYYYYY
sentence? YYYYYYYYYY
paragraph? YYYYYYYYYY
chapter? YYYYYYYYYY
TOC? NNYYNNYNNN
images? YYNYY00YY0
pubpages? NNNYNNNNNN
punct+prosody? YYYYYYYYYY
cover? NNNNNNNNNN
MOBI
charset? YYYYYYYYYY
sentence? YYYYYYYYYY
paragraph? NNYYYYNNYY
chapter? YYNNYYYYY
TOC? NNYYNNYYNN
images? NYNYY00YY0
pubpages? YNNYYYYYYY
punct+prosody? YYYYYYYYYY
cover? NNNNNNNNNN
TXT70
charset? YYYYYYYYYY
sentence? NNNNNNNNNN
paragraph? YNYYYYYYYY
chapter? YNNNYYYYNN
TOC? NNNNNNNNNN
images? NNNNN00NN0
pubpages? NNNNNNNNNN
punct+prosody?
cover?
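Several summary figures above ("about half the time," "most of the time")
are easy to recheck by counting the table. The Python sketch below copies
the raw rows verbatim, including the two 9-entry rows (HTML charset, MOBI
chapter) and the two empty TXT70 rows, and tallies passes per format and
measure, leaving "0" entries out of the denominator. The parsing rule (a
line without "?" names a format) is my own assumption about this layout.

```python
# Raw rows copied verbatim from the survey table above.
RAW = """\
HTML
charset? YYYYYYYYY
sentence? YYYYYYYYYY
paragraph? YYYYYYYYYY
chapter? YYYYYYYYYY
TOC? YNYNNYYYNN
images? YYYYY00YY0
pubpages? YNYYYNNNNN
punct+prosody? YYYYYYYYYY
cover? NNNNNNNNNN
EPUB
charset? YYNYYYYYYY
sentence? YYYYYYYYYY
paragraph? YYYYYYYYYY
chapter? YYYYYYYYYY
TOC? NNYYNNYNNN
images? YYNYY00YY0
pubpages? NNNYNNNNNN
punct+prosody? YYYYYYYYYY
cover? NNNNNNNNNN
MOBI
charset? YYYYYYYYYY
sentence? YYYYYYYYYY
paragraph? NNYYYYNNYY
chapter? YYNNYYYYY
TOC? NNYYNNYYNN
images? NYNYY00YY0
pubpages? YNNYYYYYYY
punct+prosody? YYYYYYYYYY
cover? NNNNNNNNNN
TXT70
charset? YYYYYYYYYY
sentence? NNNNNNNNNN
paragraph? YNYYYYYYYY
chapter? YNNNYYYYNN
TOC? NNNNNNNNNN
images? NNNNN00NN0
pubpages? NNNNNNNNNN
punct+prosody?
cover?
"""


def tally(raw: str) -> dict:
    """Map (format, measure) -> (passes, scored); "0" entries are not scored."""
    results = {}
    fmt = None
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        if "?" not in line:  # a bare line such as "EPUB" starts a new format
            fmt = line
            continue
        measure, _, scores = line.partition("?")
        passes = scores.count("Y")
        results[(fmt, measure)] = (passes, passes + scores.count("N"))
    return results


for (fmt, measure), (passes, scored) in tally(RAW).items():
    rate = f"{passes}/{scored}" if scored else "not scored"
    print(f"{fmt:6} {measure:14} {rate}")
```

For example, it reports HTML TOC as 5/10, EPUB TOC as 3/10 and MOBI
pubpages as 8/10, while the empty TXT70 rows come out as "not scored".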