Assessing the current successes or failures of PG in various file formats.

Over the last 24 hours I have attempted to do what I suggested others can and should also do: namely, to look honestly and objectively, as if one were examining not PG but some other publishing house, critiquing not one's own efforts but someone else's, and ask the simple question: If PG is in the business of publishing free books for real people to read, then how well is PG succeeding in that job? What parts of the publishing job is PG succeeding at? At what parts of the job is it failing? Etc.

It goes without saying that within the PG community many will consider even asking these questions high treason. These same people, however, have absolutely no difficulty asking, or answering to their own satisfaction, these very same questions of other publishing houses, or of the efforts of one or another PG volunteer, or of DP, for example.

To perform this 24-hour gut-check I looked at the four most common file formats from the latest 10 submissions. I examined each file against 9 simple, more-or-less analytical measures, asking: "Does this file implement this common feature of e-books more-or-less correctly, i.e. more-or-less attractively, and in a more-or-less readable way? Would the PG community accept this from some other book publisher, or would we complain about that publisher's efforts?"

I ranked each of these measures simply Y or N (Pass or Fail) because most of the time the answer is pretty obvious. A few cases I ranked "0" (Does Not Apply).

Are these measurements somewhat subjective? Absolutely. If you don't like my measures, go make your own -- you can do so in about a day's work. But please take off your PG-centric rose-colored glasses before you make these measures!

I will state my summary conclusions first (which are somewhat surprising even to me) and then follow up with the raw data:

SUMMARY CONCLUSIONS

1) PG uniformly and universally fails on the issue of cover design.
Cover design is becoming more and more important in making an e-book even minimally usable on more and more ebook readers.

2) Volunteers almost universally succeed at producing high-quality HTML files in the form that PG requests: their results are almost always attractive, "correct" and readable as an HTML file. The problem comes later, when these HTML files are processed using opaque tools and unwritten rules in an attempt to create EPUB and/or MOBI from them, resulting in an after-the-fact "gotcha" to the volunteers' efforts. Why not just read the books in HTML form then? Because HTML browsers uniformly stink when one tries to use them as if they were ebook readers. One clear area of weakness is the encoding of the TOC in HTML.

3) Volunteers almost universally succeed at producing high-quality TXT70 files in the form that PG requests. UTF-8 has been a huge step forward in this regard. Volunteers do what PG asks of them, whether or not they personally agree with those directives. What use, if any, there is for these files I leave to BB and others to decide.

4) HTML comes close to universally succeeding as an e-book format. The problem is that there are no readily available, high-quality ebook readers that accept HTML as an input format and present a pleasant reading experience for that format. HTML universally fails re cover images, fails re Publisher Pages (PG + the Original Publisher) about half the time, and, surprisingly, fails TOC about half the time.

5) PG almost universally succeeds nowadays at the issue of char encoding. Hurray! The old issue of code pages and chars being displayed wrong has almost completely gone away, helped greatly, I suspect, by the increasing use of UTF-8. I did see one case of an EPUB file with code page issues.

6) TOC is handled pretty universally badly at PG, succeeding perhaps 1/2 the time in HTML, EPUB, and MOBI formats -- but a given book might succeed at TOC in one format and fail at TOC in another.

7) Punctuation choices almost always succeed in all file formats, except that if one considers prosodic markings part of punctuation (as I did in these measures) then TXT70 fails on prosodic markings.

8) Sentence structure is universally successful except in TXT70, where it uniformly fails.

9) Paragraph structure is universally successful except in MOBI, where it fails about half the time.

10) Chapter structure is pretty much successful, failing occasionally in MOBI and TXT70.

11) Images almost always succeed (in the formats that support them) but occasionally fail in one format or another for reasons that are not readily obvious.

12) "Publisher Pages" -- PG's plus the original publisher's "title pages" -- fail almost universally in all file formats except MOBI, where surprisingly they succeed most of the time. The major problem is that PG adds boilerplate "PG Publisher Pages and Legalese" without caring whether those additions succeed or fail, or whether those pages represent PG in a positive or negative light. Given that publisher pages have been recognized for hundreds of years as "free advertising" for publishers, this seems surprising, since this lack of caring works against the PG mission.

The Raw Survey Results:

These are for books 38234-38243, except reported in the order 38238-38243 followed by 38234-38237.

Y means "Generally Successful at this measure"
N means "Generally Fails at this measure"
0 means "Measure doesn't apply to this book" [i.e. book has no images]

Definition of the measures:

Charset?: Does this book seem to implement charset issues correctly, or are there missing glyphs, or miscodings of char points, or hack substitutions for chars where the "real" char was available?

Sentence Structure?: Does this book display sentences in a complete and sensible manner that would be considered consistent with established paper book standards, and/or current ebook standards?

Paragraph?: Does this book display paragraph indentation and/or vertical whitespace between paragraphs in a way which would be considered more-or-less consistent with the last couple hundred years of paper book publishing and/or current ebook standards?

Chapter?: Does this book display chapter breaks in a way which would be considered more-or-less consistent with the last couple hundred years of paper book publishing and/or current ebook standards?

TOC?: Does this book have a useful, reasonably complete TOC that is reasonably compatible with the published standards of that particular ebook file format?

Images?: If the book has images, do they display in a reasonably correct, useful manner?

Pubpages?: Do the publisher pages display in a reasonably correct, readable manner, doing reasonable credit to the integrity of the brand of the original publisher and to PG as the republisher?

Cover?: Is a useful cover image included that will function as expected by the standards of the given file format, and that allows a given PG book to be selected by cover in ebook readers which use cover information as a selection mechanism?

Punc+Prosody?: Are punctuation and prosodic markers (bold and italic, for example) presented in a way which is consistent with the last couple hundred years of paper book publishing and/or current standards of ebook publishing?

RAW Survey Results:

HTML
  charset?        YYYYYYYYY
  sentence?       YYYYYYYYYY
  paragraph?      YYYYYYYYYY
  chapter?        YYYYYYYYYY
  TOC?            YNYNNYYYNN
  images?         YYYYY00YY0
  pubpages?       YNYYYNNNNN
  punct+prosody?  YYYYYYYYYY
  cover?          NNNNNNNNNN

EPUB
  charset?        YYNYYYYYYY
  sentence?       YYYYYYYYYY
  paragraph?      YYYYYYYYYY
  chapter?        YYYYYYYYYY
  TOC?            NNYYNNYNNN
  images?         YYNYY00YY0
  pubpages?       NNNYNNNNNN
  punct+prosody?  YYYYYYYYYY
  cover?          NNNNNNNNNN

MOBI
  charset?        YYYYYYYYYY
  sentence?       YYYYYYYYYY
  paragraph?      NNYYYYNNYY
  chapter?        YYNNYYYYY
  TOC?            NNYYNNYYNN
  images?         NYNYY00YY0
  pubpages?       YNNYYYYYYY
  punct+prosody?  YYYYYYYYYY
  cover?          NNNNNNNNNN

TXT70
  charset?        YYYYYYYYYY
  sentence?       NNNNNNNNNN
  paragraph?      YNYYYYYYYY
  chapter?        YNNNYYYYNN
  TOC?            NNNNNNNNNN
  images?         NNNNN00NN0
  pubpages?       NNNNNNNNNN
  punct+prosody?
  cover?
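For anyone who wants to repeat this exercise, or re-score the strings above, the counting can be automated. What follows is only a minimal sketch in Python, not a tool PG uses: the names EPUB and tally are made up for illustration, and only a few rows from the EPUB block above are copied in. It assumes "0" entries should be left out of the pass rate, since the measure does not apply to those books.

```python
from collections import Counter

# A few rows copied from the EPUB block of the raw results.
# One character per book: Y = pass, N = fail, 0 = does not apply.
EPUB = {
    "charset": "YYNYYYYYYY",
    "TOC": "NNYYNNYNNN",
    "images": "YYNYY00YY0",
    "pubpages": "NNNYNNNNNN",
    "cover": "NNNNNNNNNN",
}


def tally(ratings):
    """Return (passed, failed, not_applicable, pass_rate) for one rating string.

    pass_rate is computed over applicable books only (Y and N); it is None
    when every entry is "0" or the string is empty.
    """
    counts = Counter(ratings)
    passed, failed, skipped = counts["Y"], counts["N"], counts["0"]
    applicable = passed + failed
    rate = passed / applicable if applicable else None
    return passed, failed, skipped, rate


for measure, ratings in EPUB.items():
    passed, failed, skipped, rate = tally(ratings)
    shown = "n/a" if rate is None else f"{rate:.0%}"
    print(f"{measure:10s} pass={passed} fail={failed} n/a={skipped} rate={shown}")
```

The same function can be pointed at the HTML, MOBI and TXT70 rows. Rows that are shorter than 10 characters, or empty, are simply counted as given; the sketch does not try to guess at missing entries.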
Jim Adcock