Re: [gutvol-d] procedures for contributions, forum for questions and ideas

How about trying your experiment with more recent books? The current crop are up in the 40000+ range.
OK, as a "sanity check" I went back and double-checked more recent submissions to see if they are now being formatted "reasonably correctly." I.e., if this were an e-book that I had bought commercially, would I say "yes, this is formatted reasonably" or "no, this book has corrupted formatting"? Not a high standard, just: does this book "work" or not?

40000 no
40001 no
40002 no
40003 no
40004 no
40005 no
40006 no
40007 no
40008 no
40009 no

The two most common and glaring problems are:

1) Paragraphs are not formatted "reasonably" according to any known standard of formatting at any point in the history of mankind.
2) No TOC in an e-book where one would reasonably expect and need a TOC.
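The check above was done by eye on real devices. As a rough complement, both symptoms tend to leave traces in the HTML that a script can look for. This is a minimal sketch of a heuristic made up for illustration, not a PG or DP tool: it assumes chapter structure shows up as h1 to h6 elements, and that a TOC shows up as in-document links (href="#...") resolving to an id, or an <a name="...">, in the same file.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class StructureScan(HTMLParser):
    """Collect headings, link targets, and in-document links from one HTML file."""

    def __init__(self):
        super().__init__()
        self.headings = 0
        self.targets = set()  # id attributes (any element) and <a name="..."> values
        self.links = set()    # fragment part of href="#..." links

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in HEADING_TAGS:
            self.headings += 1
        if attrs.get("id"):
            self.targets.add(attrs["id"])
        if tag == "a":
            if attrs.get("name"):
                self.targets.add(attrs["name"])
            href = attrs.get("href") or ""
            if href.startswith("#") and len(href) > 1:
                self.links.add(href[1:])


def scan(html: str) -> dict:
    """Report the heading count and how many distinct link targets actually resolve."""
    parser = StructureScan()
    parser.feed(html)
    parser.close()
    return {
        "headings": parser.headings,
        "toc_links": len(parser.links & parser.targets),
    }


if __name__ == "__main__":
    styled_only = '<p class="center"><b>CHAPTER I</b></p><p>It was late.</p>'
    structured = '<p><a href="#ch1">Chapter I</a></p><h2 id="ch1">CHAPTER I</h2>'
    print(scan(styled_only))  # {'headings': 0, 'toc_links': 0}
    print(scan(structured))   # {'headings': 1, 'toc_links': 1}
```

A file that marks chapters only as styled paragraphs reports zero headings and zero TOC links. A passing result says nothing about how the paragraphs actually render on a given reader, which still needs a device test.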

On Mon, Sep 24, 2012 at 1:03 PM, James Adcock <jimad@msn.com> wrote:
How about trying your experiment with more recent books? The current crop are up in the 40000+ range.
OK, as a "sanity check" I went back and double-checked more recent submissions to see if they are now being formatted "reasonably correctly." I.e., if this were an e-book that I had bought commercially, would I say "yes, this is formatted reasonably" or "no, this book has corrupted formatting"? Not a high standard, just: does this book "work" or not?
40000 no
40001 no
40002 no
40003 no
40004 no
40005 no
40006 no
40007 no
40008 no
40009 no
The two most common and glaring problems are:
1) Paragraphs are not formatted "reasonably" according to any known standard of formatting at any point in the history of mankind.
2) No TOC in an e-book where one would reasonably expect and need a TOC.
What is our standard target (more specifically, what were you testing with)?

I have found commercial ebooks render considerably differently in FBreader and Coolreader on Android, not to mention PG books (I almost always have to adjust the default font size on a per-book basis, at the very least). I think there are similar rendering differences between the Kindle, Nook, and Sony readers.

I've fallen back to running them through a TTS engine and listening to them exclusively as MP3s again, as none of the Android ebook readers I've tried does an acceptable job of realtime TTS, so the formatting issues no longer bug me. :/

-R C

(I listen to books during my commute, but was hoping to be able to switch between reading and listening; no go. Both of the readers had consistent trouble with their TTS plugins, both with the default voice and with the Flite engine; either skipping significant sections of text, or repeating it, as well as problems of not registering as a background service, getting swapped out, and losing its place in the book. Also, the voices were reeaallyy slow compared to what I usually run my desktop TTS at.)

What is our standard target
"We" [PG] do not have a standard target. "We" often pretend that HTML *is* a standard in the first place, and that the HTML that we see rendered on our favorite computer on our favorite browser on our favorite display represents the same results that some other customer is going to see on their favorite computer using their favorite rendering software onto god knows what kind and size of display technology.
(more specifically, what were you testing with)?
I, specifically, was testing on a Kindle Klassic and a Kindle Fire, because the Amazon devices are what most real-world people read ebooks on, and they display the most errors in PG book submissions. But I went back and checked on an epub device and didn't have much better luck. And sometimes the Amazon devices actually show better results on PG books than epub readers.

Note that the PG/submitter sausage-making chain typically runs:

Txt->html->epub->mobi

So the mobi files, being the most derived, are the most likely to display the greatest range of errors coming from the entire "PG" process. On top of that, Amazon/mobi made some design decisions that predate modern HTML design rules, and that complicates things.

The point is that one *can* write HTML that works pretty much everywhere, but many people insist on writing HTML that only works on their particular favorite flavor of computer. DP in particular has a peculiar strain of HTML writers who are openly hostile to writing HTML in an in-practice portable manner.

I have found commercial ebooks render considerably differently in FBreader and Coolreader on Android, not to mention PG books (I almost always have to adjust the default font size on a per-book basis, at the very least). I think there are similar rendering differences between the Kindle, Nook, and Sony readers.

Yes, but if the HTML is written correctly the results should work on all of these devices. Well, I'm not so sure about FBreader and Coolreader; I blew them off after brief tests as being hopelessly incomplete.

Hi All,

On 25.09.2012 at 19:36, James Adcock <jimad@msn.com> wrote:
Note that the PG/submitter sausage-making chain typically runs:
Txt->html->epub->mobi
Well, here is one place where improvement can be made:

Txt->html->epub
     html->mobi

This would allow for better tweaking. If one is worried about doing parts of the conversion twice, one can always put that in intermediate files and go on from there. It is all a matter of modularity and careful design.

regards
Keith.
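One way to read this suggestion is as a small build graph: epub and mobi are each generated from the HTML rather than mobi from epub, and intermediate files are kept so shared steps run only once. A minimal Python sketch under that reading; the converter functions are hypothetical placeholders that just wrap strings (not calls into any real PG or Amazon tool), and the book.<ext> file naming is invented for the example.

```python
import tempfile
from pathlib import Path


# Hypothetical converters standing in for real tools; each maps one format's text to the next.
def txt_to_html(text: str) -> str:
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "\n".join(f"<p>{p}</p>" for p in paragraphs)


def html_to_epub(html: str) -> str:
    return f"EPUB[{html}]"


def html_to_mobi(html: str) -> str:
    return f"MOBI[{html}]"


# target format -> (source format, converter). Both epub and mobi branch off html.
STAGES = {
    "html": ("txt", txt_to_html),
    "epub": ("html", html_to_epub),
    "mobi": ("html", html_to_mobi),
}


def build(target: str, workdir: Path, ran: list) -> str:
    """Produce book.<target> in workdir, reusing any intermediate file already there."""
    out = workdir / f"book.{target}"
    if out.exists():
        return out.read_text()
    if target not in STAGES:
        raise FileNotFoundError(out)  # the .txt master must be supplied by hand
    source, convert = STAGES[target]
    result = convert(build(source, workdir, ran))
    ran.append(target)  # record which conversions actually ran
    out.write_text(result)
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        work = Path(d)
        (work / "book.txt").write_text("First paragraph.\n\nSecond paragraph.")
        ran = []
        build("epub", work, ran)
        build("mobi", work, ran)
        print(ran)  # ['html', 'epub', 'mobi']: the text was converted to HTML only once
```

Because the cache is just "does the file exist", this sketch has no staleness check: after tweaking book.html, the old epub and mobi files must be deleted to rebuild them. A real build would compare timestamps, the way make does.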

Not too surprising.

There has never been an expectation at DP that there be any markup to distinguish chapter headings from any other text that might share the same font characteristics. Paragraphs (as distinguished by "<p>" tags) can appear indiscriminately around text wherever it might need to be distinguished for whatever reason.

One can't fault the proofreaders. No one of authority has ever shown an interest in markup to define structure rather than appearance.

Don

On Mon, Sep 24, 2012 at 10:03 AM, James Adcock <jimad@msn.com> wrote:
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d

On Mon, Sep 24, 2012 at 12:10:10PM -0700, don kretz wrote:
Not too surprising.
There has never been an expectation at DP that there be any markup to distinguish chapter headings from any other text that might share the same font characteristics.
Paragraphs (as distinguished by "<p>" tags) can appear indiscriminately around text wherever it might need to be distinguished for whatever reason.
One can't fault the proofreaders. No one of authority has ever shown an interest in markup to define structure rather than appearance.
HTML markup for structure, not layout, has been a PG directive for years. Unfortunately, valid HTML can do both.

Maybe you mean no one of authority at DP. Unfortunately, this directive has not yet arrived in the DP instructions, though it's been before them for years. Cf. http://www.pgdp.net/wiki/Post-Processing_with_RST:_reStructuredText

The WWers have a hard time enforcing this, because DP PPers submit valid HTML compliant with the DP guidelines. We do sometimes bounce items, though. Attached is an email of such a case, which also demonstrates there is a fix in the works for DP procedures.

-- Greg
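The structure-versus-appearance problem usually looks like this in practice: a chapter heading is a <p> styled to look like one, so nothing in the markup says "this is a chapter." A rough Python sketch for spotting such cases, which flags any <p> whose entire text reads like "CHAPTER IV." The regular expression is an assumption made for illustration and will miss headings phrased differently; it is not part of any PG or DP tool.

```python
import re
from html.parser import HTMLParser

# Assumed pattern for whole-paragraph text that reads like a chapter heading.
HEADING_LIKE = re.compile(r"^(chapter|book|part)\s+([ivxlc]+|\d+)\.?$", re.IGNORECASE)


class HeadingLikeParagraphs(HTMLParser):
    """Collect <p> elements whose whole text looks like a chapter heading."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.buffer = []
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p = False
            text = " ".join("".join(self.buffer).split())  # collapse whitespace
            if HEADING_LIKE.match(text):
                self.suspects.append(text)

    def handle_data(self, data):
        if self.in_p:
            self.buffer.append(data)


def presentational_headings(html: str) -> list:
    finder = HeadingLikeParagraphs()
    finder.feed(html)
    finder.close()
    return finder.suspects


if __name__ == "__main__":
    page = '<p class="center"><b>CHAPTER IV.</b></p><p>It was late.</p><h2>CHAPTER V.</h2>'
    print(presentational_headings(page))  # ['CHAPTER IV.']
```

Each hit is a candidate for an h2 (or similar) element, the kind of structural markup a converter can use to build a TOC; the real h2 in the example is not flagged.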

I hope you're right. The references are too ambiguous for me to follow, but it appears this is a discussion about the appearance of drop caps, not the structure of the document.

I'm not familiar with the "Best Practices" mentioned. It would be interesting to see whether they include references to specifically identifying chapter boundaries and headings, and to identifying correspondence as correspondence rather than to making it look identical to the original. One hopes they try applying the best practices to several real projects and see whether the result is well-structured chapters, pages, and paragraphs on several devices, and discuss why certain practices achieve certain results.

Don
---------- Forwarded message ----------
From: "PGDP Gen. Mgr." <dp-genmgr@pgdp.net>
To: "Matthew D. Wheaton" <mwheaton@kc.rr.com>
Cc: gbnewby@pglaf.org, "'David Edwards'" <debook2164@hotmail.com>, "'Project Gutenberg Whitewashers'" <pgww@lists.pglaf.org>, dp-genmgr@pgdp.net
Date: Sat, 11 Aug 2012 22:07:09 -0700 (PDT)
Subject: RE: [pgww] Uploaded chinese_rhymes.zip Chinese Mother Goose Rhymes

Hi Matthew, Greg, et al,
I am presuming this discussion is regarding illustrated first letters (aka drop caps)? Our Best Practices for HTML isn't quite ready for public review yet, but we do cover those. Basically we are recommending using CSS to show the drop cap imgs in the HTML and hide them from mobile formats entirely. (Matthew, I'd be happy to point you to our temporary page where it is explained in more detail--email or PM to me so we don't irritate the WWers, perhaps?)
If you folks would like to give it a field test and let us know how it works, that would be an extra bonus. If the work group has made an error somewhere we would like to know! :)
-----
CSS (please be sure to read the N.B. regarding this solution, below):
@media screen, print /* so it shows in the HTML version or when printed */
{
  img.drop-cap {
    float: left;
    margin: 0 0.5em 0 0;
  }

  p.drop-cap:first-letter {
    color: transparent;
    visibility: hidden;
    margin-left: -0.9em;
  }
}

@media handheld /* hide it from handheld media */
{
  img.drop-cap {
    display: none;
  }

  p.drop-cap:first-letter {
    color: inherit;
    visibility: visible;
    margin-left: 0;
  }
}
N.B.: The value “transparent” for the “color” property is not included in CSS 2.1; it was added in CSS 3. It works well in modern browsers, and the code also works well in most e-readers. If you use this code, however, you will have to tell the WhiteWashers to validate the CSS as CSS 3 when uploading your book.
HTML:
<div> <img class="drop-cap" src="images/drop-m.png" width="100" height="113" alt=""/> </div>
<p class="drop-cap">Marie’s prediction proved a true one, ...</p>
-----
I would be interested in being kept in the loop if possible. (email dp-genmgr @ pgdp.net)
Thanks!
/louise/ PGDP Gen. Mgr.
Matthew D. Wheaton wrote:
Thank you for informing me about this problem. I assume that it is a new requirement, and if it is unacceptable, there are a few other books which need revision.
-----Original Message-----
From: Greg Newby [mailto:gbnewby@pglaf.org]
Sent: Saturday, August 11, 2012 10:55 AM
To: Matthew Wheaton; David Edwards
Cc: Project Gutenberg Whitewashers; dp-post@pgdp.net
Subject: Re: [pgww] Uploaded chinese_rhymes.zip Chinese Mother Goose Rhymes
Matthew,
We've found a serious problem with the formatting of this submission. It looks like your use of the "hide" characteristic results in the initial character not being displayed under some circumstances, for the rhymes.
From the CSS:
span.hide { display:none }
This cuts the first letter in epub/mobi after auto-conversion from HTML (i.e., http://epubmaker.pglaf.org), and the HTML doesn't seem to display correctly in IE9 when browsed online.
As you might know, the process of updating the DP HTML procedures is ongoing. As such, some of the guidance from the PG WWers is not yet automated at DP or, sometimes, communicated during the DP workflow.
What is desired is HTML that displays on all devices. And, that converts properly to the various derivative formats.
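The failure described in this email comes from a letter that only makes sense while the drop-cap image is shown next to it: a converter or reader that honors display:none but drops the image (or never shows it) leaves the word without its first letter. A toy Python extractor that mimics only the "honor display:none" half. The markup is a hypothetical reconstruction, since the book's actual HTML is not quoted in the thread, and images/drop-t.png is a made-up filename.

```python
from html.parser import HTMLParser

VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}


class VisibleText(HTMLParser):
    """Collect text, skipping anything inside an element with a hidden class."""

    def __init__(self, hidden_classes):
        super().__init__()
        self.hidden_classes = set(hidden_classes)
        self.stack = []  # one flag per open element: True if it or an ancestor is hidden
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements hold no text and never get an end tag
        classes = set((dict(attrs).get("class") or "").split())
        inherited = bool(self.stack) and self.stack[-1]
        self.stack.append(inherited or bool(classes & self.hidden_classes))

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <img ... /> carry no text

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if not (self.stack and self.stack[-1]):
            self.parts.append(data)


def visible_text(html, hidden_classes):
    parser = VisibleText(hidden_classes)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    page = ('<p><img class="drop-cap" src="images/drop-t.png" alt=""/>'
            '<span class="hide">T</span>here was an old man.</p>')
    print(visible_text(page, {"hide"}))  # here was an old man.
    print(visible_text(page, set()))     # There was an old man.
```

With the image gone and span.hide honored, the reader is left with "here was an old man."; keeping the real letter in the text and treating the image as decoration avoids the problem.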

The WWers have a hard time enforcing this, because DP PPers submit valid HTML compliant with the DP guidelines. We do sometimes bounce items, though. Attached is an email of such a case, which also demonstrates there is a fix in the works for DP procedures.

Unfortunately your fix is not a fix, in that ebook readers are emphatically NOT "media handheld" devices. "Media handheld" devices are those little PDA-like devices that predate e-book readers. Again, the e-book reader manufacturers have emphatically decided en masse that e-book readers are NOT "media handheld" devices. I understand that PG has this mistake coded into some of its tools, but not everyone uses these PG tools, which means that this "fix" breaks everywhere else in the world. This problem has been discussed many times. PG should be following standards where standards exist.
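The objection comes down to which @media block a reader actually applies. A toy Python model of the drop-cap stylesheet from the forwarded message (it ignores media-query features and everything else a real renderer does) makes the consequence explicit: the handheld branch, which hides the image and restores the first letter, is used only by a device that reports itself as handheld. If, as argued here, e-book readers report something else such as screen, they get the screen branch with the image floated and the letter hidden.

```python
# Toy model of the drop-cap stylesheet: each @media block lists the media
# types it targets and the effect its rules have on the drop cap.
MEDIA_BLOCKS = [
    ({"screen", "print"}, {"drop-cap image": "shown", "first letter": "hidden"}),
    ({"handheld"}, {"drop-cap image": "hidden", "first letter": "visible"}),
]


def effective_drop_cap(reported_media_type: str) -> dict:
    """Apply, in source order, every @media block whose type list includes the device's type."""
    result = {}
    for media_types, effect in MEDIA_BLOCKS:
        if reported_media_type in media_types:
            result.update(effect)
    return result


if __name__ == "__main__":
    for media in ("screen", "handheld"):
        print(media, effective_drop_cap(media))
```

Which media type a particular reader reports is a property of that device, not something the stylesheet can detect; the model only shows that the "hide it from handheld media" branch never runs on a reader that does not claim to be handheld.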

One can't fault the proofreaders. No one of authority has ever shown an interest in markup to define structure rather than appearance.
I'm not sure I'm trying to find "fault" with anyone, except perhaps Greg. What I am trying to do is engage people with a little fact from Planet Earth that they do not care to acknowledge: the books they are producing are most often unreadable by real-world customers, and when they are readable they are still often an unpleasant and amateurish-looking experience. I can download the HTML, fix it, and get an enjoyable experience most of the time in a few minutes, so for most books it's far from an unsolvable problem. Some books, however, use fixed layout, or quasi-fixed layout, and then they truly are hard to fix. Such a shame that 100s if not 1000s of volunteer hours per book go to waste because "no one" cares about formatting.
participants (5)
- don kretz
- Greg Newby
- James Adcock
- Keith J. Schultz
- Robert Cicconetti