
On Mon, January 23, 2012 11:42 am, Jim Adcock wrote:
The problem isn't that PG has no standards for HTML; the problem is that the WWers tell you what they are only after you have attempted to post to PG, and that those standards differ from the ones given in the FAQ: http://www.gutenberg.org/wiki/Gutenberg:HTML_FAQ
Well, this degrades into a semantic argument. I would argue that unpublished requirements which depend on the whims of an individual are not standards at all; they are mere whims. The allegation that you have been given directions that are not published is troubling; I would suggest that you need to document these discrepancies, here if nowhere else.
Lee>1. Develop a consensus HTML coding style for PG. Heck, it doesn't even need to be a consensus, a mandate from TPTB would serve just as well, but a consensus is more likely to be adopted.
Just so long as PG follows the same set of rules for the PG tools which generate HTML, including the HTML that gets put into the generated "EPUB" and "MOBI" files.
This goes without saying. A standard is a standard is a standard. Indeed, most of the ... bizarre ... HTML that you have pointed out in the past is not an indictment of HTML files posted to PG, but a demonstration of the flawed tool set which produced it. I think it is important to distinguish between flawed HTML and flawed tools, because the solutions to the two problems are vastly different.
Lee>2. Build a small set of individuals who are familiar with PG's HTML coding style and could review HTML submissions. For example, I'm familiar enough with the use of HTML for encoding e-books that I'll bet I could judge whether a file is acceptable in less than 10 minutes. You could call this group of examiners "white washers" for lack of a better term.
Set them first to take a good hard look at the "HTML" code being generated by the PG tool set, and have them fix that first. Secondly I would hope that the WWers wouldn't be accepting "HTML" based "on form," when that "HTML" produces books which are unreadable in practice.
I disagree with your sequence. The incredibly messy HTML being generated complies completely with the PG standard, which involves pretty much just (1) moving inline CSS to an in-file <style> block and (2) making sure the HTML complies with the DTD. When your standards are that broad, just about anything passes the test. I think PG needs to back up and build a comprehensive set of standards that produce useful HTML, then ensure that the automated tools build HTML that satisfies the standard. Looking at the problems with the generated HTML may be a good place to start in developing the standards, but one can't "fix" the current tool set if there is no basis by which to judge the efficacy of the "fix."
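To show how little the first of those two rules actually demands, here is a rough Python sketch of a check for leftover inline style attributes. It is not a PG tool: the names InlineStyleFinder and find_inline_styles are made up for illustration, it uses only the standard-library html.parser module, and it makes no attempt at the second rule (DTD validation), which would need a real validator.

```python
from html.parser import HTMLParser


class InlineStyleFinder(HTMLParser):
    """Hypothetical helper: records every element that carries a style attribute."""

    def __init__(self):
        super().__init__()
        self.hits = []  # list of (tag, line_number, style_value)

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        # Self-closing tags such as <br /> also pass through this method.
        for name, value in attrs:
            if name == "style":
                line, _column = self.getpos()
                self.hits.append((tag, line, value))


def find_inline_styles(html_text):
    """Return (tag, line, style) for each inline style found in html_text."""
    finder = InlineStyleFinder()
    finder.feed(html_text)
    finder.close()
    return finder.hits


if __name__ == "__main__":
    sample = '<p style="color: red">x</p>\n<p>y</p>\n<br style="clear: both" />'
    for tag, line, style in find_inline_styles(sample):
        print(f"line {line}: <{tag}> still has inline style {style!r}")
```

A file that comes back with an empty list satisfies rule 1, which says nothing about whether the markup is any good; that is the point about the breadth of the current standard.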
Lead by example.
Okay. My example is to produce a good HTML file without regard to what PG wants, post it to the Internet Archive, then tell someone at PG where it is and that they can go get it if they want it.