
On 10/6/2012 8:36 AM, Greg Newby wrote:
This is, more or less, exactly what I said we needed. There is no resistance to any of this. I even asked for input on figuring out what the requirements & enforcement would look like.
First question last: "requirements." This is the tough one, not because coming up with a list of requirements is hard, but because getting consensus /is/. The first, and I think non-negotiable, requirement is that whatever standard is selected must have a reasonably complete set of markup to capture all the features of a book. Figuring out just what this "reasonably complete" set of features is remains problematic. Given its ivory tower origins, I believe that TEI should be the standard against which all other markups are judged. So, the basic requirement is that it should be possible to losslessly convert from the selected standard to TEI and back again, and likewise to convert losslessly from any TEI file to the agreed-upon standard and back to TEI. Any markup scheme which can meet this requirement is a candidate.

ePub is little more than a zip archive containing XHTML files. Even Kindle's .mobi format is a compressed XHTML format (although CSS is not supported). Thus, I have decided that XHTML is probably the best standard format, as it will require the least conversion and can probably be used natively without any conversion (TEI can also be viewed natively in a browser with the addition of an appropriate stylesheet, but it still makes some people nervous).

So, here is a list of some of my requirements for XHTML files, more-or-less in order of importance:

0. Files should be created using HTML markup with the XML syntax.

1. Paragraphs should be marked with the <p> element. Anything that is /not/ a paragraph should /not/ be marked with the <p> element. A paragraph is one or more complete sentences which together relate to a common thought or purpose. If you don't know what a complete sentence is, just drop the book and back away.

2. Lists should be marked as lists. Numbered lists should use the <ol> element, and unnumbered lists should use the <ul> element. Tables of Contents are not tables, they are lists, and should be marked as such.

3. <table> should only be used for tabular data. Tabular data is data that obviously exists as rows and columns. Tables should /NEVER/ be used to force a specific presentation. Sometimes, user agents do not display tables well. This is the fault of the user agent, and not the fault of the markup. If accommodation for a specific user agent is possible using CSS, create a specific stylesheet file to be used with the document, but /don't/ try to alter the table to account for user agent deficiencies.

4. Book titles should be marked with the <h1> element, "part" titles should be marked with the <h2> element, chapter titles should be marked with the <h3> element, "section" titles should be marked with the <h4> element, and "sub-section" titles should be marked with the <h5> element. If a title is composed of both a main title and a subtitle, the subtitle should be distinguished from the main title by adding "class='subtitle'" as an attribute of the title element. Authors' names in book titles should be indicated by <h1 class="author">.

5. Indented blocks should be marked with the <blockquote> element. A <blockquote> can contain multiple <p>aragraphs, so long as they meet the requirement of item 1.

6. Blocks of text that do not fit into any current HTML category should be marked with the <div> element. Because <div> is a generic block marker, whenever a <div> element is used it should be appropriately classified: e.g. <div class='chapter'>. Classification values of <div> blocks should be drawn from a controlled set of values, TBD. A <div> block should /never/ be used when it can appropriately be replaced with some other HTML element. On the other hand, do not use other HTML elements inappropriately just to avoid using the <div> element.

7. Style attributes should not be applied to any element. If an element needs a specific style, create a special class for that style and place the style specification into an external style sheet.

7a. Place all style rules in external style sheets, not internal style blocks. This way it is possible to change style effects without editing the file itself.

7b. Files should be created in such a way that presentation is adequate, even if not optimal, without the application of any styles.

To be continued...

I collected this article many years ago: http://www.passkeysoft.com/~lee/HTMLeBooks.html. It is dated, and I don't agree with all the recommendations, but it is quite readable and is a good foundation for making e-books using HTML markup.
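
As a rough illustration of what the numbered items above might look like in practice, and of how a few of them could be enforced mechanically, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The sample document, the stylesheet name book.css, and the helper names local_name and check are all hypothetical, made up for this example. Only items 0, 1 (empty paragraphs only), 6, 7 and 7a are checked; whether a <p> really holds a paragraph or a <table> really holds tabular data still needs a human.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, written to follow items 0-7 above.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>A Sample Book</title>
    <link rel="stylesheet" type="text/css" href="book.css" />
  </head>
  <body>
    <h1>A Sample Book</h1>
    <h1 class="author">A. N. Author</h1>
    <ul>
      <li>Chapter One</li>
      <li>Chapter Two</li>
    </ul>
    <div class="chapter">
      <h3>Chapter One</h3>
      <p>This is a complete sentence. So is this one.</p>
      <blockquote>
        <p>An indented block holds real paragraphs.</p>
      </blockquote>
    </div>
  </body>
</html>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def check(xhtml_text):
    """Return a list of rule violations; an empty list means none were found."""
    try:
        root = ET.fromstring(xhtml_text)  # item 0: must parse as XML
    except ET.ParseError as err:
        return [f"item 0: not well-formed XML ({err})"]

    problems = []
    for elem in root.iter():
        name = local_name(elem.tag)
        if "style" in elem.attrib:
            problems.append(f"item 7: <{name}> has a style attribute")
        if name == "style":
            problems.append("item 7a: internal <style> block")
        if name == "div" and not elem.get("class"):
            problems.append("item 6: <div> without a class")
        if name == "p" and not "".join(elem.itertext()).strip():
            problems.append("item 1: empty <p>")
    return problems


if __name__ == "__main__":
    for problem in check(SAMPLE):
        print(problem)
```

Run against SAMPLE, check() returns an empty list. A file with a bare <div> or an inline style attribute gets one line per violation, and a file that is not well-formed XML fails at item 0 before anything else is examined.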