
On 10/8/2012 5:03 PM, don kretz wrote:
On Mon, Oct 8, 2012 at 12:56 PM, Lee Passey <lee@passkeysoft.com> wrote:
On 10/6/2012 8:36 AM, Greg Newby wrote:
This is, more or less, exactly what I said we needed. There is no resistance to any of this. I even asked for input on figuring out what the requirements & enforcement would look like.
The first, and I think non-negotiable, requirement is that whatever standard is selected must have a reasonably complete set of markup to capture all the features of a book. Deciding just what belongs in this "reasonably complete" set of features is nonetheless problematic.
This is the first and only requirement. And not reasonably complete, absolutely complete. You can't format what you can't identify. But, if you identify everything, you can format it any way you want with software.
I do not believe a markup language exists that can capture the complete essence of a book, thus my use of the "weasel word," 'reasonably.' Rest assured, my bar of "reasonableness" is quite high, but it is not unrealistic.
And the list of things to identify can be done easily if your markup is extensible, because you keep adding markup identifiers until you don't need any more.
A very salient point, with which I completely agree. Because XHTML is the basis of all modern e-book formats, whatever markup is chosen must at some point be reducible to XHTML. XHTML has two generic elements, <div> and <span>, to which semantic inflection can be added by use of the "class" attribute, and the "class" attribute can be added to any other element, allowing refinement of its semantics. For this reason, XHTML meets your requirement of an extensible markup language, but also satisfies the goal of being a base language which can be used directly without transformation. Note that TEI also has generic block-level and inline elements, and semantic inflection can be added using the "type" attribute. Thus, TEI is also an extensible language even though the core elements are predefined and presumably immutable.
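To make the class/type parallel concrete, here is a minimal sketch in Python, not a real converter. It assumes that an XHTML <span> corresponds to TEI's generic inline element <seg>, that <div> keeps its name, and that "class" values can be copied straight into "type". The function name and the sample fragment are made up for illustration, and the output is not validated against any TEI schema.

```python
import xml.etree.ElementTree as ET

# Assumption for illustration: XHTML <span> maps to TEI <seg>;
# <div> exists in both vocabularies and keeps its name.
GENERIC_MAP = {"span": "seg"}


def class_to_type(fragment):
    """Rename generic elements and move each "class" value to "type"."""
    root = ET.fromstring(fragment)
    for el in root.iter():
        # Remove the XHTML "class" attribute, remembering its value.
        cls = el.attrib.pop("class", None)
        # Rename the element only if it is one of the mapped generics.
        el.tag = GENERIC_MAP.get(el.tag, el.tag)
        if cls is not None:
            el.set("type", cls)
    return ET.tostring(root, encoding="unicode")


sample = (
    '<div class="letter">'
    '<span class="salutation">Dear <span class="name">Sir</span>,</span>'
    '</div>'
)
print(class_to_type(sample))
```

The point of the sketch is only that the semantic label travels unchanged; which attribute carries it is a detail of the host language.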
The rest of your requirements are just details, and there are any number of equivalent schemes; they are interchangeable as long as the things requiring identification are unambiguously tagged or otherwise clearly identifiable.
True, but I obviously have not made myself clear. The primary purpose of a standard is to be predictable. To allow documents to be submitted in /any/ markup language makes it virtually impossible to develop tool sets to generate common output, or to maintain those documents. Further, standards provide a yardstick that can not only measure compliance but also serve as a learning and training tool, so that when someone like Mr. Salzer comes along and asks, "how [does one] properly prepare HTML files for PG?" we can say, "here you go, follow these rules and you will be compliant, and if something doesn't make sense or isn't covered, we will clarify or modify the rules so it /is/ covered."
Development of a standard is primarily a political endeavor, not a technical one. While there /are/ a number of equivalent schemes, a standard means that you pick one and stick with it. Frankly, if I ruled the world, that standard would be TEI, as it is the most complete of markup languages for text encoding and, being XML, is easy to work with. But as a general rule, people's irrational fear of TEI is even greater than their irrational fear of XHTML, so as a practical matter HTML is a better /political/ choice.
I don't care if paragraphs are marked with <p> or <para> or {\pard} or two [CR/LF] pairs following a non-whitespace character and terminated by a [CR/LF] pair, just so long as I know that when I encounter that markup I am guaranteed that the text is /always/ a paragraph and /nothing but/ a paragraph. Whatever the consensus is, I will happily adopt it and develop tools for it. But I must have a single rule.
This kind of process will require compromise, and I'm afraid that those who are unwilling to compromise will simply have to be left out of the process. If you don't like my rules, fine, suggest alternatives. We'll go with whatever gets the most support. I don't mind losing so long as at the end of the day everyone gets behind the winner.
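As a rough illustration of a standard acting as a yardstick for compliance, here is a small Python sketch. The vocabulary in ALLOWED is entirely hypothetical (nobody has agreed on these element or class names), and the function name is made up; the only point is that once a single rule set exists, checking a submission against it is mechanical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: element name -> class values the standard permits.
# These names are placeholders, not an agreed PG vocabulary.
ALLOWED = {
    "div": {"chapter", "letter"},
    "p": {"salutation"},
    "span": {"name"},
}


def check_compliance(fragment):
    """Return a list of human-readable problems; empty means compliant."""
    problems = []
    for el in ET.fromstring(fragment).iter():
        if el.tag not in ALLOWED:
            problems.append(f"unknown element <{el.tag}>")
            continue
        cls = el.get("class")
        # An element with no class is accepted; a class must be on the list.
        if cls is not None and cls not in ALLOWED[el.tag]:
            problems.append(f'<{el.tag}> has unlisted class "{cls}"')
    return problems


print(check_compliance('<div class="chapter"><p>Text</p></div>'))
print(check_compliance('<div class="sidebar"><para>Text</para></div>'))
```

When a report turns up something the rules do not cover, the remedy is the one described above: clarify or extend the rule set, then re-run the check.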
(For a very interesting exploration of the value of crowd-sourcing, listen to the Radio Lab episode "Emergence" at http://www.radiolab.org/2007/aug/14/).