
On Tue, January 31, 2012 4:37 pm, don kretz wrote:
I completely agree.
Can we also agree that <a class="pgnum" id="pg00nn" title="nn"></a> satisfies all your concerns?
Sure, I've just improved the signal-to-noise ratio. Yours is a) just as unambiguous, and b) carries the same information, but I'd say it's c) harder for users to apply (and easier to miskey) and a bit more disruptive if you're working on the text.
I think that this conclusion is mostly a matter of opinion and personal preference. Personally, I think that putting meta-parsing content (the page number) in the middle of regular parsing content (the paragraph) /increases/ the noise. Besides, "<a class="pgnum" id="pg00nn">nn</a>" violates my number one rule, which is that every HTML master file must be readable in the absence of CSS. For me it is important that a page number /disappear/ when there are no styles beyond the default HTML styles.
It also biases the conceptual markup, which is what users should think about, toward a specific format - i.e. html - when I like to emphasize the importance of generality. People already too often think of each problem in terms of the html solution to it.
Just as an exercise, I loaded one of the newer PG projects into my site. Right away, on the title page, we had unclassed <h> tags with "style=#..." That kind of stuff doesn't generalize to anything.
Go ahead and name names; which file was it? I automatically assume that any element with a 'style' attribute is wrong (I make a few exceptions for things like 'style="page-break-before:always"', but those exceptions are very limited). I also assume that any <style> element in the file is wrong, but that is easier to deal with.
Another thing that's possible is to make up your own html tags. It's perfectly legit according to the spec. Rather than use <span class="sc">Arthur</span> you can just use <sc>Arthur</sc>. Your browser will let you flavor it with the same css (although older IE needs a one-line prompt in the css file.) If you do the same with <poem> and <footnote>, people start to think a little more syntactically. And if it intimidates you to try such a thing, you can still just regex the <span> stuff back in. But mostly I prefer to use square brackets to emphasize that it's conceptual markup, not html. And the publishing software I'm using makes it easy to install this sort of thing. I have a toolbar with those sorts of tags in the editor, for instance.
That's having the software adapt to what's easiest for the user, rather than vice versa.
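To make the "regex the <span> stuff back in" step concrete, here is a minimal sketch in Python. It is not anyone's actual tooling: the function name and the tag list are assumptions for illustration, and it only handles inline tags without attributes (block-level inventions like <poem> or <footnote> would need a different target element).

```python
import re

# Hypothetical list of made-up inline tags; only "sc" appears in this thread.
CUSTOM_INLINE_TAGS = ("sc",)

def custom_tags_to_spans(html, tags=CUSTOM_INLINE_TAGS):
    """Rewrite <sc>...</sc> as <span class="sc">...</span>."""
    for tag in tags:
        name = re.escape(tag)
        # Opening tag becomes a classed span; closing tag becomes </span>.
        html = re.sub(rf"<{name}>", f'<span class="{tag}">', html)
        html = re.sub(rf"</{name}>", "</span>", html)
    return html

print(custom_tags_to_spans("<p>by <sc>Arthur</sc></p>"))
```

Because the mapping is purely mechanical, it could be run as a last step before publishing, so the working file keeps the shorter tags.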
I sympathize with the notion, but I'm certain it's not a good idea to mix non-HTML markup into the file (which technically should be namespaced if you're going to do it). Again, my primary objection is that if CSS is not available the "new" markup is visually meaningless. If you really want to use this kind of semantic markup, you'd be better off abandoning HTML altogether and selecting some other XML vocabulary which is explicitly designed to capture this data; something like ... TEI! I'm a big fan of TEI. I only favor HTML for the practical reason that it is now so well established that people are comfortable using it, and that there are dozens if not hundreds of User Agents which are capable of rendering it. (About 5 years ago I produced a TEI .css file that caused TEI documents to render well in standard browsers without modification. I'll put that on the web site with Mr. Gibbens' documents.) I can't see expecting very many people to learn TEI. I /can/ see teaching people to use <i class="foreign"> when they encounter an italicized foreign word rather than using <foreign> directly. It's what they're used to.
Here's a better example. Right now I'm refactoring this markup from an EB article (real example).
<table class="nobctr" style="clear: both;"> <tr><td><img style="width:856px; height:424px" src="/vol12/3/images/img337.jpg" alt="" /></td></tr> <tr><td class="caption"><span class="sc">Fig. 1.</span></td></tr></table>
With conceptual markup, the equivalent (actually improved) result is accomplished (an image that spans 100% of the column) with
Of course it's an improvement because your example violates HTML rule no. 3: Tables should only be used for tabular data, and never merely for formatting. :-)
[illo src="/vol12/3/images/img337.jpg"]Fig. 1.[/illo]
This is cleaner (assuming that "illo" really means "illustration") but absent CSS, XSL or some other sort of pre-processor the illustration wouldn't be presented and the text would just appear in the middle of ... something. I think this illustrates a question that should be answered: should the master format be something that can be consumed without modification or may it always require pre-processing? As near as I can tell, the only thing ReST has in its favor is that it (hopefully) satisfies the white-washers' ITF requirement. If it doesn't, if it also requires pre-processing to make it useful, then I can think of nothing in its favor. [snip]
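For what it's worth, the pre-processing step in question can be quite small. The following is only a sketch under stated assumptions: the [illo src="..."]caption[/illo] syntax is taken from the example above, while the function name, the output element and its class names ("illo", "caption") are made up for illustration. The output deliberately avoids a layout table and stays readable with no stylesheet.

```python
import re
from html import escape

# Matches the bracket markup from the thread: [illo src="..."]caption[/illo]
ILLO = re.compile(r'\[illo src="([^"]+)"\](.*?)\[/illo\]', re.DOTALL)

def expand_illo(text):
    """Expand [illo] markup into plain HTML (class names are hypothetical)."""
    def render(match):
        src = match.group(1)
        caption = match.group(2).strip()
        return (
            '<div class="illo">'
            f'<img src="{escape(src, quote=True)}" alt="" />'
            f'<p class="caption">{escape(caption)}</p>'
            '</div>'
        )
    return ILLO.sub(render, text)

print(expand_illo('[illo src="/vol12/3/images/img337.jpg"]Fig. 1.[/illo]'))
```

Whether a step like this is acceptable is exactly the open question: the master file is shorter, but it is no longer consumable as-is.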
The DP html files are full of patterns like that, that can be automatically refactored into the generalized equivalent to everyone's benefit.
I believe this to be true, which is why Mr. Hutchinson's proposal is of so much interest. We should be able to grab the Perathoner/Haines/Widger files and automatically convert them to their generalized equivalents fairly quickly, ready for last-minute tweaking into a file that can be used to create competent browser/Kindle/epub files programmatically.
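As a rough illustration of what such an automatic conversion might look like, here is a sketch that rewrites the EB figure table quoted earlier into the [illo] form. The pattern is written against that one example only; the function name is hypothetical, and real files would certainly need more patterns and a proper HTML parser rather than a regex. Note that it drops the small-caps span from the caption, on the assumption that caption styling would be handled by the [illo] markup itself.

```python
import re

# Pattern modeled on the single EB example quoted in this thread.
FIGURE_TABLE = re.compile(
    r'<table class="nobctr"[^>]*>\s*'
    r'<tr><td><img\b[^>]*?\bsrc="([^"]+)"[^>]*/?>\s*</td></tr>\s*'
    r'<tr><td class="caption">(.*?)</td></tr>\s*'
    r'</table>',
    re.DOTALL,
)

def tables_to_illo(html):
    """Replace one-image layout tables with [illo src="..."]caption[/illo]."""
    def render(match):
        src = match.group(1)
        # Strip inline tags such as <span class="sc"> from the caption.
        caption = re.sub(r"<[^>]+>", "", match.group(2)).strip()
        return f'[illo src="{src}"]{caption}[/illo]'
    return FIGURE_TABLE.sub(render, html)

sample = (
    '<table class="nobctr" style="clear: both;"> '
    '<tr><td><img style="width:856px; height:424px" '
    'src="/vol12/3/images/img337.jpg" alt="" /></td></tr> '
    '<tr><td class="caption"><span class="sc">Fig. 1.</span></td></tr></table>'
)
print(tables_to_illo(sample))
```

Anything the pattern does not match is left untouched, so the output would still need a pass by hand to catch variants.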