Re: [gutvol-d] it's monday, after noon, let's get the crowdsourcing started

1 Feb 2012

On Tue, January 31, 2012 4:37 pm, don kretz wrote:
...
...
I completely agree.
Can we also agree that <a class="pgnum" id="pg00nn" title="nn"></a>
satisfies all your concerns?
Sure, I've just improved the signal-to-noise ratio. Yours is a) just as
unambiguous and b) carries the same information, but I'd say it's c) harder
for users to apply (and easier to miskey) and a bit more disruptive if you're
working on the text.
I think that this conclusion is mostly a matter of opinion and personal
preference. Personally, I think that putting meta-parsing content (the page
number) in the middle of regular parsing content (the paragraph) /increases/
the noise.

Besides, "<a class="pgnum" id="pg00nn">nn</a>" violates my number one rule,
which is that every HTML master file must be readable in the absence of CSS.
For me it is important that a page number /disappear/ when there are no styles
beyond the default HTML styles.
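One rough way to check the "readable without CSS" rule is to look only at the character data a browser shows by default. This Python sketch uses the standard library's html.parser; the two sample fragments are made up for illustration, and the check ignores the browser's built-in stylesheet entirely, so it is a simplification rather than a real rendering test.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect character data only; attribute values (id, title) are never rendered."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(fragment: str) -> str:
    parser = VisibleText()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.chunks)


# Hypothetical fragments: the empty anchor vs. the anchor with link text.
empty_anchor = 'end of one page<a class="pgnum" id="pg0012" title="12"></a> start of the next'
text_anchor = 'end of one page<a class="pgnum" id="pg0012">12</a> start of the next'

print(visible_text(empty_anchor))  # end of one page start of the next
print(visible_text(text_anchor))   # end of one page12 start of the next
```

With no stylesheet the empty anchor leaves the running text intact, while the anchor with link text drops a stray "12" into the middle of the sentence.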
...
It also biases the conceptual markup, which is what users should think
about, toward a specific format (i.e. html), when I'd rather emphasize the
importance of generality. People already too often think of each problem in
terms of the html solution to it.
Just as an exercise I loaded one of the newer PG projects into my site.
Right away, on the title page, we had unclassed <h> tags with "style=#..."
That kind of stuff doesn't generalize anything.
Go ahead and name names; which file was it?

I automatically assume that any element with a 'style' attribute is wrong. (I
make a few exceptions for things like 'style="page-break-before:always"', but
those exceptions are very limited.) I also assume that any <style> element in
the file is wrong, but that is easier to deal with.
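Those two assumptions are easy to mechanize. Here is a minimal Python sketch of a linter built on html.parser; the ALLOWED_STYLES set (holding only the page-break exception) and the reporting format are assumptions for illustration, not an existing PG tool.

```python
from html.parser import HTMLParser

# Assumption: page-break-before:always is the only inline style tolerated;
# extend this set for any other agreed exceptions.
ALLOWED_STYLES = {"page-break-before:always"}


def normalize(style: str) -> str:
    """Strip whitespace, a trailing ';' and case so 'A: b;' compares equal to 'a:b'."""
    return "".join(style.split()).rstrip(";").lower()


class StyleLinter(HTMLParser):
    """Report <style> elements and style attributes outside the allowed set."""

    def __init__(self):
        super().__init__()
        self.problems = []  # (line number, description) pairs

    # html.parser also routes self-closing tags like <img ... /> through here.
    def handle_starttag(self, tag, attrs):
        line, _ = self.getpos()
        if tag == "style":
            self.problems.append((line, "<style> element"))
        for name, value in attrs:
            if name == "style" and normalize(value or "") not in ALLOWED_STYLES:
                self.problems.append((line, f'<{tag} style="{value}">'))


def lint(markup: str) -> list:
    linter = StyleLinter()
    linter.feed(markup)
    linter.close()
    return linter.problems


sample = (
    "<style>p { color: red }</style>\n"
    '<h2 style="text-align:center">TITLE</h2>\n'
    '<div style="page-break-before: always;"></div>\n'
    "<p>Plain paragraph.</p>"
)
for line, problem in lint(sample):
    print(line, problem)
```

The page-break <div> passes because normalize() ignores the spacing and trailing semicolon; the <style> element and the centered <h2> are both reported.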
...
Another thing that's possible is to make up your own html tags. It's
perfectly legit according to the spec. Rather than use <span
class="sc">Arthur</span> you can just use <sc>Arthur</sc>. Your browser
will let you flavor it with the same css (although older IE needs a
one-line prompt in the css file). If you do the same with <poem> and
<footnote>, people start to think a little more syntactically. And if it
intimidates you to try such a thing, you can still just regex the <span>
stuff back in. But mostly I prefer to use square brackets to emphasize that
it's conceptual markup, not html. And the publishing software I'm using
makes it easy to install this sort of thing. I have a toolbar with those
sorts of tags in the editor, for instance.
That's having the software adapt to what's easiest for the user, rather
than vice versa.
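The "regex the <span> stuff back in" step might look like this Python sketch. It accepts either the square-bracket form ([sc]...[/sc], assumed by analogy with the [illo] markup later in the thread) or the made-up-tag form (<sc>...</sc>); the INLINE_TAGS table is hypothetical, with "sc" the only entry taken from the thread.

```python
import re

# Hypothetical table of conceptual tag names and the HTML element each one
# becomes; "sc" comes from the thread, anything else would be your own choice.
INLINE_TAGS = {"sc": "span"}


def to_classed_html(text: str) -> str:
    """Rewrite [sc]..[/sc] and <sc>..</sc> as <span class="sc">..</span>.

    Non-greedy matching means nested tags of the same name are not handled.
    """
    for name, element in INLINE_TAGS.items():
        tag = re.escape(name)
        pattern = re.compile(
            rf"\[{tag}\](?P<sq>.*?)\[/{tag}\]|<{tag}>(?P<ang>.*?)</{tag}>",
            re.DOTALL,
        )

        def replace(m, name=name, element=element):
            inner = m.group("sq") if m.group("sq") is not None else m.group("ang")
            return f'<{element} class="{name}">{inner}</{element}>'

        text = pattern.sub(replace, text)
    return text


print(to_classed_html("[sc]Arthur[/sc] met <sc>Merlin</sc>."))
# <span class="sc">Arthur</span> met <span class="sc">Merlin</span>.
```

Because the conversion is one-way and mechanical, the conceptual form can stay in the editor while the classed HTML is what gets published.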
I sympathize with the notion, but I'm certain it's not a good idea to mix
non-HTML markup into the file (which technically should be namespaced if
you're going to do it). Again, my primary objection is that if CSS is not
available the "new" markup is visually meaningless.
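To show what "namespaced" would mean in practice, here is a small Python sketch that parses an XHTML fragment carrying a custom <pg:sc> element with xml.etree.ElementTree. The pg prefix and the http://example.org/ns/pg URI are hypothetical placeholders. Namespacing fixes the validity problem but not the CSS problem: a browser without styles would still render "Arthur" exactly like the surrounding text.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
# Hypothetical namespace URI for project-specific markup.
PG = "http://example.org/ns/pg"

doc = f"""<html xmlns="{XHTML}" xmlns:pg="{PG}">
  <body>
    <p>Sir <pg:sc>Arthur</pg:sc> rode out.</p>
  </body>
</html>"""

root = ET.fromstring(doc)
ns = {"h": XHTML, "pg": PG}

# The custom element keeps its own namespace, distinct from the XHTML elements.
for el in root.iterfind(".//pg:sc", ns):
    print(el.tag, el.text)  # {http://example.org/ns/pg}sc Arthur

# An unprefixed <sc> here would instead fall into the default XHTML namespace,
# which is the ambiguity the separate namespace avoids.
```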

If you really want to use this kind of semantic markup, you'd be better off
abandoning HTML altogether and selecting some other XML vocabulary which is
explicitly designed to capture this data; something like ... TEI!

I'm a big fan of TEI. I only favor HTML for the practical reason that it is
now so well established that people are comfortable using it, and that there
are dozens if not hundreds of User Agents which are capable of rendering it.
(About 5 years ago I produced a TEI .css file that caused TEI documents to
render well in standard browsers without modification. I'll put that on the
web site with Mr. Gibbens' documents.)

I can't see expecting very many people to learn TEI. I /can/ see teaching
people to use <i class="foreign"> when they encounter an italicized foreign
word, rather than using <foreign> directly. It's what they're used to.
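The class carries enough information to recover the TEI element mechanically later. A minimal Python sketch, assuming well-formed input and mapping HTML lang to TEI xml:lang; turning plain <i> into <hi rend="italic"> is an assumed convention, not something the thread specifies.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def html_foreign_to_tei(fragment: str) -> str:
    """Map <i class="foreign"> to TEI <foreign>; other <i> become <hi rend="italic">."""
    root = ET.fromstring(fragment)  # fragment must be well-formed XML
    for el in list(root.iter("i")):  # snapshot before renaming tags
        classes = (el.attrib.pop("class", None) or "").split()
        lang = el.attrib.pop("lang", None)
        el.tag = "foreign" if "foreign" in classes else "hi"
        if el.tag == "hi":
            el.set("rend", "italic")  # assumption: plain italics keep a rend hint
        if lang:
            el.set(XML_LANG, lang)  # TEI uses xml:lang rather than lang
    return ET.tostring(root, encoding="unicode")


print(html_foreign_to_tei(
    '<p>He said <i class="foreign" lang="la">carpe diem</i> in <i>earnest</i>.</p>'
))
# <p>He said <foreign xml:lang="la">carpe diem</foreign> in <hi rend="italic">earnest</hi>.</p>
```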
...
Here's a better example. Right now I'm refactoring this markup from an EB
article (real example).
<table class="nobctr" style="clear: both;">
<tr><td><img style="width:856px; height:424px"
src="/vol12/3/images/img337.jpg" alt="" /></td></tr>
<tr><td class="caption"><span class="sc">Fig. 1.</span></td></tr></table>
With conceptual markup, the equivalent (actually improved) result, an image
that spans 100% of the column, is accomplished with
Of course it's an improvement because your example violates HTML rule no. 3:
Tables should only be used for tabular data, and never merely for formatting.
:-)
...
[illo src="/vol12/3/images/img337.jpg"]Fig. 1.[/illo]
This is cleaner (assuming that "illo" really means "illustration"), but absent
CSS, XSL or some other sort of pre-processor, the illustration wouldn't be
presented and the text would just appear in the middle of ... something.
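Such a pre-processor need not be much. This Python sketch expands the [illo] markup into plain HTML that still shows an image and a caption with no stylesheet at all; the <div class="illo"> wrapper and <p class="caption"> are assumed output conventions, not an agreed PG format.

```python
import html
import re

# The square-bracket form from the thread: [illo src="..."]caption[/illo]
ILLO = re.compile(r'\[illo\s+src="(?P<src>[^"]*)"\](?P<caption>.*?)\[/illo\]', re.DOTALL)


def render_illo(text: str) -> str:
    """Expand [illo] into HTML that still reads sensibly with no stylesheet.

    The wrapper <div class="illo"> and <p class="caption"> are assumptions;
    the classes are only hooks for whatever CSS a given output format uses.
    """

    def replace(m):
        src = html.escape(m.group("src"), quote=True)
        caption = m.group("caption").strip()  # left unescaped: may hold inline markup
        return (
            '<div class="illo">\n'
            f'<img src="{src}" alt="" />\n'
            f'<p class="caption">{caption}</p>\n'
            "</div>"
        )

    return ILLO.sub(replace, text)


print(render_illo('[illo src="/vol12/3/images/img337.jpg"]Fig. 1.[/illo]'))
```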

I think this illustrates a question that should be answered: should the master
format be something that can be consumed without modification, or may it always
require pre-processing? As near as I can tell, the only thing ReST has in its
favor is that it (hopefully) satisfies the white-washers' ITF requirement. If it
doesn't, if it also requires pre-processing to make it useful, then I can think
of nothing in its favor.

[snip]
...
The DP html files are full of patterns like that, which can be automatically
refactored into the generalized equivalent to everyone's benefit.
I believe this to be true, which is why Mr. Hutchinson's proposal is of so
much interest. We should be able to grab the Perathoner/Haines/Widger files
and automatically convert them to their generalized equivalents fairly
quickly, ready for last-minute tweaking into a file that can be used to create
competent browser/Kindle/epub files programmatically.
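As an illustration of that kind of automatic refactoring, this Python sketch rewrites the one-cell "image plus caption" table from the EB example into the [illo] form quoted earlier. The regular expression is a hypothetical starting point fitted to that single example; real DP files vary, and a production converter would more likely work on a parsed tree. It deliberately discards the inline width/height style and the small-caps span, matching the [illo] rendition in the thread.

```python
import re

# Hypothetical pattern for the EB-style table: one row with an <img>,
# one row with a caption cell. Tuned to the example, not to all DP files.
TABLE_ILLO = re.compile(
    r'<table[^>]*>\s*'
    r'<tr>\s*<td>\s*<img\b[^>]*?\bsrc="(?P<src>[^"]+)"[^>]*>\s*</td>\s*</tr>\s*'
    r'<tr>\s*<td class="caption">(?P<caption>.*?)</td>\s*</tr>\s*'
    r'</table>',
    re.DOTALL | re.IGNORECASE,
)
SMALLCAPS = re.compile(r'<span class="sc">(.*?)</span>', re.DOTALL)


def refactor(html_text: str) -> str:
    """Replace image-in-a-table markup with [illo src="..."]caption[/illo]."""

    def replace(m):
        caption = SMALLCAPS.sub(r"\1", m.group("caption")).strip()
        return f'[illo src="{m.group("src")}"]{caption}[/illo]'

    return TABLE_ILLO.sub(replace, html_text)


EB = (
    '<table class="nobctr" style="clear: both;">\n'
    '<tr><td><img style="width:856px; height:424px"\n'
    'src="/vol12/3/images/img337.jpg" alt="" /></td></tr>\n'
    '<tr><td class="caption"><span class="sc">Fig. 1.</span></td></tr></table>'
)
print(refactor(EB))
```

Run over a whole file, anything the pattern doesn't recognize is left untouched for the last-minute hand tweaking.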

Lee Passey