[gutvol-d] Re: Real men don't use Semantic Markup

28 Feb 2011

On Mon, February 28, 2011 12:46 am, Keith J. Schultz wrote:
...
On 27.02.2011 at 18:54, Marcello Perathoner wrote:
...
On 02/27/2011 09:34 AM, don kretz wrote:
...
Certainly not one with the sophistication to enable them to use the breadth
of markup required to even edit the poor meagre subset of syntactical
information (not even chapters) incorporated into the elegant products
coming from DP.
ROTFL. They may be elegant but they are non-functional. They work on
desktop-sized screens only, for suitably small values of 'work': Try to
narrow your browser window to the typical 5-6 words per line of a mobile
phone. Breakage galore! And, yes!, a substantial portion of PG downloads go
to mobile phones.
Good Point!
BUT, that is because they are not developing towards smaller screen sizes
then. WHICH they should be doing if targeting the epub format!
I disagree. Production of e-books should be a two-step process. First, the
book should be marked up in a semantic way which preserves, to the greatest
extent possible, the structure and metadata of the book and which does so in a
machine-readable format (markup should be unique, explicit and unambiguous).
Then, a computer process should be invoked which can transform the semantic
markup into whatever presentation is required.
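
A minimal sketch of that two-step process, assuming a made-up XML vocabulary
(<book>, <chapter>, <title>, <p>, <emph>) and hypothetical function names;
nothing here is what PG or DP actually uses. The master text says only what
each piece /is/, and two small renderers decide what it looks like:

```python
import textwrap
import xml.etree.ElementTree as ET
from html import escape

# Step 1: the semantic master. Element names are illustrative assumptions.
MASTER = """<book>
  <chapter>
    <title>Chapter I</title>
    <p>It was a <emph>very</emph> dark night.</p>
  </chapter>
</book>"""


def inline(elem, open_mark, close_mark, quote):
    """Flatten a <p>, wrapping <emph> spans in the given markers."""
    parts = [quote(elem.text or "")]
    for child in elem:
        text = quote(child.text or "")
        if child.tag == "emph":
            text = open_mark + text + close_mark
        parts.append(text)
        parts.append(quote(child.tail or ""))
    return "".join(parts)


# Step 2a: one presentation, HTML for a browser.
def to_html(root):
    out = []
    for chapter in root.iter("chapter"):
        out.append("<h2>%s</h2>" % escape(chapter.findtext("title", "")))
        for p in chapter.findall("p"):
            out.append("<p>%s</p>" % inline(p, "<em>", "</em>", escape))
    return "\n".join(out)


# Step 2b: another presentation, plain text wrapped for a narrow screen.
def to_plain_text(root, width=30):
    out = []
    for chapter in root.iter("chapter"):
        out.append(chapter.findtext("title", "").upper())
        for p in chapter.findall("p"):
            out.append(textwrap.fill(inline(p, "_", "_", str), width))
    return "\n\n".join(out)


if __name__ == "__main__":
    book = ET.fromstring(MASTER)
    print(to_html(book))
    print(to_plain_text(book))
```

The person marking up MASTER never has to think about screen width; only
to_plain_text() does, and it can be changed without touching the text.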

If the person who is doing the initial markup is thinking about how it will
look on a mobile phone, he or she is already being confused. The initial
markup should focus on document structure and best encoding practices, and let
the second, automated step worry about how to convert that markup to a format
for a specific device.

If Project Gutenberg were to adopt this "single source" strategy not only
would its texts be "future-proofed" (compatible with software and devices
which have not yet been invented) but it could save a single file for all
devices and generate specific output for each device on-the-fly. Errors found
and corrected in the master file would immediately be corrected in all
subsequent outputs.

This won't happen, of course, but if it did it would be highly beneficial.
...
...
HTML is a purely presentational markup and shares all the problems of
WYSIWYG and adds some of its own.
I disagree with Mr. Perathoner here. I think HTML started life as /mostly/,
but not purely, presentational and has been evolving towards semantic ever
since. HTML 4.01/XHTML 1.0 is now mostly /semantic/ and in HTML5 the proposed
specification calls for /all/ presentational elements to be isolated into
Cascading Style Sheets.

When used carefully, it is possible to use HTML4 in a purely semantic way.
...
...
It is practically impossible to teach good markup to people that have had a
prior exposure to HTML: as potential markup editors they are mentally
mutilated beyond hope of regeneration.
WRONG! You have to tell them to forget everything they have learned so far and
teach them what good mark-up practice is!
I am of two minds on this subject. I am nowhere near as pessimistic as Mr.
Perathoner about the ability of humans to learn new techniques and paradigms.
And yet, there is something about this presentation/semantic dichotomy that
seems to go much deeper than just training.

I am trying (really I am) not to get drawn too deeply into arguments
about the superiority of semantic markup. My experience suggests that some
people recognize the distinction between semantic markup ("what it is") and
presentational markup ("what it looks like") almost immediately, and that the
others will almost /never/ get the difference. My current approach (to the
degree that I can follow it) is to lay out the differences, but not to
try to convince anyone with rational arguments that semantic markup is more
useful. Either they get it or they don't, and even cordial and rational
discussion doesn't seem to help.

[snip]
...
...
...
What is technically possible with HTML, X or otherwise, makes no difference
at all unless there's an editor supporting it that is approximately as
easy to use as what people write their emails with, and captures syntactic
artifacts.
Some good editing tools might be a step in the right direction. Note, however,
that these tools should /not/ be WYSIWYG, but WYSIWI (What You See Is What
Is). These tools should be like braces, forcing the mind into a specific
mindset and making the structure of a document explicit and visible. Pressing
"Enter" should not implicitly start a new paragraph, but the tool should
require that paragraphs be explicitly identified. Whenever a span of text is
marked as italic, the tool should bring up a dialog asking /why/ the text
should be italicized. Is it emphasized/stressed? Is it a foreign word or
phrase? Is it a title? Is it simply intended to be an alternate font face? A
good tool would make the user confront these issues at every step of the way,
until the understanding is automatic.
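
As a rough sketch of what such a tool might do behind that dialog (the
reason names, the class names "foreign" and "italic", and the function
mark_italic are all hypothetical), each answer to "why italic?" could map to
different HTML 4.01 markup:

```python
from html import escape

# Hypothetical mapping from the *reason* for italics to HTML 4.01 markup.
# The class names are assumptions made up for this illustration.
ITALIC_REASONS = {
    "emphasis": "<em>{text}</em>",
    "foreign": '<span class="foreign" lang="{lang}">{text}</span>',
    "title": "<cite>{text}</cite>",
    "font": '<span class="italic">{text}</span>',
}


def mark_italic(text, reason, lang=None):
    """Return markup for an italic span, refusing to guess the reason."""
    if reason not in ITALIC_REASONS:
        raise ValueError("unknown reason for italics: %r" % reason)
    if reason == "foreign" and not lang:
        raise ValueError("a foreign phrase needs a language code")
    return ITALIC_REASONS[reason].format(
        text=escape(text), lang=escape(lang or "")
    )
```

For example, mark_italic("Middlemarch", "title") gives
<cite>Middlemarch</cite>, while the same call with no recognized reason is
rejected rather than silently turned into an anonymous italic span.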
...
...
Machines cannot capture semantic yet. (And when they do, Google's automatic
output will surpass DP's human output not only in quantity but also in
quality, thus making DP obsolete.)
DP should have educated their processors about semantic markup. They have
failed this in the same way they have soundly slept through the technological
changes of the last 5 years. (At least I wasn't able to find a single FAQ
about semantic markup at DP and DP's output doesn't look like they are
getting it.)
I fear that at DP the people who understand the concept of semantic
markup are vastly outnumbered by the people who do not and cannot,
and I fear that those who do not understand the distinction cannot be made to
understand it through rational discourse.

An alternative might be to put together a group of people who /do/ understand
the distinction. It ought to be possible at this point to extract files from
DP before they have been degraded for PG use (just before Post-Processing?)
and store them in a new repository where they can be Post-Processed by the
semantic volunteers. Project Gutenberg may not want these new files, but I'm
sure Internet Archive would store them for us. And Mr. Newby might be willing
to provide some online storage and a web interface to access them.
...
...
Until the average person at DP can tell a paragraph from not a paragraph,
every discussion about formats and tools is moot.
Besides, what is a Chapter title? It, too, is a paragraph, in most cases!
No it is not. You are thinking presentationally, not semantically. A paragraph
is "one or more complete sentences, usually devoted to one idea and usually
marked by the beginning of a new line, indentation, or increased interlinear
space." A chapter is "a division of a written work, especially a narrative,
usually titled or numbered." A title is "a descriptive name, caption, or
heading of a section of a book."

A chapter title is no more a paragraph than the phrase "Keith J. Schultz" is a
paragraph, and anyone who hopes to engage in semantic markup must understand
this.
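
One practical consequence, sketched here with made-up element names and a
hypothetical table_of_contents function: when titles are marked as titles, a
program can find them; when everything is a paragraph, it cannot.

```python
import xml.etree.ElementTree as ET

# Titles marked as what they are (element names are illustrative only).
SEMANTIC = """<book>
  <chapter><title>The Storm</title><p>Rain fell all night.</p></chapter>
  <chapter><title>The Calm</title><p>Morning came.</p></chapter>
</book>"""

# The same text with every block marked as a paragraph.
PRESENTATIONAL = """<book>
  <p>The Storm</p><p>Rain fell all night.</p>
  <p>The Calm</p><p>Morning came.</p>
</book>"""


def table_of_contents(xml_text):
    """Collect chapter titles, relying only on explicit <title> markup."""
    root = ET.fromstring(xml_text)
    return [title.text for title in root.iter("title")]
```

table_of_contents(SEMANTIC) returns ['The Storm', 'The Calm'];
table_of_contents(PRESENTATIONAL) returns an empty list, because nothing in
that markup says which paragraphs are titles.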

Cheers,
Lee

Lee Passey