
On Mon, February 28, 2011 12:46 am, Keith J. Schultz wrote:
Am 27.02.2011 um 18:54 schrieb Marcello Perathoner:
On 02/27/2011 09:34 AM, don kretz wrote:
Certainly not one with the sophistication to enable them to use the breadth of markup required to even edit the poor meagre subset of syntactical information (not even chapters) incorporated into the elegant products coming from DP.
ROTFL. They may be elegant but they are non-functional. They work on desktop-sized screens only, for suitably small values of 'work': Try to narrow your browser window to the typical 5-6 words per line of a mobile phone. Breakage galore! And, yes!, a substantial portion of PG downloads go to mobile phones.
Good Point! BUT, that is because they are not developing towards smaller screen sizes then. WHICH, they should be doing if targeting the epub format!
I disagree. Production of e-books should be a two-step process. First, the book should be marked up in a semantic way which preserves, to the greatest extent possible, the structure and metadata of the book and which does so in a machine-readable format (markup should be unique, explicit and unambiguous). Then, a computer process should be invoked which can transform the semantic markup into whatever presentation is required. If the person who is doing the initial markup is thinking about how it will look on a mobile phone, he or she is already confused. The initial markup should focus on document structure and best encoding practices, and let the second, automated step worry about how to convert that markup to a format for a specific device. If Project Gutenberg were to adopt this "single source" strategy not only would its texts be "future-proofed" (compatible with software and devices which have not yet been invented) but it could save a single file for all devices and generate specific output for each device on-the-fly. Errors found and corrected in the master file would immediately be corrected in all subsequent outputs. This won't happen, of course, but if it did it would be highly beneficial.
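A minimal sketch of the two-step process, in Python. The class names, the sample text and the two output functions are assumptions made up for illustration; they are not anything Project Gutenberg or DP actually uses. The master file records only what each piece of text /is/, and each output format is generated from it by a separate function.

```python
from dataclasses import dataclass
from html import escape
import textwrap


@dataclass
class ChapterTitle:
    text: str


@dataclass
class Paragraph:
    text: str


# Step one: the "master file". It records structure only, nothing about appearance.
BOOK = [
    ChapterTitle("Chapter I"),
    Paragraph("It was a dark and stormy night; the rain fell in torrents."),
]


def to_html(book):
    """Step two, variant A: render the structure as HTML."""
    parts = []
    for node in book:
        if isinstance(node, ChapterTitle):
            parts.append(f"<h2>{escape(node.text)}</h2>")
        else:
            parts.append(f"<p>{escape(node.text)}</p>")
    return "\n".join(parts)


def to_plain_text(book, width=30):
    """Step two, variant B: render the same structure as narrow plain text."""
    parts = []
    for node in book:
        if isinstance(node, ChapterTitle):
            parts.append(node.text.upper())
        else:
            parts.append(textwrap.fill(node.text, width=width))
    return "\n\n".join(parts)


html_output = to_html(BOOK)
text_output = to_plain_text(BOOK)
```

A correction made once in BOOK shows up in both outputs the next time they are generated, which is the point of keeping a single source.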
HTML is a purely presentational markup and shares all the problems of WYSIWYG and adds some of its own.
I disagree with Mr. Perathoner here. I think HTML started life as /mostly/, but not purely, presentational and has been evolving towards semantic ever since. HTML 4.01/XHTML 1.0 is now mostly /semantic/ and in HTML5 the proposed specification calls for /all/ presentational elements to be isolated into Cascading Style Sheets. When used carefully, it is possible to use HTML4 in a purely semantic way.
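As a small illustration of what semantic HTML buys you, here is a sketch in Python that pulls the chapter titles out of a page using only the standard library. It assumes, purely for the example, that chapter titles are marked up as h2 elements; the sample page is invented.

```python
from html.parser import HTMLParser

# Invented sample page; the use of <h2> for chapter titles is an assumption.
PAGE = """
<h2>Chapter I</h2>
<p>First paragraph of the chapter.</p>
<h2>Chapter II</h2>
<p>Another paragraph.</p>
"""


class ChapterTitles(HTMLParser):
    """Collect the text content of every <h2> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_title = False

    def handle_data(self, data):
        # Only text inside an open <h2> belongs to a chapter title.
        if self.in_title:
            self.titles[-1] += data


parser = ChapterTitles()
parser.feed(PAGE)
parser.close()
chapter_titles = parser.titles
```

Had the titles been marked up as paragraphs with a large bold font, no program could have told them apart from the body text without guessing.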
It is practically impossible to teach good markup to people that have had a prior exposure to HTML: as potential markup editors they are mentally mutilated beyond hope of regeneration.
WRONG! You have to tell them to forget everything they have learned so far and teach them what good mark-up practice is!
I am of two minds on this subject. I am nowhere near as pessimistic as Mr. Perathoner about the ability of humans to learn new techniques and paradigms. And yet, there is something about this presentation/semantic dichotomy that seems to go much deeper than just training. I am trying (really I am) not to get too deeply drawn into arguments about the superiority of semantic markup. My experience suggests that some people recognize the distinction between semantic markup ("what it is") and presentational markup ("what it looks like") almost immediately, and that the others will almost /never/ get the difference. My current approach (to the degree that I can follow it) is to lay out the differences, but not to try to convince someone with rational arguments that semantic markup is more useful. Either they get it or they don't, and even cordial and rational discussion doesn't seem to help. [snip]
What is technically possible with HTML, X or otherwise, makes no difference at all unless there's an editor supporting it that is approximately as easy to use as what people write their emails with, and captures syntactic artifacts.
Some good editing tools might be a step in the right direction. Note, however, that these tools should /not/ be WYSIWYG, but WYSIWI (What You See Is What Is). These tools should be like braces, forcing the mind into a specific mindset and making the structure of a document explicit and visible. Pressing "Enter" should not implicitly start a new paragraph, but the tool should require that paragraphs be explicitly identified. Whenever a span of text is marked as italic, the tool should bring up a dialog asking /why/ the text should be italicized. Is it emphasized/stressed? Is it a foreign word or phrase? Is it a title? Is it simply intended to be an alternate font face? A good tool would make the user confront these issues at every step of the way, until the understanding is automatic.
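The "why is this italic?" dialog could be sketched roughly as follows, in Python. The function name, the four reason keywords and the choice of HTML elements for each are my own assumptions for illustration, not the behaviour of any existing editor.

```python
from html import escape

# Hypothetical mapping from the user's answer to a piece of HTML markup.
ITALIC_REASONS = {
    "emphasis": "<em>{text}</em>",
    "foreign": '<i lang="{lang}">{text}</i>',
    "title": "<cite>{text}</cite>",
    "typeface": "<i>{text}</i>",
}


def mark_italic(text, reason, lang=None):
    """Return markup for an italic span, refusing to proceed without a reason."""
    if reason not in ITALIC_REASONS:
        raise ValueError(f"unknown reason for italics: {reason!r}")
    if reason == "foreign" and not lang:
        raise ValueError("a foreign word or phrase needs a language code")
    template = ITALIC_REASONS[reason]
    return template.format(text=escape(text), lang=escape(lang or ""))
```

For example, mark_italic("Zeitgeist", "foreign", lang="de") yields an i element carrying a lang attribute, while mark_italic("very", "emphasis") yields an em element. The same visual effect ends up as different markup depending on the answer.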
Machines cannot capture semantics yet. (And when they do, Google's automatic output will surpass DP's human output not only in quantity but also in quality, thus making DP obsolete.)
DP should have educated their processors about semantic markup. They have failed this in the same way they have soundly slept through the technological changes of the last 5 years. (At least I wasn't able to find a single FAQ about semantic markup at DP and DP's output doesn't look like they are getting it.)
I fear that at DP the people who understand the concept of semantic markup are vastly outnumbered by those who do not and cannot, and I fear that those who do not understand the distinction cannot be made to understand it through rational discourse. An alternative might be to put together a group of people who /do/ understand the distinction. It ought to be possible at this point to extract files from DP before they have been degraded for PG use (just before Post-Processing?) and store them in a new repository where they can be Post-Processed by the semantic volunteers. Project Gutenberg may not want these new files, but I'm sure Internet Archive would store them for us. And Mr. Newby might be willing to provide some online storage and a web interface to access them.
Until the average person at DP can tell a paragraph from a non-paragraph, every discussion about formats and tools is moot.
Besides, what is a Chapter title? It, too, is a paragraph, in most cases!
No, it is not. You are thinking presentationally, not semantically. A paragraph is "one or more complete sentences, usually devoted to one idea and usually marked by the beginning of a new line, indentation, or increased interlinear space." A chapter is "a division of a written work, especially a narrative, usually titled or numbered." A title is "a descriptive name, caption, or heading of a section of a book." A chapter title is no more a paragraph than the phrase "Keith J. Schultz" is a paragraph, and anyone who hopes to engage in semantic markup must understand this.
Cheers, Lee