Re: [gutvol-d] I'm sorry but I don't get it...

15 Oct 2004

I'll take your questions in reverse order.
...
why [semantic tagging]
won't horribly lengthen the amount of time required to produce an eBook?
I think a two-part answer is important here.

1. The great news is that basic semantic tagging is roughly the same effort as HTML.  And, if PG had acceptable MASTER-to-text conversion, the overall effort would be REDUCED compared to creating BOTH text and HTML by hand.

Today, creating an eText involves throwing information away, e.g. converting what is clearly multiple levels of heading into ALL CAPS -- which loses any distinction between the levels.  The key to creating a MASTER is to preserve this information.

Sometimes this will require a tiny bit more time (to use the correct tag or add the appropriate attribute) but often it will take less time than manually converting to ALL CAPS or whatever.

And, as I've argued elsewhere, there's no need to wait for widespread agreement on any particular set of XML tags.  As long as the tags are used consistently, it's much, much easier to convert from one XML representation to another than to convert from text to HTML.  In fact, it's also fine to skip XML and just use consistent HTML with appropriate div/span tags and/or attributes on regular HTML tags.  What's important is to stop throwing useful information away and instead to capture it in a way that can be processed automatically.

Takeaway point: reliable MASTER-to-text conversion would increase the number of eTexts produced per unit of volunteer time investment.  (And, as DP folks have argued, additional automation would streamline other stages too.)
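
To make point 1 concrete, here is a minimal sketch in Python.  It assumes a made-up MASTER fragment: the tag names (book, heading, para) and the level attribute are purely for illustration, not a proposed PG standard.  Both outputs come from the same source; the text version flattens headings to ALL CAPS, while the HTML version keeps the levels.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical MASTER fragment; tag and attribute names are assumptions.
MASTER = """<book>
  <heading level="1">Part One</heading>
  <heading level="2">Chapter I</heading>
  <para>It was a dark night.</para>
  <heading level="2">Chapter II</heading>
  <para>Morning came.</para>
</book>"""

def to_text(root):
    """Plain text: every heading becomes ALL CAPS, so the levels are lost."""
    lines = []
    for el in root:
        if el.tag == "heading":
            lines.append(el.text.upper())
        else:
            lines.append(el.text)
        lines.append("")  # blank line between blocks
    return "\n".join(lines)

def to_html(root):
    """HTML: the level attribute picks h1, h2, ... so the levels survive."""
    parts = []
    for el in root:
        tag = "h" + el.get("level") if el.tag == "heading" else "p"
        parts.append(f"<{tag}>{escape(el.text)}</{tag}>")
    return "\n".join(parts)

root = ET.fromstring(MASTER)
text_version = to_text(root)
html_version = to_html(root)
print(text_version)
print(html_version)
```

Going from the MASTER to either format is a few lines.  Going back from the text version is the hard direction: once "Part One" and "Chapter I" are both ALL CAPS, nothing in the file says which one was the higher-level heading.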

2. There's a second level of semantic tagging that *does* require more effort: adding information that's useful but isn't represented in print.  For example, perhaps we want to label every quotation with the name of the speaker.  That's easy in a play, since the name is printed.  It's quite a lot of work in prose: the name may or may not appear next to the quote; when it does, it could come before or after; and the same speaker may be named several ways (e.g. "Arthur", "The King", "His Majesty").

I'm actually a fan of rich semantic markup, but, to be honest, the benefits of this second level are much smaller and the effort much greater.  In the foreseeable future, this is likely only to be done when the volunteer has a specific end use in mind.
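
The play case can be sketched in a few lines of Python.  This assumes one particular layout (each speech starts with the speaker's name in capitals followed by a period) and a made-up speech tag with a speaker attribute; real plays vary, so treat it as an illustration rather than a working tool.

```python
import re

# Assumed layout: "ARTHUR. Who goes there?" (name in capitals, then a period).
SPEECH = re.compile(r"^([A-Z][A-Z ]+)\.\s+(.*)$")

def tag_speeches(lines):
    """Wrap each recognized speech in a hypothetical <speech> tag."""
    tagged = []
    for line in lines:
        m = SPEECH.match(line)
        if m:
            speaker, words = m.group(1), m.group(2)
            tagged.append(f'<speech speaker="{speaker.title()}">{words}</speech>')
        else:
            tagged.append(line)  # stage directions etc. pass through untouched
    return tagged

play = ["ARTHUR. Who goes there?", "GUARD. A friend, my lord.", "[Exit GUARD]"]
tagged = tag_speeches(play)
for line in tagged:
    print(line)
```

Nothing comparable works for prose.  There is no printed cue to match, and resolving "The King" and "His Majesty" to the same speaker takes a human reader.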
...
Could someone please explain the benefit of semantic tagging
Others have addressed this, but I want to summarize and add a few points.

1. A single MASTER copy from which all other versions can be generated automatically.  Plain text and HTML of course, but also PDF and the various eBook formats.  Just as important: more than one rendition of any particular format can be created, e.g. a set of HTML files split by chapter or even page, or PDF formatted for a particular screen size, paper size, or printing layout (e.g. as a booklet).

2. Capture information that's beyond what is generally printed, but is useful to certain audiences and/or in certain contexts.  For example (from an earlier thread), the MASTER can capture both a mistake AND its correction, or other variations.  See Re[2]: [gutvol-d] Indexing Editors, etc. from Oct. 4, 2004 for details.

3. Automated processes that "add value" in some way, e.g. using a different computer voice for different characters, or creating an index by character.
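
Points 2 and 3 can be sketched together in Python.  The markup here is hypothetical: a sic tag whose corr attribute holds the correction, and a speech tag with a speaker attribute.  Any consistent convention would serve equally well.

```python
import xml.etree.ElementTree as ET

# Hypothetical MASTER fragment; "sic", "corr" and "speaker" are assumed names.
MASTER = """<scene>
<speech speaker="Arthur">Who <sic corr="goes">gose</sic> there?</speech>
<speech speaker="Guard">A friend, my lord.</speech>
<speech speaker="Arthur">Then enter.</speech>
</scene>"""

def render(el, corrected=True):
    """Flatten an element to text, using the correction or the printed error."""
    out = el.text or ""
    for child in el:
        if child.tag == "sic":
            out += child.get("corr") if corrected else (child.text or "")
        else:
            out += render(child, corrected)
        out += child.tail or ""  # text that follows the child element
    return out

def index_by_speaker(root):
    """Map each speaker to the (1-based) positions of their speeches."""
    index = {}
    for n, speech in enumerate(root.iter("speech"), start=1):
        index.setdefault(speech.get("speaker"), []).append(n)
    return index

root = ET.fromstring(MASTER)
first = root.find("speech")
corrected_line = render(first, corrected=True)
as_printed_line = render(first, corrected=False)
speaker_index = index_by_speaker(root)
print(corrected_line)
print(as_printed_line)
print(speaker_index)
```

The same MASTER yields a corrected reading edition and an as-printed edition, and the speaker attribute is all a program needs to build an index by character or to pick a voice per character.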
-- 

Scott

Practical Software Innovation (tm), http://ProductArchitect.com/

Scott Lawton