a review of some digitization tools -- 024

let's discuss, for a minute, how to prepare a .zml file... for fun, we'll contrast it with how to code an .html file.
***
we'll start with something basic -- simple italics. let's say that a word is _italicized_ in the p-book.
to do that markup in .zml, you surround the word with _underscores._ you can type 'em in manually, or you might have a tool with a button that you can click to insert them automatically.
in .html, you'd surround the word with italics tags, so an [i]italicized[/i] word looks something like that. (i changed the angle-brackets to straight-brackets, so an .html viewer-tool will not be confused by it.)
again, you can type the brackets-plus-tag manually, or you might have a tool to insert 'em automatically. most .html authoring-tools have an [i] or [em] button.
so in this case, for italics, there's not much difference between .zml and .html... one uses underscores and the other uses bracket-commands, but -- essentially -- there's no difference between them. i might think that underscores are easier to type, and easier on the eyes, and thus less obtrusive when it comes time to _edit_, but it's hard to argue that there's much difference here.
and thus it is with a lot of the mark-up you might do...
in .zml, you indicate a blockquote with a leading " > ", whereas in .html you use [blockquote]xyz[/blockquote].
and in .zml, you indicate a list item with a leading " * ", while in .html it's more like this: [ul][li]abcdef[/li][/ul].
again, i think the .zml looks a lot less obtrusive, and that is indeed part of the big appeal of "zen" markup, but if zen doesn't matter to you, there's no difference.
***
however, there _are_ some cases where it does make a difference when you're tagging in .zml versus .html.
for instance, take the case of _headers_...
on the face of it, it might appear to be a similar situation.
in .zml, you indicate a header by putting 4 or more blank lines above it and exactly 2 blank lines below it, and you precede it with a space.
so, aside from the blank lines above and below it, which "set it off" such that the header will gain your attention, a header has a pretty ordinary appearance in your text:
chapter 7 header
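a minimal sketch of how a converter might spot such a header, assuming the rule is "4 or more blank lines above, exactly 2 blank lines below". the function name `find_headers` and its parameters are hypothetical, made up for illustration; this is not the actual .zml converter, and it ignores the leading-space part of the rule.

```python
def find_headers(text, min_blank_before=4, blank_after=2):
    """Return the lines that look like headers under the assumed blank-line rule."""
    lines = text.split("\n")
    headers = []
    for i, line in enumerate(lines):
        if not line.strip():
            continue  # blank lines are never headers themselves

        # count the run of blank lines directly above this line
        before = 0
        j = i - 1
        while j >= 0 and not lines[j].strip():
            before += 1
            j -= 1

        # count the run of blank lines directly below this line
        after = 0
        k = i + 1
        while k < len(lines) and not lines[k].strip():
            after += 1
            k += 1

        if before >= min_blank_before and after == blank_after:
            headers.append(line.strip())
    return headers


sample = (
    "some text\n\n\n\n\n"
    "chapter 7 header\n\n\n"
    "first paragraph\n\n"
    "second paragraph\n"
)
print(find_headers(sample))
```

the point of the sketch is only that the blank lines carry all the information: nothing has to be typed around the header text itself.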
in .html, you tag a header thus: [h2]chapter 7 header[/h2]
so if it was merely a matter of the tagging of the header, this case is exactly like the previous: not much difference.
but various other factors come into play involving headers.
first, the table-of-contents should _link_ to every header... at a bare minimum, this means your header has to have an "id" attached to it, to which the link is directed.
so in addition to the _header_ tag, you must make an "id", with another bracket-command, which might look like this:
[div id="chapter_7_header"][/div]
so now the header, with its matching "id" spec, looks like:
[div id="chapter_7_header"][/div] [h2]chapter 7 header[/h2]
you could leave it just like that. many people do, actually. but .zml does more -- by default -- to aid in navigation. every .zml header links _back_ to the table-of-contents, so a person who is at any header can _jump_ to the t.o.c. to do the same kind of back-link in .html, you'd do this:
[div id="chapter_7_header"][/div] [a href="#table_of_contents"] [h2]chapter 7 header[/h2] [/a]
that's a little more work, and very few .html tools do this, so you'd have to do it manually. but it's still manageable.
but there's more. .zml also generates automatic links to the _previous_ and _next_ chapters, so a person can easily "skim" from one chapter to the next, which comes in handy.
in order to do this in .html, you'd have to do this:
[div id="chapter_7_header"][/div] [a href="#chapter_6_header"]prev[/a] [a href="#chapter_8_header"]next[/a] [a href="#table_of_contents"] [h2]chapter 7 header[/h2] [/a]
note that this raises the level of complexity quite a bit. the "id" was based on the header itself, so that was easy to generate, because the two are so close to each other. so it would be rather easy to code a reg-ex for the task.
but when each header has to know the id of the header which came before it, and the one which comes after it, we have experienced an increase in the difficulty factor, to the point where the reg-ex is gonna get pretty hairy.
you will also need some tagging in there to get the links _positioned_ in the exact way that you'll want them, but i won't bother to put in that clutter too right now, as...
...i wish to point out that, all of a sudden, our header has gotten _buried_ inside a lot of .html-tag clutter... which makes it difficult if you need to do more editing. (a rule of thumb is you always need to do more editing.)
so again, let's take a look back at the .zml header...
chapter 7 header
very clean. everything is automatic. breath of fresh air.
***
we can go through the very same exercise for footnotes.
in the body of the text, where you have the footnote,[1] you will need to do the .html coding to create that link.
[1] at the footnote itself, which might be in the "footnotes" section, you'll need to form an "id" for the link to jump to.
and likewise, it's usually nice to have a _back-link_ too. which means you need to have an "id" in the body-text, and code the back-link in the footnotes section as well.
and, seriously, that's more than you wanna do by hand.
and thankfully, most of the time, you won't need to...
many .html tools make it relatively easy to make a link. you just click where you want the link to be, and then click again where you want that link to jump to. easy...
at least, it's easy if you only have a few such links...
but if there are 580 footnotes, like a book i did a while back, you _will_ want these steps to be performed automatically.
my .zml converter does that for you.
***
now, don't get me wrong. there is absolutely _nothing_ in my converter-tool that some programmer could not clone in an .html authoring-tool. nothing at all. zilch.
indeed, it wouldn't surprise me if it's already been done. (well, actually, it would. a little bit. but i'd get over it.)
for instance, the text-to-html converter inside guiguts, the d.p. postprocessing tool, does well on footnotes...
it'd even be possible to build an .html authoring-tool which inserted all of this code automatically and _also_ removed it from the file when you wanted to do editing -- so you wouldn't have all that crud distracting you -- only to re-insert it once again after you were finished...
it's even possible this tool could be offered at no cost...
it's even possible it could be an open-source creation...
you will let us know if you find such a tool, won't you? :+)
***
z.m.l. input file:
python script to create .html output:
-bowerbird

Bowerbird, I am interested in trying out your tool. I'm currently working on a book that has a special meaning to me but which is hell to format.
If you've seen my earlier emails you know I've more or less committed myself to doing family tree tables using ASCII text. There are a LOT of these in the book. I'd need some way to indicate that these should be surrounded by <pre> tags in the HTML.
Second, I need to have some illustrations. How would your tool handle those?
My understanding of your ideas is that I can make a file with some simple markup and generate many different formats from that. I used the HTML generator in guiguts for my PG submissions so far. I'm not entirely happy with it, especially the way it uses <span> tags to indicate indented text. I end up going through and replacing these with <blockquote> tags. The style sheet guiguts gives me is good for HTML but needs to be tweaked to produce a good Kindle book. And of course once I've generated my HTML I now have two files to correct.
Also, if the book has multiple levels of headings, how would you deal with that?
James Simmons
On Tue, Dec 20, 2011 at 3:23 AM, <Bowerbird@aol.com> wrote:
[full quote of the original message snipped]
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d

On Tue, December 20, 2011 2:23 am, Bowerbird@aol.com wrote: [snipped long-winded exposition on s.m.l. rules]
so in addition to the _header_ tag, you must make an "id", with another bracket-command, which might look like this:
[div id="chapter_7_header"][/div]
You /could/ use another "bracket command", but if you did so you would use the <a>nchor tag which is explicitly designed to be the target of a link, e.g.: <a id="ch07" />
so now the header, with its matching "id" spec, looks like:
[div id="chapter_7_header"][/div] [h2]chapter 7 header[/h2]
Every element in HTML has two means of identification: the "class" attribute, and the "id" attribute. The value of the "id" attribute must be unique in the document; it identifies one specific element. The value of the "class" attribute is shared among all elements of a specific type which share common properties. In this case, rather than adding a new element to hold the target identifier you should simply add it to the header element itself, e.g.:
<h2 id="ch07">chapter 7 header</h2>
you could leave it just like that. many people do, actually.
but .zml does more -- by default -- to aid in navigation.
every .zml header links _back_ to the table-of-contents, so a person who is at any header can _jump_ to the t.o.c.
to do the same kind of back-link in .html, you'd do this:
[div id="chapter_7_header"][/div] [a href="#table_of_contents"] [h2]chapter 7 header[/h2] [/a]
An HTML <a> tag can be used as both the source and the target of a link. To build a back link in HTML, you could do this:
<h2 id="ch07"><a href="#toc">chapter 7 header</a></h2>
or you could do this:
<h2><a id="ch07" href="#toc">chapter 7 header</a></h2>
that's a little more work, but very few .html tools do this, so you'd have to do it manually. but it's still manageable.
but there's more. .zml also generates automatic links to the _previous_ and _next_ chapters, so a person can easily "skim" from one chapter to the next, which comes in handy.
I believe this is inaccurate. So far, I have seen nothing in your explanation of s.m.l. that indicates how this is done in the markup language. I could accept the assertion that "all User Agents which support .zml must implement a mechanism to skim back and forth from and to chapter headings," but that is a function of the User Agent, not the markup language. Indeed, under the ePub 3 specification, conformant software is expected to do very much the same thing: parse the table of contents (implemented as a list) and provide a method of navigation to skip from chapter to chapter.
in order to do this in .html, you'd have to do this:
[div id="chapter_7_header"][/div] [a href="#chapter_6_header"]prev[/a] [a href="#chapter_8_header"]next[/a] [a href="#table_of_contents"] [h2]chapter 7 header[/h2] [/a]
Close. If you wanted to do this (and I'd just as soon /not/ have this "feature/bug" cluttering up my e-books) you would probably want to do something like this:
<div class="navCrumb"> <a href="#ch06">prev</a> | <a href="#toc">up</a> | <a href="#ch08">next</a> </div>
<h2 id="ch07">chapter 7 header</h2>
One sees this kind of navigation element a lot out on the web, particularly on long articles or tutorials. I added the surrounding classed <div> element so I could add a style selector:
div.navCrumb { display:none }
to make them easy to get rid of.
note that this raises the level of complexity quite a bit. the "id" was based on the header itself, so that was easy to generate, because the two are so close to each other. so it would be rather easy to code a reg-ex for the task.
but when each header has to know the id of the header which came before it, and the one which comes after it, we have experienced an increase in the difficulty factor, to the point where the reg-ex is gonna get pretty hairy.
Yeah. It would be easy to write a script to do it, but if your only tool is regex it could be a bit tough. I did manage to write a macro in TextPad that did it...
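As a rough illustration of why a script has an easier time than a lone regex: once the headers are collected into a list, each one's neighbours are just the adjacent list entries. The sketch below is not the TextPad macro or any existing tool; the names `make_id` and `render_headers` and the id scheme (lowercased title with underscores) are assumptions made up for the example, and the output follows the navCrumb markup shown above.

```python
import html
import re


def make_id(title):
    """Hypothetical id scheme: lowercase, runs of non-alphanumerics become '_'."""
    return re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")


def render_headers(titles, toc_id="toc"):
    """Return one block of navigation-plus-header markup per title."""
    ids = [make_id(t) for t in titles]
    blocks = []
    for n, (title, hid) in enumerate(zip(titles, ids)):
        links = []
        if n > 0:  # the first header has no previous chapter
            links.append('<a href="#%s">prev</a>' % ids[n - 1])
        links.append('<a href="#%s">up</a>' % toc_id)
        if n < len(ids) - 1:  # the last header has no next chapter
            links.append('<a href="#%s">next</a>' % ids[n + 1])
        blocks.append(
            '<div class="navCrumb">%s</div>\n<h2 id="%s">%s</h2>'
            % (" | ".join(links), hid, html.escape(title))
        )
    return blocks


blocks = render_headers(["chapter 6 header", "chapter 7 header", "chapter 8 header"])
print(blocks[1])
```

Two headers with the same title would get the same id under this scheme, so a real script would need to number them or otherwise make the ids unique.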
you will also need some tagging in there to get the links _positioned_ in the exact way that you'll want them, but i won't bother to put in that clutter too right now, as...
Don't bother to do it ever. Positioning of elements is something to be left for style sheets; it shouldn't be part of the HTML.
...i wish to point out that, all of a sudden, our header has gotten _buried_ inside a lot of .html-tag clutter... which makes it difficult if you need to do more editing. (a rule of thumb is you always need to do more editing.)
But when you look at it, there's no mistaking which part is the header, and which is the markup. [snippage]
in the body of the text, where you have the footnote,[1] you will need to do the .html coding to create that link.
[1] at the footnote itself, which might be in the "footnotes" section, you'll need to form an "id" for the link to jump to.
and likewise, it's usually nice to have a _back-link_ too. which means you need to have an "id" in the body-text, and code the back-link in the footnotes section as well.
and, seriously, that's more than you wanna do by hand.
Actually, it's easier than your navigation crumb. At the point in the text where the footnote is marked you put:
<a href="#fn01" id="ret01">[1]</a>
(you can add <sup> if you want the footnote superscripted) and in the footnote section you put:
<a href="#ret01" id="fn01">[1]</a>
Most footnotes in text are superscripted, so in fr2html.exe I wrote an automated mechanism to build footnotes from the HTML output so I didn't have to create elements by hand, even as a text construct. [snip]
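The paired footnote markup above lends itself to automation. The following is a minimal sketch, not the mechanism in fr2html.exe or in the .zml converter: the function name `link_footnotes` is hypothetical, and it assumes footnote markers appear as [n] in the body and that each footnote line begins with its own [n].

```python
import re


def link_footnotes(body, notes):
    """Link [n] markers in the body to footnotes, and footnotes back to the body.

    body:  text containing markers such as [1]
    notes: list of footnote lines, each assumed to start with its marker
    """
    def ref(m):
        n = int(m.group(1))
        # in the body: jump to the footnote, and be the target of the back-link
        return '<a href="#fn%02d" id="ret%02d">[%d]</a>' % (n, n, n)

    def note(m):
        n = int(m.group(1))
        # in the footnote section: jump back, and be the target of the body link
        return '<a href="#ret%02d" id="fn%02d">[%d]</a>' % (n, n, n)

    linked_body = re.sub(r"\[(\d+)\]", ref, body)
    linked_notes = [re.sub(r"^\[(\d+)\]", note, line) for line in notes]
    return linked_body, linked_notes


linked_body, linked_notes = link_footnotes(
    "where you have the footnote,[1] you continue.",
    ["[1] the footnote text."],
)
print(linked_body)
print(linked_notes[0])
```

If the same footnote number is referenced twice in the body, this sketch would emit a duplicate "id", so a fuller version would have to handle repeated references.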
you will let us know if you find such a tool, won't you? :+)
Somebody will probably mention it. It's too bad you won't be listening.
participants (3)
- Bowerbird@aol.com
- James Simmons
- Lee Passey