
my versions of "alice in wonderland" are up now. the z.m.l. text-file is at:
http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01.zml
a representative .html file:
http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01.html
i believe the .html version is as good as any you'll find on the net. (there are a _lot_ of .html versions of alice all around cyberspace. which is why i was amazed project gutenberg doesn't have one...) but please do feel free to give feedback about what might be wrong; i assure you i can take it. (even if i don't happen to agree with it...)
because i used only a bare minimum of tags (appended to this post), i expect that this .html version would be fine for generating versions for the rocketbook, ereader, mobipocket, and the other p.d.a programs.
so the question now is, if we can generate an .html version, as well as all the other versions stemming from the plain-text file, why do x.m.l.?
david moynihan over at blackmask.com proved that _one_person_ could keep up with all the files being posted, converting each text-file (which he had to rework slightly) into a half-dozen different formats.
in other words, the _promise_ that x.m.l. will give different versions -- which is, as yet, unproven, i must remind you -- has already been _realized_ -- for 15,000+ e-texts -- using the _plain-text_ version!
so take a good hard look at the plain-text version i have just posted, and then take a good hard look at the various x.m.l. versions around, and decide for yourself which version _you_ would rather maintain...
-bowerbird
p.s. tags used in my .html file: [p][/p] [p id="..."][/p] [a href="#..."][/a] [br /] [hr /] [img src] [i][/i] [pre][/pre] [center][/center] [ul][/ul] [h1][/h1] [h2][/h2] [h3][/h3] [h4][/h4] [small][/small] [html][/html] [head][/head] [title][/title] [style][/style] [body][/body]

Bowerbird@aol.com wrote:
my versions of "alice in wonderland" are up now.
a representative .html file:
http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01.html
Result: Failed validation, 159 errors
http://validator.w3.org/check?uri=http%3A%2F%2Fsnowy.arsc.alaska.edu%2Fbowerbird%2Falice01%2Falice01%2Falice01.html&charset=%28detect+automatically%29&doctype=Inline&verbose=1
-- Marcello Perathoner webmaster@gutenberg.org

Bowerbird@aol.com wrote:
i believe the .html version is as good as any you'll find on the net.
Result: Failed validation, 159 errors
but please do feel free to give feedback about what might be wrong; i assure you i can take it. (even if i don't happen to agree with it...)
You may tell a piece of your mind to the w3c validator team starting here: http://validator.w3.org/feedback.html
because i used only a bare minimum of tags (appended to this post), i expect that this .html version would be fine for generating versions for the rocketbook, ereader, mobipocket, and the other p.d.a programs.
First of all, it should validate. Second, it should not use deprecated tags. Third, it should use css.
so the question now is, if we can generate an .html version, as well as all the other versions stemming from the plain-text file, why do x.m.l.?
Let's mercifully overlook the fact that you have stolen my XXXXXXXXXXXXXXXXXXXX you did not take the `raw' Alice from gutenberg.org as basis for your zml file, but you took the TXT file which was generated from a PGTEI master by the PGTEI converter ... (If you want to prove that you can make better cars than your competitor, you should not buy your competitor's car, paint over your competitor's logo, and sell it as your own make. People will notice!)
Let's also overlook the fact that your `generated' html file contains 159 errors, so we really cannot speak of an `html' file at all ...
After all this overlooking, you still have just posted two files. You have not demonstrated that the one file was algorithmically derived from the other. To do this you would have to post the source code (or at least a working executable) of your zml converter for us to see. Until you do that, *your* claim is unproven.
in other words, the _promise_ that x.m.l. will give different versions -- which is, as yet, unproven, i must remind you --
The PGTEI claim is proven enough. Texts have been posted. The source code is available. An online converter service is running for everybody to look-see.
-- Marcello Perathoner webmaster@gutenberg.org

Marcello wrote:
Bowerbird@aol.com wrote:
i believe the .html version is as good as any you'll find on the net.
It depends upon what you define by "good".
Result: Failed validation, 159 errors
Yes, I got the same result at W3C's validator. Looking at the source of bowerbird's HTML file, it is a good start, but it still has problems, which I assume his application can be tweaked to fix.
Here is the top portion of his Alice document. (I've made the line breaks in the document conform with Windows/Linux. I assume his application outputs Mac-style line breaks: his *.txt is that way and opens up in my generic 'vi' editor in a strange form, as it will for Michael Hart and his text editor. It is essentially a matter of how line breaks are represented in text documents.) I've added line numbers to each line (in '[...]') for reference. I've even fixed a couple things which I won't bother to detail to make the following clearer to read:
************************************************************************
[001] <html><head><title>Alice's Adventures in Wonderland
[002] </title>
[003] </head>
[004] <style type="text/css">
[005] <!-
[006] body { margin-left: 0%; margin-right: 0%; }
[007]
[008] p {margin-right: 5%; text-indent: 1.5em; text-align: justify;
[009] margin-top: .3em;
[010] margin-bottom: .3em; }
[011] pre {font: 12pt/16pt georgia; color: #004444; }
[012] // ->
[013] </style>
[014] <body><ul>
[015] <p id="Alice'sAdventuresinWonderland">
[016] <a href="#tableofillust
[017] rations"><</a>
[018] <a href="#TableofContents">c</a>
[019] <a href="#TableofContents">></a>
[020] <center>
[021] <a href="#TableofContents"><h1>Alice's Adventures<br />in Wonderland</
[022] h1></a>
[023] </p><p>
[024] </p><p>
[025] <h2>by Lewis Carroll</h2>
[026] </p><p>
[027] </p><p>
[028] <h3>Illustrated by John Tenniel</h3>
[029] i</p><p>
[030] </p><p>
[031] <h4>the z.m.l. edition -- 2005</h4>
[032] </p><p>
[033] </p><p>
[034] </p><p>
[035] </p><p>
[036] </p><p>
[037] </center>
[038] </p><hr /><p id="TableofContents">
[039] <a href="#Alice'sAdventuresinWonderland""><</a>
[040] <a href="#Alice'sAdventuresinWonderland">c</a>
******************************************************************************
My comments on this snippet:
Line 004: <style> must be placed inside the <head> section.
Line 005: The "<!-" appears to be the start of a comment, but it has to be "<!--", followed later by a closing "-->". It's amazing that the browsers in question overlook this, probably because it is in a <style> section. It is not needed at all within <style>. (The practice of embedding a CSS style sheet within a comment or CDATA section is vastly overblown and only needed in the very rare case that the CSS includes the reserved characters < and >, which are normally not used in CSS.)
Line 014: Dubious use of <ul>. <ul> was not intended for this purpose. A problem occurs when <ul> (unclassed) is used later in the document for a real list: CSS styling will affect the title page, and probably in unwanted ways. I think this <ul> is not properly closed with a </ul> according to the W3C validator, but I'm not sure.
Line 016-019: It is much better to simply make this all: <a href="#tableofContents">&lt;c&gt;</a>
Line 017: When "<" and ">" are used literally, it is best to write them as &lt; and &gt;. That it sometimes works in some browsers does not mean that all browsers and processors will work properly with it (incl. text-to-speech engines). Even in the earliest days of HTML, it was stressed that literal "<" and ">" should be represented by &lt; and &gt; to avoid any and all ambiguities.
Line 021: The <a> tag, even in early versions of HTML, should be inside block level tags such as <h1>. <a> is an inline tag, and inline tags appear inside block level tags like <h1>, not on the outside.
Line 023: The biggest sin: using <p></p> to effect spacing between groups of content. If one goes and styles <p> differently for actual paragraphs, this will feed back into the spacing between these content groups (many PDAs, when converting HTML to their format, apply their own styles to <p>.) This is NOT structural markup, but rather a use of content (here empty content) for forcing visual layout (it's similar to using &nbsp; to indent paragraphs, for example.) It may also have a negative impact on some text-to-speech engines, which are not visually based, and thus this "forcing content" means nothing to them.
And <br /> is also misused/abused a lot. <br/> oftentimes leads to unwanted results when the text is viewed in PDAs by forcing presentation that PDAs can't gracefully render. We want the text to be as reflowable as possible, and <br/> usually works against this goal.
*****
I suggest, as Marcello noted, running the URL of your document through http://validator.w3.org/ and then following its suggestions to fix the document. Some of the errors caused other errors, so fix the few obvious ones in the output engine of your application, and the errors for the whole document should clean up fast.
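As a rough illustration of the Line 017 advice, here is a minimal Python sketch of how an output engine might escape literal "<" and ">" before writing them as link labels. The function name nav_link and the sample labels are hypothetical; nothing here is taken from bowerbird's application, whose source has not been posted.

```python
from html import escape

# Hypothetical navigation labels, modeled on the "<", "c", ">" links in the listing.
labels = ["<", "c", ">"]

def nav_link(target: str, label: str) -> str:
    """Build an anchor whose visible label has <, > and & escaped."""
    return f'<a href="#{escape(target)}">{escape(label)}</a>'

# Print one link per label; the "<" label is written out as &lt;.
for label in labels:
    print(nav_link("TableofContents", label))
```

The point of the sketch is only that escaping happens once, at output time, so no literal "<" or ">" can be mistaken for markup by a browser or a text-to-speech engine.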
because i used only a bare minimum of tags (appended to this post), i expect that this .html version would be fine for generating versions for the rocketbook, ereader, mobipocket, and the other p.d.a programs.
Not necessarily. The use of <p></p> to effect spacing between groups of content may cause problems in rendering in particular PDA applications, which is difficult to overcome without stripping away those tags. For example, instead of using <p></p> in front of <h1> (and <h2> etc.) use CSS to add space above the <h1>. In the absence of CSS, most browsers automatically add more space above <h1>, <h2>, etc., anyway.
Example. Instead of:
<p>This is the last paragraph of the section.</p>
<p></p>
<p></p>
<h1>Title to next section</h1>
Just use:
<p>This is the last paragraph of the section.</p>
<h1>Title to next section</h1>
No-CSS browsers will render the above appropriately, and CSS can be used in CSS-aware browsers to further tweak the spacing to anything one desires (and to use different spacing for <h1>, <h2>, etc.)
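For anyone who has to clean up such a file after the fact, stripping the spacer paragraphs could be sketched in Python roughly as follows. This is an assumption-laden sketch, not a general HTML cleaner: the name strip_empty_paragraphs is hypothetical, the regular expression only handles attribute-less, whitespace-only <p></p> pairs as in the example above, and it deliberately leaves alone any <p id="..."> that may serve as a link target.

```python
import re

# An attribute-less <p> containing only whitespace, plus any trailing whitespace.
EMPTY_P = re.compile(r"<p>\s*</p>\s*", re.IGNORECASE)

def strip_empty_paragraphs(html: str) -> str:
    """Remove empty spacer paragraphs; real paragraphs are left untouched."""
    return EMPTY_P.sub("", html)

sample = (
    "<p>This is the last paragraph of the section.</p>\n"
    "<p></p>\n"
    "<p></p>\n"
    "<h1>Title to next section</h1>"
)

print(strip_empty_paragraphs(sample))
```

Fixing the generator so that it never emits the spacers is the better route; a regular expression like this is a stopgap and would not cope with headings nested inside paragraphs, as in the listing quoted earlier.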
First of all, it should validate.
Well, it didn't, even for the barebones HTML 3 setting at W3.org. It is also not well-formed XML, which it should be (it can still be HTML 3.2 conformant and be well-formed XML as well.) One reason for XML well-formedness is that it helps assure browsers will treat it more consistently. Plus, the long-term goal with having all web documents well-formed is that it will *greatly* simplify the codebase of web browsers. Right now huge chunks of web browser codebases have to be devoted to figuring out what the markup should be when it is not well-formed. Well-formedness also forces web authors to make sure their markup is not ambiguous, which benefits *authors* who cannot test their document in every browser on every OS platform on every hardware device.
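A well-formedness check is cheap to run locally before posting a file. The following Python sketch uses the standard library's XML parser; the helper name is_well_formed and both sample strings are made up for illustration. It tests well-formedness only, not validity against an HTML DTD (that is what the W3C validator does), and a plain XML parser will also reject named HTML entities such as &nbsp; unless they are declared.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

# Hypothetical samples: the second leaves <ul> unclosed.
good = ("<html><head><title>Alice</title></head>"
        "<body><p>Down the rabbit-hole<br /></p></body></html>")
bad = "<html><body><ul><p>Unclosed list</p></body></html>"

print(is_well_formed(good))
print(is_well_formed(bad))
```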
Second, it should not use deprecated tags.
Great! Then don't use the <center> tag. Use CSS instead.
Third, it should use css.
Definitely! So, don't use the <center> tag. That is a presentational tag which CSS can handle. And, instead of <i> and <b>, use <em> and <strong>. In effect, <i> and <b> are deprecated, since XHTML 2.0 plans to remove support for them. Also, XHTML 2.0 plans to remove support for the <br /> tag, to be replaced by the inline <l> (line) tag. So <br/> should be used as little as possible, and only when necessary. Properly structuring documents usually avoids many uses of <br/>. The upcoming <l> tag will remove all the remaining need for <br/>.
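If a converter already emits <i> and <b>, the switch to <em> and <strong> can be sketched as a simple tag substitution. This Python sketch is only an illustration under stated assumptions: the name to_semantic is hypothetical, it handles only attribute-less tags, and it assumes every italic really is emphasis, which is not always true (book titles, for instance), so a real converter would need to decide case by case.

```python
import re

# Presentational tag -> semantic replacement, per the suggestion above.
SEMANTIC = {"i": "em", "b": "strong"}

# Opening or closing <i>/<b> tags without attributes.
TAG = re.compile(r"<(/?)(i|b)>", re.IGNORECASE)

def to_semantic(html: str) -> str:
    """Rewrite <i>/<b> as <em>/<strong>, keeping the opening/closing slash."""
    return TAG.sub(lambda m: f"<{m.group(1)}{SEMANTIC[m.group(2).lower()]}>", html)

print(to_semantic("<i>Alice</i> was <b>very</b> tired"))
```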
so the question now is, if we can generate an .html version, as well as all the other versions stemming from the plain-text file, why do x.m.l.?
ZML is essentially logically equivalent to XML using a quite constrained vocabulary to express document structure and content semantics. That is, all schemes for communicating document structure to machines, such as ZML and various XML-based vocabularies, are logically equivalent even if implemented in wildly different ways (e.g., normalized plain text versus XML.)
Thus, the real question here is not ZML vs. XML, but rather: is the list of structures/semantics which ZML is able to model sufficient for the many purposes, and future purposes, which PG works will be used for? Bowerbird answers yes. Some of us answer no.
No one will argue against using the simplest vocabulary that meets the various needs and requirements. The question is whether the XML-equivalent vocabulary represented by ZML (as just noted, there are XML-based analogues to ZML) is sufficient for PG's needs.
If Bowerbird will publish his latest rendition of ZML (I've assumed he has tweaked it some since it was last published), the XHTML and TEI analogues can be determined. From that, it will be easy for everyone to see what is missing from ZML when compared, say, to TEI, and then determine if the many users of PG texts can get by without the "missing" stuff.
(Reminds me of Garrison Keillor and his Prairie Home Companion radio show, where in the fictional rural town of Lake Wobegon, the grocery store, "Ralph's Pretty Good Grocery", has the motto: "If you can't get it at Ralph's, you can probably get along without it". <smile/>)
Marcello wrote:
The PGTEI claim is proven enough. Texts have been posted. The source code is available. An online converter service is running for everybody to look-see.
Amen. Bowerbird continues to refuse to post his source code for evaluation by the group, to submit it for open source development to benefit everybody, and to allow others *to help him*. I'm sure some of the folk at the Markdown list he is a member of will gladly help him improve his application, if only he asks them and shares his current source code with them. I continue to find his refusal to be open with respect to ZML and his 'application' to be perplexing, short-sighted, and a seeming violation of his own principles.
Demonstration certainly helps, but it doesn't answer all questions -- when buying a car, one *does* have to look underneath the hood, and not only trust outside appearance and a driving test. Or, in the pudding analogy: certainly the pudding has to taste good, but with respect to PG/DP, which is in the *business* of making good tasting pudding, there are many more requirements to the pudding production process than just how the pudding tastes to the customer.
(And contrary to what he will likely claim, he has NOT published his ZML format in sufficient detail so that it can be evaluated.)
To each his own...
Jon
participants (3)
- Bowerbird@aol.com
- Jon Noring
- Marcello Perathoner