Final PGTEI run-thru for a while...

This e-mail concludes the "common" items I want to check in PGTEI. With the items in my previous tests and the ones today, I could mark up 95%+ of what I see in DP. So, my next step is to start trying to understand how the transforms work and see what *I* can do to improve things. Expect this to be slow going, folks. I'm an old English major who likes computers, so I have to puzzle things out as I go! :)

****

For this one, I created a hodge-podge of material from the beginning of The Hunting of the Snark and my own gibberish additions to get to all the features I wanted to test. The XML file is attached at the end for anyone wanting to reproduce my tests. As before, the conversion was done with Marcello's online transforms at http://www.gutenberg.org/tei/services/tei-online.

****

Table markup:

1 - The XML was very straightforward. It is similar to HTML table markup, just with slightly different tag names: <row> instead of <tr>, <cell> instead of <td>. All in all, the more human-friendly tags in the XML are easier to read than the HTML.

Under the HTML conversion, the tables came out well. No complaints there.

Under the TEXT conversion, the small table came out well. However, when I used longer data items in the second table, the TEXT conversion did not do so well. Basically, the text conversion does not try to line-wrap the table cells at all, so the table grew to be extremely wide. This one is a bit of a show stopper as far as automated conversion is concerned. Granted, the tables could be manually edited, but that defeats the whole reason for using a master document format.

**

Footnote markup:

Again, no real complaints on the footnotes/endnotes. It is pretty straightforward once you read the formatting rules. The nice thing is that the conversion process handles moving the notes to their proper location for you.

However, I did have one question that I couldn't find the answer to. How would you handle sidenotes? It looked like you could put a place="left" (or "right") in the <note> tag, but PGTEI doesn't support that. Is that even the right semantic tag for a sidenote?

The TEXT conversion had one glitch. For some reason, the footnote listing at the end of the text did not put a number 1 in front of the first footnote. The second footnote was labelled with a 2 correctly. This problem was not present in the HTML conversion.

**

Page number markup:

No complaints. I'll be looking into a transform that will place the numbers in the margin, but that is a secondary concern.

**

Blockquotes:

I wanted to mark up a blockquote example, but I didn't see how. Anyone out there know how to handle a blockquote with a text?

**

Poetry markup:

I had the most notes for poetry, so I left it for last in the markup.

1 - How should we mark up poetry indents? In HTML, I use to put two spaces for indents on the text.... *edit* I just found in Marcello's guide that he suggests using as a quad indent. Works for me, unless someone has a different suggestion.

2 - It was unclear to me at first that a poetry fragment still needs <lg> around it. <l>, which marks off one line of poetry, is insufficient on its own, because the poem line would still be treated as inline in the sentence with just <l>. Putting <lg> around it set it off on its own line.

3 - If I understand the markup right, <lg> represents a portion of the poem, such as a single stanza. To represent the whole poem in one structural element, you need a higher-level tag. Would <div1> work ok here? Or is there some poem tag I'm missing?

HTML results - Poetry is not marked off well. The poems are flush with the left margin. Adding a larger margin around the poem will help it appear distinct from the prose text around it. Also, the paragraph indenting is affected by poetry. Since the conversion only indents a paragraph if the previous line was the end of a paragraph, it doesn't indent after a poem. This is taken care of if we revert to standard HTML paragraph spacing.
****

Josh

****

source.xml--
============
<?xml version="1.0" encoding="iso-8859-1" ?>
<!DOCTYPE TEI.2 SYSTEM "pgtei.dtd">
<TEI.2 lang="en-gb">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>The Hunting of the Snark</title>
        <author><name>Lewis Carroll</name></author>
      </titleStmt>
      <editionStmt>
        <edition n="12">Edition 12 <date value="1992-3">March 1992</date> </edition>
      </editionStmt>
      <publicationStmt>
        <publisher>Project Gutenberg</publisher>
        <pubPlace><xref url="www.gutenberg.org">www.gutenberg.org</xref></pubPlace>
        <date value="1992-3">March 1992</date>
        <idno type='etext-file'>snark12</idno>
        <availability>
          <p>This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included online at <xref url="www.gutenberg.org/license">www.gutenberg.org/license</xref></p>
        </availability>
      </publicationStmt>
      <sourceDesc>
        <bibl> THE MILLENNIUM FULCRUM EDITION 1.2 </bibl>
      </sourceDesc>
    </fileDesc>
    <encodingDesc>
      <classDecl>
        <taxonomy id="lc">
          <bibl>
            <title>Library of Congress Classification</title>
          </bibl>
        </taxonomy>
      </classDecl>
    </encodingDesc>
    <profileDesc>
      <langUsage>
        <language id="en-gb">British</language>
      </langUsage>
      <textClass>
        <classCode scheme="lc"> *** <!-- LoC Class (PR, PQ, ...) --> </classCode>
        <keywords>
          <list>
            <!-- <item>***</item> any keywords for PG search engine -->
          </list>
        </keywords>
      </textClass>
    </profileDesc>
    <revisionDesc>
      <change>
        <date value="1992-3">March 1992</date>
        <respStmt>
          <name>unknown</name> <!-- email: *** -->
        </respStmt>
        <item>Project Gutenberg Edition</item>
      </change>
      <change>
        <date value="2004-10">October 2004</date>
        <respStmt>
          <name>Joshua Hutchinson</name> <!-- your email -->
        </respStmt>
        <item>TEI markup</item>
      </change>
    </revisionDesc>
  </teiHeader>
  <text>
    <front>
      <divGen type="titlepage" />
      <divGen type="pgheader" rend="newpage" />
      <divGen type="toc" rend="newdoublepage" />
    </front>
    <body>
      <div>
        <index index="toc" />
        <index index="pdf" />
        <index index="pdb" />
        <head> THE HUNTING OF THE SNARK </head>
        <head type="sub">an Agony in Eight Fits</head>
        <head type="sub"> Lewis Carroll </head>
        <head type="sub"> THE MILLENNIUM FULCRUM EDITION 1.2 </head>
      </div>
      <div rend="newpage" type="preface">
        <index index="toc" />
        <index index="pdf" />
        <index index="pdb" />
        <pb n="i" />
        <head> PREFACE </head>
        <p> If—and the thing is wildly possible—the charge of writing nonsense were ever brought against the author of this brief but instructive poem, it would be based, I feel convinced, on the line (in p.4) </p>
        <lg><l>"Then the bowsprit got mixed with the rudder sometimes." <note place="foot"> This is an example footnote. </note> </l></lg>
        <p> In view of this painful possibility, I will not (as I might) appeal indignantly to my other writings as a proof that I am incapable of such a deed: I will not (as I might) point to the strong moral purpose of this poem itself, to the arithmetical principles so cautiously inculcated in it, or to its noble teachings in Natural History— I will take the more prosaic course of simply explaining how it happened. </p>
        <p> The Bellman, who was almost morbidly sensitive about appearances, used to have the bowsprit unshipped once or twice a week to be revarnished, and it more than once happened, when the time came for replacing it, that no one on board could remember which end of the ship it belonged to. They knew it was not of the slightest use to appeal to the Bellman about it — he would only refer to his Naval Code, and read out in pathetic tones Admiralty Instructions which none of them had ever been able to understand — so it generally ended in its being fastened on, anyhow, across the rudder. The helmsman used to stand by with tears in his eyes; he knew it was all wrong, but alas! Rule 42 of the Code, "No one shall speak to the Man at the Helm," had been completed by the Bellman himself with the words "and the Man at the Helm shall speak to no one." So remonstrance was impossible, and no steering could be done till the next varnishing day. During these bewildering intervals the ship usually sailed backwards. </p>
        <p> As this poem is to some extent connected with the lay of the Jabberwock, let me take this opportunity of answering a question that has often been asked me, how to pronounce "slithy toves." The "i" in "slithy" is long, as in "writhe"; and "toves" is pronounced so as to rhyme with "groves." Again, the first "o" in "borogoves" is pronounced like the "o" in "borrow." I have heard people try to give it the sound of the "o" in "worry." Such is Human Perversity. </p>
        <p> This also seems a fitting occasion to notice the other hard words in that poem. Humpty-Dumpty's theory, of two meanings packed into one word like a portmanteau, seems to me the right explanation for all. </p>
        <pb n="ii" />
        <p> For instance, take the two words "fuming" and "furious." Make up your mind that you will say both words, but leave it unsettled which you will say first. Now open your mouth and speak. If your thoughts incline ever so little towards "fuming," you will say "fuming-furious;" if they turn, by even a hair's breadth, towards "furious," you will say "furious-fuming;" but if you have the rarest of gifts, a perfectly balanced mind, you will say "frumious."</p>
        <p> Supposing that, when Pistol uttered the well-known words — </p>
        <lg><l> "Under which king, Bezonian? Speak or die!"
          <note place="foot">
            <p>This is, hopefully, an example of a multi-line footnote.</p>
            <p>Here is where the second line of the footnote should be.</p>
          </note>
        </l></lg>
        <p> Justice Shallow had felt certain that it was either William or Richard, but had not been able to settle which, so that he could not possibly say either name before the other, can it be doubted that, rather than die, he would have gasped out "Rilchiam!" </p>
      </div>
      <div rend="newdoublepage">
        <pb n="1" />
        <index index="toc" />
        <index index="pdf" />
        <index index="pdb" />
        <head> Fit the First </head>
        <head type="sub"> THE LANDING </head>
        <lg>
          <l>"Just the place for a Snark!" the Bellman cried,</l>
          <l n="2">As he landed his crew with care;</l>
          <l>Supporting each man on the top of the tide</l>
          <l>By a finger entwined in his hair. </l>
        </lg>
        <lg>
          <l> "Just the place for a Snark! I have said it twice:</l>
          <l>That alone should encourage the crew. </l>
          <l>Just the place for a Snark! I have said it thrice: </l>
          <l>What I tell you three times is true."</l>
        </lg>
        <lg>
          <l>The crew was complete: it included a Boots — </l>
          <l>A maker of Bonnets and Hoods — </l>
          <l>A Barrister, brought to arrange their disputes — </l>
          <l>And a Broker, to value their goods. </l>
        </lg>
      </div>
      <div rend="newpage">
        <pb n="2" />
        <index index="toc" />
        <index index="pdf" />
        <index index="pdb" />
        <head>Example of a Table</head>
        <table rows="2" cols="2">
          <row role="label">
            <cell>Column 1 Heading</cell><cell>Column 2 Heading</cell>
          </row>
          <row role="data">
            <cell>Column 1 Data</cell><cell>Column 2 Data</cell>
          </row>
        </table>
        <table rows="2" cols="2">
          <row role="label">
            <cell>Column 1 Heading - REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY LONG</cell><cell>Column 2 Heading - REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY LONG</cell>
          </row>
          <row role="data">
            <cell>Column 1 Data - REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY LONG</cell><cell>Column 2 Data - REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY LONG</cell>
          </row>
        </table>
      </div>
      <div>
        <index index="toc" />
        <index index="pdf" />
        <index index="pdb" />
        <head> THE END</head>
        <p> </p>
      </div>
    </body>
    <back rend="newdoublepage">
      <divGen type="footnotes" />
      <divGen type="colophon" rend="newpage" />
      <divGen type="pgfooter" rend="newpage" />
    </back>
  </text>
</TEI.2>
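
The wide-table problem reported for the TEXT conversion comes down to cells never being line-wrapped. What follows is a minimal sketch of one way a plain-text renderer could wrap each cell to a fixed column width. It is written in Python with only the standard library; the function name `render_text_table` and the 30-character column width are assumptions made for illustration and are not part of the PGTEI transforms.

```python
import textwrap
from itertools import zip_longest


def render_text_table(rows, col_width=30):
    """Render rows of cell strings as plain text, wrapping every cell.

    `rows` is a list of rows; each row is a list of cell strings.
    `col_width` is a hypothetical fixed width applied to every column.
    """
    lines = []
    for row in rows:
        # Wrap each cell to the column width; an empty cell still needs one line.
        wrapped = [textwrap.wrap(cell, width=col_width) or [""] for cell in row]
        # Emit one physical line per wrapped line, padding shorter cells with blanks.
        for parts in zip_longest(*wrapped, fillvalue=""):
            padded = " | ".join(part.ljust(col_width) for part in parts)
            lines.append(padded.rstrip())
        lines.append("")  # blank line between table rows
    return "\n".join(lines).rstrip("\n")


if __name__ == "__main__":
    long_heading = "Column 1 Heading - " + "REALLY " * 12 + "LONG"
    print(render_text_table([[long_heading, "Column 2 Heading"],
                             ["Column 1 Data", "Column 2 Data"]]))
```

With two columns of 30 characters and a three-character separator, no output line exceeds 63 characters, however long the cell content is. A real transform would probably want to size columns from the available line width rather than use a constant.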

Joshua wrote:
This e-mail concludes the "common" items I want to check in PGTEI.
...
1 - How should we mark up poetry indents? In HTML, I use to put two spaces for indents on the text.... *edit* I just found in Marcello's guide that he suggests using as a quad indent. Works for me, unless someone has a different suggestion.
As we've discussed (and argued) before, it is my belief that, except where typography is integral to the poem itself ("poetry as visual art"), poetry should be marked up in a structural, not presentational, sense. This means text characters should NEVER be used for visual layout purposes -- characters should be used only for representing textual content. Using text characters for layout mucks up usability, repurposeability, CSS styling, and accessibility. Use XSL*, CSS or another styling language to effect the desired output. End-users will then have more ability to tailor the verse to their particular reading devices.

Of course, a non-parsed comment could be added to the markup explaining how the original was typeset for those wishing to try to duplicate the original layout (but then, that's one purpose for having access to the original page scans.)

Why some here are so enamored with needlessly duplicating the layout of verse in markup is beyond me -- especially when the original page scans are now preserved. I see no one here saying that, if the original text had indented paragraphs, we must use a tab or spaces at the start of each paragraph in markup to duplicate that.

Wherever the typography is used to help the end-user identify the structure of the poem, that is automatically amenable to structural markup (even if it has to be customized for some really weird poem.) Only when the typography *is* the poem itself does one resort to presentational markup, and here SVG makes the most sense.

In a project I'm working on, the 1001 Arabian Nights by Sir Richard F. Burton, there are literally thousands of "quatrains" spread throughout the work. Burton, or the typesetter, chose to present these quatrains in an unusual way, no doubt simply to save paper, since the following format makes each quatrain much more compact, and with thousands of quatrains in 6000+ pages, this could mean a lot fewer pages and substantially lower printing costs.
Here's an example of how a quatrain is typeset in the source:

  The blear-eyed scapes the pits * Wherein the lynx-eyed fall:
  A word the wise man slays * And saves the natural:
  The Moslem fails of food * The Kafir feasts in hall:
  What art or act is man's? * God's will obligeth all!

It is clear that the layout used in this example has nothing to do with the quatrain itself (the original being Arabic and very likely formatted in a totally different way.)

In XHTML, here's how I have chosen to structure it (as you see, the '*' character seen above is not reproduced since its purpose in the original is for typographic layout only -- it is not part of the content of the verse, just as page numbers are not part of the content of a work):

  <div class="quatrain" id="q1234">
    <p class="verse1">The blear-eyed scapes the pits</p>
    <p class="verse2">Wherein the lynx-eyed fall:</p>
    <p class="verse1">A word the wise man slays</p>
    <p class="verse2">And saves the natural:</p>
    <p class="verse1">The Moslem fails of food</p>
    <p class="verse2">The Kafir feasts in hall:</p>
    <p class="verse1">What art or act is man's?</p>
    <p class="verse2">God's will obligeth all!</p>
  </div>

With XSLT, if I wanted, the above could be transformed into the original format Burton used in print, or it could be output in the more traditional ABABABAB form of most 19th century Western poetry, with no loss in comprehension of the quatrain itself.

There is nothing sacred about the typographic layout of *most* poetry I've seen, pretty as it might be in the printed source -- it simply extends the various typographic conventions used for ordinary prose to aid in understanding the "voiceability" of the verse and how the verses relate to each other. Only when we get to the "poetry as visual art" craze we see a lot in 20th century poetry (and, as a few have noted, in older works) do we need to preserve the exact layout. As just noted, SVG is certainly an intriguing way to do this layout preservation.
(This is not the only possible markup scheme, but it works for my purposes. I suggest PG study a more generalized structural markup scheme for verse -- study maybe 100 random works containing verse and see if for at least 90 of them some sort of general markup scheme can be developed which, when converted to XHTML, allows a single CSS style sheet to reasonably display the poetry as originally typeset. It would not surprise me if such a 90% generalized markup scheme is possible: a sort of "Poetry Markup Language" -- the other 10% would be covered by customized extensions, and for "poetry as visual art" by SVG.)

Jon Noring
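
Jon's point that one structural source can feed several layouts can be illustrated with a short sketch. The message names XSLT as the tool; the sketch below swaps in Python's standard-library XML parser purely to keep the example self-contained. The function names and the two-space indent in the stacked layout are assumptions made for illustration, not anything taken from Jon's project.

```python
import xml.etree.ElementTree as ET

# The quatrain exactly as structured in the message above.
QUATRAIN = """<div class="quatrain" id="q1234">
<p class="verse1">The blear-eyed scapes the pits</p>
<p class="verse2">Wherein the lynx-eyed fall:</p>
<p class="verse1">A word the wise man slays</p>
<p class="verse2">And saves the natural:</p>
<p class="verse1">The Moslem fails of food</p>
<p class="verse2">The Kafir feasts in hall:</p>
<p class="verse1">What art or act is man's?</p>
<p class="verse2">God's will obligeth all!</p>
</div>"""


def verse_pairs(xhtml):
    """Return (verse1, verse2) text pairs in document order."""
    root = ET.fromstring(xhtml)
    firsts = [p.text.strip() for p in root.findall("p[@class='verse1']")]
    seconds = [p.text.strip() for p in root.findall("p[@class='verse2']")]
    return list(zip(firsts, seconds))


def burton_layout(xhtml):
    """Compact print layout: both half-lines on one line, joined by ' * '."""
    return "\n".join(f"{a} * {b}" for a, b in verse_pairs(xhtml))


def stacked_layout(xhtml):
    """One half-line per output line; the second of each pair is indented."""
    lines = []
    for a, b in verse_pairs(xhtml):
        lines.append(a)
        lines.append("  " + b)  # hypothetical two-space indent for plain text
    return "\n".join(lines)


if __name__ == "__main__":
    print(burton_layout(QUATRAIN))
    print()
    print(stacked_layout(QUATRAIN))
```

The '*' separator and the indent appear only in the generated output, so the source markup stays free of layout characters, which is the separation the message argues for.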

Blockquotes:
I wanted to mark up a blockquote example, but I didn't see how. Anyone out there know how to handle a blockquote with a text?
Perhaps:

  <div rend="display">
  <q rend="display">

(not very intuitive, eh?)

It's mentioned briefly in Marcello's docs, once you know what to look for ("wider margins"). Also, search for these in Marcello's alice.tei and lmiss.tei examples. The "q" one will add quote marks, unless suppressed via the appropriate attribute.

I couldn't find it in the TEI Lite docs (though I assume it's there somewhere). I did find it in Section 4.3 of "Bare Bones TEI" http://www.tei-c.org/Vault/Bare/ -- which also suggests that rend="block" is equivalent. (I didn't find independent confirmation.)

-- 
Scott
Practical Software Innovation (tm), http://ProductArchitect.com/

Page number markup:
No complaints. I'll be looking into a transform that will place the numbers in the margin, but that is a secondary concern.
I didn't see an explicit way to mark the original page numbers. Perhaps as a marginal note?

  <note place="margin">27</note>

-- 
Scott
Practical Software Innovation (tm), http://ProductArchitect.com/

Scott Lawton <scott_bulkmail@productarchitect.com> writes:
Page number markup:
No complaints. I'll be looking into a transform that will place the numbers in the margin, but that is a secondary concern.
I didn't see an explicit way to mark the original page numbers. Perhaps as a marginal note?
<note place="margin">27</note> --
Page numbers are put in the `pb' pagebreak element.

  <pb n="27" />

,----[ TEI Manual: 6.9.3 Milestone Tags ]
| - <pb> marks the boundary between one page of a text and the next
|   in a standard reference system.
|
|   `ed' (edition) indicates the edition or version in which the page
|   break is located at this point
|
| - <lb> marks the start of a new (typographic) line in some edition
|   or version of a text.
|
|   `ed' (edition) indicates the edition or version in which the line
|   break is located at this point
|
| - <cb> marks the boundary between one column of a text and the next
|   in a standard reference system.
|
|   `ed' (edition) indicates the edition or version in which the column
|   break is located at this point
`----

There is no need for a `place' attribute; you can use rend="margin" instead. But this is confusing, because it's saying that the page breaks in the original edition were in the margin. And if so, which margin, left or right?

Presentational markup should be used to indicate how the original was marked up. Instructions for how something should be displayed should be done using CSS or XSLT.

I'm using an EETS edition of The Merlin as a development text because it has a running analysis in the left margin, footnotes, and indicates the page breaks in the original manuscript. So in the electronic edition I need to indicate two different sets of page breaks, one for the original manuscript and another for the page breaks in the EETS edition. This can easily be done using the edition `ed' attribute.

  <pb ed="ms" n="27" />
  <pb ed="EETS" n="53" />

Learning TEI is like learning Emacs or Unix-like systems. It's a gradual process of incremental epiphanies. TEI is a large and complex spec and takes some time to digest.

More than once over the past couple of years I have quickly looked up something in TEI and thought that it was silly and then came up with my own alternate solution. However, most of the time, after putting my hack into practice, I found it didn't work as I expected and finally understood why TEI had done things the way they had.

I've come to respect TEI more and more as a mature body of experience which I am trusting more and more. If something seems stupid or awkward I now try to stop and step back and assume that there is a good chance I don't understand the design before trying to cobble together my own solution.

Detractors of XML on this list have brought up the fact that the TEI manual is 1400 pages long as a negative. Why? This shows that TEI is well documented. As a general rule, the more documentation that is available for a spec, the more mature and useful the standard and the easier it is to learn and implement.

I remember a sig file from someone back in the early 90's that went something like, "documentation is a sign of failure". This is somewhat true for simple end-user applications, but it certainly isn't true for things like computer languages and markup languages.

b/

-- 
Brad Collins <brad@chenla.org>, Bangkok, Thailand
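
Brad's two-edition scheme, with the `ed' attribute on <pb>, lends itself to a simple filter: a renderer picks one edition and ignores the other's page breaks. The following is a minimal sketch in Python using only the standard library. The sample paragraph text and the function name `page_breaks` are made up for illustration; only the two <pb> elements come from the message.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; only the two <pb> elements are from the message above.
SAMPLE = """<p>First words <pb ed="ms" n="27" /> more text
<pb ed="EETS" n="53" /> and the rest.</p>"""


def page_breaks(xml_text, edition):
    """Return the page numbers (as strings) recorded for one edition."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so nested <pb> elements are found too.
    return [pb.get("n") for pb in root.iter("pb") if pb.get("ed") == edition]


if __name__ == "__main__":
    print(page_breaks(SAMPLE, "ms"))
    print(page_breaks(SAMPLE, "EETS"))
```

Because both sets of breaks live in the same file, the choice of which numbering to show in a margin becomes a rendering decision rather than a markup decision.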

Presentational markup should be used to indicate how the original was marked up.
Aha! That wasn't clear to me since I've been approaching TEI as a "master" format, whereas it was really designed to describe existing texts (which is fine; that's also something I hope is part of PG's XML solution).
Instructions for how something should be displayed should be done using CSS or XSLT.
Agreed. (Though I include all transformation methods here, not just XSLT.)
I've come to respect TEI more and more as a mature body of experience which I am trusting more and more. If something seems stupid or awkward I now try to stop and step back and assume that there is a good chance I don't understand the design before trying to cobble together my own solution.
I think that's a good approach with things like TEI, XHTML, etc. A bunch of very smart people spent quite a bit of time on them. Three caveats:

1. there are still aspects that are *truly* awkward, e.g. rend="display" to indent (though I welcome a good explanation)
2. the design goals for TEI (or any other particular solution) may not match PG's design goals
3. different people work differently, so there's often no one "best" answer (e.g. some people love XSLT, some hate it)

-- 
Cheers,
Scott S. Lawton
http://Classicosm.com/ - Classic Books
http://ProductArchitect.com/ - consulting
participants (4)
- Brad Collins
- Jon Noring
- Joshua Hutchinson
- Scott Lawton