re: [gutvol-d] differentiating between indentation and no indentation after p...

lee said:
The problem is that historically Project Gutenberg has been considered an NMA (No Markup Allowed) zone.
just because _you_ can't "see" the markup, lee, doesn't mean that it isn't there...
one or more leading-spaces in a line is a signal that the line should not be wrapped to the line above it...
as you are someone who worked on "tidy", i'd have expected that you'd be familiar with this convention, since -- to the best of my knowledge -- tidy uses it...
there _are_ difficult cases. but the ones wally gave are straightforward applications of dirt-simple rules...
Perhaps the right thing to do is to consider the XHTML version as the canonical version
um, no thanks...
-bowerbird
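The leading-space convention described in this message can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from PG, DP, or ZML: the function name `rewrap`, the 70-column default, and the indentation in the sample are all hypothetical (the thread does not preserve the poem's original indentation).

```python
import textwrap


def rewrap(text: str, width: int = 70) -> str:
    """Re-wrap prose paragraphs but leave lines that begin with whitespace as they are."""
    out = []   # finished output lines
    para = []  # unindented lines collected for the current paragraph

    def flush() -> None:
        # Join the collected prose lines and wrap them as one paragraph.
        if para:
            out.extend(textwrap.wrap(" ".join(para), width=width))
            para.clear()

    for line in text.splitlines():
        if not line.strip():        # blank line: paragraph break
            flush()
            out.append("")
        elif line[0] in " \t":      # leading whitespace: do not wrap to the line above
            flush()
            out.append(line.rstrip())
        else:                       # ordinary prose: eligible for re-wrapping
            para.append(line.strip())
    flush()
    return "\n".join(out)


# Hypothetical sample: two short prose lines followed by indented verse.
sample = (
    "satisfy the best\n"
    "soodra society,--\n"
    '  "With the yellow torches gleaming,\n'
    "    And the scarlet mantles streaming,\n"
    "Karlee has assured me"
)
result = rewrap(sample)
```

Under this rule the two prose lines are joined into one, while both verse lines keep their own line and their indentation.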

On 12/21/05, Bowerbird@aol.com <Bowerbird@aol.com> wrote:
lee said:
Perhaps the right thing to do is to consider the XHTML version as the canonical version
um, no thanks...
Indeed. It would be more sensible to mark the poem (indeed the whole work) in TEI, and use that as the canonical version. Several DPers are working quite hard in preparing the way for a gradual transition to a PG-themed TEI. Hopefully within a couple of years almost all the work DP produces will be in a format which will enable us to produce multiple document versions (HTML, text, PDF, LaTeX) with very little human intervention required.
Until that's ready, however, using something like XHTML sensibly can work very well -- this means marking structure, not presentation. Don't mark up a poem like this:
<pre>
satisfy the best soodra society,--<br/>
"With the yellow torches gleaming,<br/>
And the scarlet mantles streaming,<br/>
And the canopy above<br/>
Swaying as they slowly move."<br/>
Karlee has assured me that neither his
</pre>
but instead enclose each line within a start/end tag, each stanza within a start/end tag, etc., using CSS to fine-tune the presentational details.
-- Jon Ingram
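As a rough sketch of the structural approach suggested here, the following Python turns a plain-text poem into markup with one element per stanza and one per line. Everything specific in it is an assumption made for illustration: the function name `poem_to_xhtml`, the choice of `div` and `p` elements, the class names `poem`, `stanza`, `line`, and `indentN`, and the stanza break and indentation in the sample. None of it is a PG or DP convention stated in this thread.

```python
import re
from html import escape


def poem_to_xhtml(text: str) -> str:
    """Wrap each stanza and each line in its own element; class names are illustrative."""
    # Stanzas are separated by one or more blank lines.
    stanzas = [s for s in re.split(r"\n\s*\n", text.strip("\n")) if s.strip()]
    parts = ['<div class="poem">']
    for stanza in stanzas:
        parts.append('  <div class="stanza">')
        for line in stanza.splitlines():
            # Record the source indentation as a class so CSS can style it.
            indent = len(line) - len(line.lstrip(" "))
            cls = "line" if indent == 0 else f"line indent{indent}"
            parts.append(f'    <p class="{cls}">{escape(line.strip(), quote=False)}</p>')
        parts.append("  </div>")
    parts.append("</div>")
    return "\n".join(parts)


# Hypothetical layout of the verse quoted above.
poem = (
    '"With the yellow torches gleaming,\n'
    "  And the scarlet mantles streaming,\n"
    "\n"
    "And the canopy above\n"
    '  Swaying as they slowly move."'
)
markup = poem_to_xhtml(poem)
```

A stylesheet could then map, say, `.indent2` to a left margin, so the indentation is recorded as structure and left to CSS for display.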

Jon Ingram wrote:
Indeed. It would be more sensible to mark the poem (indeed the whole work) in TEI, and use that as the canonical version.
I can definitely agree with this; I think TEI, even minimal TEI, is probably a better canonical form than XHTML. But TEI-encoded text has to exist, and it has to be made available to the public so they can get the canonical form if they desire. If TEI exists, use it; if not, use the best encoding available. I don't think Mr. Thompson is prepared to create a canonical TEI version of the work just to solve this one problem (although it would be cool if he did). The Transcriber's Notes should simply refer to whatever canonical form is available.

On 12/21/05, Lee Passey <lee@novomail.net> wrote:
Jon Ingram wrote:
Indeed. It would be more sensible to mark the poem (indeed the whole work) in TEI, and use that as the canonical version.
I can definitely agree with this; I think TEI, even minimal TEI, is probably a better canonical form than XHTML. But TEI-encoded text has to exist, and it has to be made available to the public so they can get the canonical form if they desire. If TEI exists, use it; if not, use the best encoding available. I don't think Mr. Thompson is prepared to create a canonical TEI version of the work just to solve this one problem (although it would be cool if he did). The Transcriber's Notes should simply refer to whatever canonical form is available.
Actually, I've intended to use TEI or something more capable to create a canonical version from the start. My original questions merely had to do with the plain text version, which I'm trying to prepare without losing too much information.
Wally

On 12/22/05, Wally Thompson <wally.thompson@gmail.com> wrote:
Actually, I've intended to use TEI or something more capable to create a canonical version from the start. My original questions merely had to do with the plain text version, which I'm trying to prepare without losing too much information.
Well, it's impossible to create a pure text edition of a complex document without losing information. Personally I think we spend far too much time at PG worrying about the look of the plain text edition -- for many complex documents the plain text edition is a very poor relation indeed.
That said, in this specific case you'll find life easier if you don't try to change the indentations in the poetry to blocks. We used to interpret indentations in poetry as indicating new stanzas at DP, and hence changed them to blank lines, but as you've noticed there are quite a few works that use both indentations and blank lines, so it's easiest just to replicate the indentation and gaps used in the original work.
-- Jon Ingram

Bowerbird wrote:
lee said:
The problem is that historically Project Gutenberg has been considered an NMA (No Markup Allowed) zone.
just because _you_ can't "see" the markup, lee, doesn't mean that it isn't there...
one or more leading-spaces in a line is a signal that the line should not be wrapped to the line above it...
as you are someone who worked on "tidy", i'd have expected that you'd be familiar with this convention, since -- to the best of my knowledge -- tidy uses it...
there _are_ difficult cases. but the ones wally gave are straightforward applications of dirt-simple rules...
I'm under the impression that PG did not establish any rules or guidelines early on as to how to format plain text to unambiguously express various document structures. ZML is intended to be such a uniform ruleset for regularizing plain etexts. In fact, the fact that Bowerbird invented ZML at all suggests to me that he saw a need for it after studying PG etexts and finding variations in conventions. Bowerbird is the expert on this, so I defer to him on whether he indeed saw variation in how PG plain texts communicate structure.
No matter, I believe that eventually all the older PG texts, including a lot of the classics, will be remastered (probably by DP) into TEI or XHTML. They will definitely NOT be mastered in ZML. If ZML is used at all, it will be as a derivative format so the plain text enthusiasts have something uniform to use.
Jon

Bowerbird@aol.com wrote:
as you are someone who worked on "tidy", i'd have expected that you'd be familiar with this convention, since -- to the best of my knowledge -- tidy uses it...
Absolutely not. In essence, Tidy parses an HTML document into an internal DOM tree, fixing some egregious errors as it goes. Because newline characters are just whitespace in HTML, and runs of whitespace are not significant, newlines are converted to spaces, and runs of whitespace are collapsed, at parse time. A few other fixes are then made to convert the well-formed XML DOM into valid HTML, and then it gets spit back out. Whether any whitespace gets added to the beginning of lines is the user's choice. Personally, I usually turn both indenting and word-wrap off; default values are to wrap at column 68 and use 2-space indentation.
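The whitespace handling described in this message can be illustrated with a short Python sketch. This is not Tidy's code (Tidy is written in C and does this inside its parser). It only shows the HTML rule the message refers to, and the function name `collapse_whitespace` is made up for the example.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Treat newlines as ordinary whitespace and collapse every run to one space."""
    return re.sub(r"[ \t\r\n\f]+", " ", text)


source = "a first line\n    an indented line\n\tanother line"
collapsed = collapse_whitespace(source)
```

Once text has been through this kind of normalization, leading spaces in the source no longer carry any information, which is why a leading-space convention cannot survive an HTML parse of ordinary text.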
participants (5)
- Bowerbird@aol.com
- Jon Ingram
- Jon Noring
- Lee Passey
- Wally Thompson