Re: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol 13, Issue 26)

Marcello Perathoner <marcello@perathoner.de> wrote:
Lee Passey wrote:
On the other hand, I don't see how Mr. Hutchinson's second example could validate if the first does not, particularly given that DTDs are not structured in such a way as to permit a validator to make that kind of judgment ("if a <div> contains a <div> it must be the last element of the first <div>" or "if a <div> contains a <div> it may be preceded by a <p>, but not followed by one").
This simple declaration does exactly that:
<!ELEMENT div (p*, div*)>
"A div may contain zero or more p followed by zero or more div."
Well, I carefully decomposed the TEI DTD and discovered that you're absolutely right (but you knew that already, didn't you :-)). As I understand it, a <div> can contain just about any other element, but once you include another <div> you can't include anything else (almost). What the hell were they thinking? I don't see anything in the English spec that would have led me to this conclusion, and I can't think of any rationale why it should be this way. Is it possible that the DTD has incorrectly implemented the TEI spec? Or did the authors really intend this inane result? I have to admit, this requirement (and the fact that <div> is not allowed inside <p>) really makes me have second thoughts about the usefulness of TEI as an encoding (because it hinders you from making a level-one, incomplete, encoding). I would really like to know what the rationale for this rule is.
In this particular case, I suspect a bug in the validator program. I mean, writing validators is hard, and I am aware of at least one bug in the W3C's online HTML validator.
No bug. The TEI DTD is broken as designed.
Well, at least I was able to figure out that _something_ was broken.
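A DTD content model is, in effect, a regular expression over the sequence of a parent's child element names, which is why a validator can enforce an ordering rule such as Marcello's <!ELEMENT div (p*, div*)>. The following is a minimal Python sketch, not how any real validator is implemented: it assumes a toy document that uses only <div> and <p>, it ignores character data, and the helper names (`children_match`, `all_divs_valid`) are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Toy equivalent of <!ELEMENT div (p*, div*)>: zero or more p, then zero or more div.
# Each child tag is written followed by a space so element names cannot run together.
DIV_MODEL = re.compile(r"(?:p )*(?:div )*")


def children_match(elem: ET.Element, model: re.Pattern) -> bool:
    """Return True if the sequence of elem's child tags matches the model."""
    sequence = "".join(child.tag + " " for child in elem)
    return model.fullmatch(sequence) is not None


def all_divs_valid(xml_text: str) -> bool:
    """Check every <div> in the document, including the root, against DIV_MODEL."""
    root = ET.fromstring(xml_text)
    return all(children_match(div, DIV_MODEL) for div in root.iter("div"))


p_then_div = "<div><p>a</p><p>b</p><div><p>c</p></div></div>"
p_div_p = "<div><p>a</p><div><p>c</p></div><p>b</p></div>"

print(all_divs_valid(p_then_div))  # True
print(all_divs_valid(p_div_p))     # False: a p follows the nested div
```

Once the nested <div> has been consumed by the `(?:div )*` part, no rule remains that could accept a trailing <p>, so no amount of backtracking lets the second document match.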

On 25 Aug 2005, at 15:07, Lee Passey wrote:
Marcello Perathoner <marcello@perathoner.de> wrote:
Lee Passey wrote:
On the other hand, I don't see how Mr. Hutchinson's second example could validate if the first does not, particularly given that DTDs are not structured in such a way as to permit a validator to make that kind of judgment ("if a <div> contains a <div> it must be the last element of the first <div>" or "if a <div> contains a <div> it may be preceded by a <p>, but not followed by one").
This simple declaration does exactly that:
<!ELEMENT div (p*, div*)>
"A div may contain zero or more p followed by zero or more div."
I would really like to know what the rationale for this rule is.
There's been much discussion about this on TEI-L, but not much resolution, as far as I can tell. Here's what Lou Burnard wrote in the "<p> and <divN>" thread: "There is a long tradition of embedding distinct narratives within an overarching framing narrative: as well as the Arabian nights, we could cite Boccaccio, Chaucer etc. I continue, stubbornly, to think that the right way to deal with these is as embedded texts. The one which my learned colleague Rahtz refers to is rather different: here we have a distinct paragraph-like object within a div which has the unusual property of itself containing paragraph-like objects, but which is not really a self-contained text. We could call it a paraDiv and maybe, if we can find more evidence, it should be admitted into P5." It would seem that most often, a text (like a letter) included in another text would be marked up something like <q><text>...</text></q> or <ab><text>...</text></ab>. The archives for TEI-L can be found at <http://listserv.brown.edu/archives/tei-l.html>. Just search for "div" among the thread names and you should find plenty of discussion about this problem. BTW, is this the sort of thing we should be discussing at gutvol-d? Wasn't there a PG-XML list or something? -- branko collin collin@xs4all.nl
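Branko's workaround can be tried against the simplified rule <!ELEMENT div (p*, div*)> quoted earlier in the thread. In this minimal Python sketch, a frame story whose embedded tale is a nested <div> followed by more <p> is rejected, while the same tale wrapped as <q><text>...</text></q> inside a <p> leaves the outer <div> with only <p> children. It checks only <div> children under that toy model; the real TEI P4 DTD also constrains <p>, <q> and <text>, the <body> wrapper is an assumption, and the sketch does not verify that TEI actually permits <text> inside <q>. The helper name `div_children_ok` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Toy stand-in for <!ELEMENT div (p*, div*)>; only the children of <div> are checked.
DIV_MODEL = re.compile(r"(?:p )*(?:div )*")


def div_children_ok(xml_text: str) -> bool:
    """Return True if every <div> (root included) has children matching DIV_MODEL."""
    root = ET.fromstring(xml_text.strip())
    for div in root.iter("div"):
        sequence = "".join(child.tag + " " for child in div)
        if DIV_MODEL.fullmatch(sequence) is None:
            return False
    return True


# Embedded tale as a nested <div> between two paragraphs: p, div, p (rejected).
as_nested_div = """
<div>
  <p>The frame story begins.</p>
  <div><p>The embedded tale.</p></div>
  <p>The frame story resumes.</p>
</div>"""

# Same content with the tale wrapped as <q><text>...</text></q> inside a <p>:
# the outer <div> now has only <p> children (accepted).
as_embedded_text = """
<div>
  <p>The frame story begins.</p>
  <p><q><text><body><div><p>The embedded tale.</p></div></body></text></q></p>
  <p>The frame story resumes.</p>
</div>"""

print(div_children_ok(as_nested_div))     # False
print(div_children_ok(as_embedded_text))  # True
```

The inner <div> inside <text> is still visited by `root.iter("div")`, but its only child is a <p>, so it passes as well.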

Branko Collin wrote:
BTW, is this the sort of thing we should be discussing at gutvol-d? Wasn't there a PG-XML list or something?
That list is for James Linden's markup language and is not hosted at pglaf.org. We should use gutvol-p so we can get you-know-who moderated. -- Marcello Perathoner webmaster@gutenberg.org

Lee Passey wrote:
Well, I carefully decomposed the TEI DTD and discovered that you're absolutely right (but you knew that already, didn't you :-)). As I understand it, a <div> can contain just about any other element, but once you include another <div> you can't include anything else (almost).
What the hell were they thinking?
I don't see anything in the English spec that would have led me to this conclusion, and I can't think of any rationale why it should be this way. Is it possible that the DTD has incorrectly implemented the TEI spec? Or did the authors really intend this inane result? I have to admit, this requirement (and the fact that <div> is not allowed inside <p>) really makes me have second thoughts about the usefulness of TEI as an encoding (because it hinders you from making a level-one, incomplete, encoding).
This "quirk" is intended, and many a rebarbative message has been written on the TEI-L mailing list defending or belittling this choice. You may subscribe and start another thread. Maybe they'll get tired of explaining this thing to people and change it in the new TEI revision. (I did this two years ago and it did me no good.) -- Marcello Perathoner webmaster@gutenberg.org

Lee Passey wrote:
Marcello Perathoner wrote:
Lee Passey wrote:
On the other hand, I don't see how Mr. Hutchinson's second example could validate if the first does not, particularly given that DTDs are not structured in such a way as to permit a validator to make that kind of judgment ("if a <div> contains a <div> it must be the last element of the first <div>" or "if a <div> contains a <div> it may be preceded by a <p>, but not followed by one").
This simple declaration does exactly that:
<!ELEMENT div (p*, div*)>
"A div may contain zero or more p followed by zero or more div."
Well, I carefully decomposed the TEI DTD and discovered that you're absolutely right (but you knew that already, didn't you :-)). As I understand it, a <div> can contain just about any other element, but once you include another <div> you can't include anything else (almost).
What the hell were they thinking?
Hmmm, when I parsed the full content model for TEI <div> (which I obtained from TEI Pizza Chef, again a cool way to flatten the TEI P4X DTD) and removed all elements except <div> and <p> (let's assume that's all we're going to use in our little thought experiment here), I got this content model:

<!ELEMENT div (div+ | (p+, div*))>

It is close to the one Marcello mentioned, but not exactly the same: we cannot have <div></div>, because <div> has to contain something. Other than this small difference it is logically the same, so Marcello's version is easier to wrap our minds around. Not sure, Lee, what you mean by your statement.

What is interesting, as Lee noted in a prior message (see above), is that one may not have a <p> after a <div> within another <div>. That is, this appears to be invalid TEI P4X:

<div>
  <p>something</p>
  <div>...</div>
  <p>something else</p>
</div>

while this is valid:

<div>
  <p>something</p>
  <p>something else</p>
  <div>...</div>
</div>

Last year I talked about Burton's Arabian Nights, which structurally and logically follows, in many places, the p-div-p pattern, to several levels of nesting. So if I were to customize the TEI content model for <div>, I would certainly allow a <p> after a child <div>. I don't see a problem in doing this, but then I don't fully understand the quite subtle explanations offered on TEI-L regarding this (especially as they pertain to the Arabian Nights, where stories are wrapped inside stories to several levels of nesting).
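Jon's flattened model and Marcello's simplification can be compared directly by treating each as a regular expression over a <div>'s child element names. In this minimal Python sketch the third model, (p | div)+, is a hypothetical customization of the kind Jon describes, not anything from TEI; the case names and the helper `accepts` are likewise illustrative.

```python
import re

# Each model is matched against a <div>'s child names, each followed by a space.
MODELS = {
    # Marcello's simplification: <!ELEMENT div (p*, div*)>
    "marcello": re.compile(r"(?:p )*(?:div )*"),
    # Jon's flattening of TEI P4X: <!ELEMENT div (div+ | (p+, div*))>
    "tei_flattened": re.compile(r"(?:(?:div )+|(?:p )+(?:div )*)"),
    # Hypothetical customization allowing a p after a child div: (p | div)+
    "customized": re.compile(r"(?:(?:p|div) )+"),
}

CASES = {
    "empty": [],
    "p p div": ["p", "p", "div"],
    "p div p": ["p", "div", "p"],
}


def accepts(model: re.Pattern, children: list[str]) -> bool:
    """Return True if the child-name sequence matches the whole model."""
    return model.fullmatch("".join(name + " " for name in children)) is not None


for case, children in CASES.items():
    results = {name: accepts(model, children) for name, model in MODELS.items()}
    print(f"{case:8} {results}")
```

The empty case is the only one here that separates Marcello's model from the flattened TEI model, and only the hypothetical customization accepts the p, div, p sequence that the Arabian Nights needs.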
I don't see anything in the English spec that would have led me to this conclusion, and I can't think of any rationale why it should be this way. Is it possible that the DTD has incorrectly implemented the TEI spec? Or did the authors really intend this inane result? I have to admit, this requirement (and the fact that <div> is not allowed inside <p>) really makes me have second thoughts about the usefulness of TEI as an encoding (because it hinders you from making a level-one, incomplete, encoding).
I would really like to know what the rationale for this rule is.
I suggest that you repackage your inquiry and post it to TEI-L. That's where the bulk of the TEI experts, including those involved with TEI development, hang out. One of the brilliant people there is Sebastian Rahtz, who is interested in a more constrained TEI subset for ebook use. http://www.lsoft.com/scripts/wl.exe?SL1=TEI-L&H=LISTSERV.BROWN.EDU Maybe this time around the TEI mavens will be able to cogently explain (at the highest abstract level) why one should not have a <p> after a <div> within another <div>, but can have a <p> before the <div>. I'm still quite perplexed, as Marcello appears to be as well. Jon
participants (4): Branko Collin, Jon Noring, Lee Passey, Marcello Perathoner