More than you ever wanted to know about XHTML and CSS

gutvol-d-request@lists.pglaf.org wrote:
On Sun, Apr 24, 2005 at 04:20:15PM -0400, Bowerbird@aol.com wrote:
i'm trying to look at the .html version of #15701, which i downloaded as a zip file to my own machine, and it seems to require an open internet connection. it wants to call the w3 or something. why?
I don't know. It doesn't for me. It's conceivable, just barely, that your browser is trying to pre-fetch the W3C DTD named in the DOCTYPE declaration, but this is the first time I've heard of that happening. The same declaration appears in lots of texts; there is nothing new or strange about this one.
Internet Explorer 5.5 is the only browser I have for which my firewall is configured to block outgoing connections without permission. Opening this file in IE 5.5 does not create any outgoing connections. Examining the file shows that it references nothing outside itself except: 1. the DTD in the DOCTYPE declaration, and 2. an image of the Burke coat of arms. Given that your browser is trying to contact the W3C (the "owners" of the XHTML DTD), I agree with Mr. Tinsley that it seems to be trying to fetch the declared DTD. In fact, since Opera seems to support most XML vocabularies other than XHTML fairly well, I would bet that you're seeing this behavior in the Opera browser. When you refuse the outgoing connection, is the document displayed anyway? (Not that it will make any difference, but I _am_ curious.) [snip]
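A quick way to repeat that examination on any downloaded file is to list what it points at. The following is a minimal sketch using Python's standard html.parser; it assumes only the DOCTYPE system identifier and src/href attributes matter (CSS @import and url() references are not checked), and the sample DOCTYPE and the image path images/arms.jpg are hypothetical stand-ins rather than the actual contents of #15701.

```python
import re
from html.parser import HTMLParser


class ReferenceCollector(HTMLParser):
    """Record everything a page refers to outside its own text."""

    def __init__(self):
        super().__init__()
        self.references = []

    def handle_decl(self, decl):
        # The DOCTYPE's system identifier is a URL; a user agent that tries
        # to pre-fetch the DTD would contact this address.
        if decl.upper().startswith("DOCTYPE"):
            for literal in re.findall(r'"([^"]*)"', decl):
                if "://" in literal:
                    self.references.append(("DOCTYPE", literal))

    def handle_starttag(self, tag, attrs):
        # Images, stylesheets and links use src/href; in-page "#" links
        # point back into the same file, so they are skipped.
        for name, value in attrs:
            if name in ("src", "href") and value and not value.startswith("#"):
                self.references.append((tag, value))


def external_references(html):
    collector = ReferenceCollector()
    collector.feed(html)
    collector.close()
    return collector.references


# Hypothetical sample page; the real #15701 file may differ.
SAMPLE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html><body><p>Text</p>'
    '<img src="images/arms.jpg" alt="Coat of arms" /></body></html>'
)

for kind, target in external_references(SAMPLE):
    print(kind, target)
```

On the sample it reports the DTD URL and the local image path, the same two items found by hand above.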
and the .html version of #15698 won't work in either one...
That one is more interesting; the <!-- that opens the contents of the <style> element is never closed with a terminating -->. However, the W3C validators have no problem with it, the parse tree is recognized, and I've given up trying to track all the ways that foreign command sets or languages can be embedded in HTML. Maybe a newer browser will help. What version is Opera on now? 8?
The problem is, indeed, the unterminated comment. The XHTML DTD defines the <style> element as containing #PCDATA, which is to say textual 'stuff' which may or may not be HTML. An HTML User Agent should _not_ attempt to parse any of the data between <style> and </style>, but should pass that text on to the stylesheet parser. It has become common to embed an internal style declaration inside HTML comments (<!-- -->) for compatibility with older browsers that did not support style sheets. A browser that did not support style sheets would encounter the <style> tag and ignore it, as all good browsers are designed to do with tags they don't know. It would then encounter the opening comment delimiter (<!--) and ignore everything up to the closing delimiter (-->). That way the browser wouldn't display the style definitions as just more text. Stylesheet parsers, on the other hand, are designed to ignore the comment delimiters themselves, so all the stylesheet goodness remains visible to them.
While the lack of a closing comment tag in the <style> element is a bug in the document, the failure of your browser(s) to ignore comment tags in a <style> element is also a bug in those programs. I don't have a working installation of IE earlier than 5.5 (5.5 does _not_ have this problem), but the problem does appear in Opera 7.11, and it has been fixed in Opera 7.51. My experience has been that in the past Opera has been somewhat slavishly devoted to mimicking the behavior of IE, even when that behavior is contrary to Internet standards (JavaScript implementations come to mind). It is therefore not surprising that early versions of Opera should share the behavior of early versions of IE.
Despite the bugginess of your browsers, the HTML text at Project Gutenberg really should be fixed, because the unterminated comment will keep the text from being displayed in any browser that does not support the <style> element.
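Whatever a validator says, an unclosed <!-- inside a <style> element can be looked for directly. A minimal sketch in Python, assuming the stylesheet sits between <style> and </style> and that neither <!-- nor --> appears inside a CSS string; the function name unclosed_style_comments is made up for illustration.

```python
import re

# Raw text of each <style> element, matched case-insensitively across lines.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>(.*?)</style\s*>", re.IGNORECASE | re.DOTALL)


def unclosed_style_comments(html):
    """Return the index of each <style> block whose <!-- is never closed."""
    problems = []
    for index, match in enumerate(STYLE_BLOCK.finditer(html)):
        body = match.group(1)
        # More openers than closers means some <!-- runs past </style>.
        if body.count("<!--") > body.count("-->"):
            problems.append(index)
    return problems


BROKEN = '<style type="text/css">\n<!--\np { text-indent: 1em }\n</style>'
FIXED = '<style type="text/css">\n<!--\np { text-indent: 1em }\n-->\n</style>'

print(unclosed_style_comments(BROKEN))  # [0]
print(unclosed_style_comments(FIXED))   # []
```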
Because the contents of a <style> element is #PCDATA, HTML validators will generally not be able to catch this type of error. I have examined the source code for HTML Tidy, and when it encounters a <style> tag it simply creates a text node for the entire text up to the </style> tag. No validation of the actual style sheets is performed. I suspect that the W3C validator operates the same way. Validators are good tools, but satisfying a validator does not mean that the HTML is, in fact, valid; it means only that there are no errors of the type the validators are designed to catch.
On a related note, let me say that I view internal style declarations as just plain rude. Style sheets are indeed A Good Thing, but someone imposing their quirky notions of style on me is not. Placing style definitions in an external style sheet, linked into the main document with a <link> element, makes it easy for me to strip away the suggested styles and return to browser defaults by simply deleting or renaming the style sheet. And if the suggested styles are mostly good and need only slight tweaking, it is safer and easier to edit an external style sheet than the main document. I would strongly encourage all PG volunteers who are creating HTML documents to consider putting suggested style definitions in an external style sheet rather than embedding them in the main document.
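Moving an embedded stylesheet out of an existing file can be done mechanically. A rough sketch in Python, assuming one <style> block per file, no literal </style> inside the CSS, and a hypothetical stylesheet name book.css; writing the returned CSS to disk is left to the caller.

```python
import re

# Raw text of a <style> element, matched case-insensitively across lines.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>(.*?)</style\s*>", re.IGNORECASE | re.DOTALL)


def externalize_style(html, css_name="book.css", xhtml=True):
    """Replace the first embedded <style> block with a <link> element.

    Returns (new_html, css_text); the caller writes css_text to css_name.
    """
    match = STYLE_BLOCK.search(html)
    if match is None:
        return html, ""
    css = match.group(1).strip()
    # The <!-- --> wrapper only hides CSS from pre-CSS browsers inside an
    # HTML page; an external .css file does not need it.
    if css.startswith("<!--"):
        css = css[len("<!--"):]
    if css.endswith("-->"):
        css = css[:-len("-->")]
    css = css.strip() + "\n"
    tag_end = " />" if xhtml else ">"
    link = '<link rel="stylesheet" type="text/css" href="%s"%s' % (css_name, tag_end)
    return html[:match.start()] + link + html[match.end():], css


# Hypothetical page; HTML 4.01 files would pass xhtml=False.
page = ('<html><head><style type="text/css">\n<!--\np { text-indent: 1em }\n-->\n'
        '</style></head><body><p>Text</p></body></html>')
new_page, css = externalize_style(page)
print(new_page)
print(css, end="")
```

Since the <!-- wrapper is simply dropped, this would also get rid of an unterminated one like the one in #15698. For HTML 4.01 files such as that one, pass xhtml=False so the <link> element is written without the XHTML-style trailing slash.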

Lee Passey wrote:
The problem is, indeed, the unterminated comment. The XHTML DTD defines the <style> element as containing #PCDATA, which is to say textual 'stuff' which may or may not be HTML.
The problem cannot be the "unterminated comment" because the "comment" is no comment at all. Let's recap: the file http://www.gutenberg.org/files/15698/15698-h/15698-h.htm has the doctype
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
So let's take a look at the *HTML 4.01* specs (not the *XHTML* specs):
    <!ENTITY % StyleSheet "CDATA" -- style sheet data -->
    <!ELEMENT STYLE - - %StyleSheet -- style info -->
The STYLE element contains CDATA, which is not parsed. In CDATA, < has no special meaning at all and therefore cannot start a comment.
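Python's standard html.parser follows this rule for <style> (and <script>): everything up to </style> is delivered as raw text, so a <!-- inside it is never reported as a comment. A small sketch to observe this, using a hypothetical minimal page that reproduces the unclosed <!-- of #15698:

```python
from html.parser import HTMLParser


class StyleAwareParser(HTMLParser):
    """Sort incoming text into stylesheet text and document text."""

    def __init__(self):
        super().__init__()
        self.in_style = False
        self.style_text = []
        self.body_text = []
        self.comments = []

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self.in_style = True

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False

    def handle_data(self, data):
        # Inside <style>, html.parser hands over the raw CDATA unparsed.
        (self.style_text if self.in_style else self.body_text).append(data)

    def handle_comment(self, data):
        self.comments.append(data)


BROKEN = ('<style type="text/css">\n<!--\np { text-indent: 1em }\n</style>\n'
          '<p>CHAPTER I</p>\n')

parser = StyleAwareParser()
parser.feed(BROKEN)
parser.close()
print("style:", repr("".join(parser.style_text)))
print("body:", repr("".join(parser.body_text)))
print("comments:", parser.comments)
```

The stray <!-- ends up in the style text, the comment list stays empty, and CHAPTER I is still delivered as document text.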
An HTML User Agent should _not_ attempt to parse any of the data between <style> and </style>, but should pass that text on to the stylesheet parser.
A user agent that *knows* about style sheets will not. A user agent developed before CSS will just ignore the style tags but will process the data in between.
While the lack of a closing comment tag in the <style> element is a bug in the document, the failure of your browser(s) to ignore comment tags in a <style> element is also a bug in those programs.
No bug. The program is simply too old and decrepit to know anything about style sheets. It skips the opening style tag (as it should, under the HTML standards that predate style sheets) and continues parsing, because it doesn't know that the contents of style are CDATA. It then finds an opening comment tag, and that's all it's gonna see for a long, long time, because the closing -- is found way down in the license.
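The failure can be reproduced with a deliberately naive model of such a user agent. This is a hypothetical toy in Python, not the code of any real browser: legacy_visible_text drops every tag it meets, including <style>, and hides everything from <!-- to the next -->, which simplifies the real SGML comment rules. In the real file the comment eventually closes far down in the license; in the toy sample there is no closing mark at all.

```python
def legacy_visible_text(html):
    """Toy model of a pre-CSS user agent (hypothetical, not a real browser).

    It drops every tag it meets, <style> included, without knowing that the
    style contents are CDATA, and hides everything from <!-- to the next -->.
    Real SGML comment rules (paired -- delimiters) are simplified here.
    """
    out = []
    i = 0
    while i < len(html):
        if html.startswith("<!--", i):
            end = html.find("-->", i + 4)
            if end == -1:
                break  # unterminated comment: nothing after it is displayed
            i = end + 3
        elif html[i] == "<":
            end = html.find(">", i)
            if end == -1:
                break
            i = end + 1  # skip the tag itself, keep reading what follows
        else:
            j = html.find("<", i)
            if j == -1:
                j = len(html)
            out.append(html[i:j])
            i = j
    return "".join(out)


BROKEN = ('<style type="text/css">\n<!--\np { text-indent: 1em }\n</style>\n'
          '<p>CHAPTER I</p>\n')
FIXED = ('<style type="text/css">\n<!--\np { text-indent: 1em }\n-->\n</style>\n'
         '<p>CHAPTER I</p>\n')

print(repr(legacy_visible_text(BROKEN).strip()))  # ''
print(repr(legacy_visible_text(FIXED).strip()))   # 'CHAPTER I'
```

With the comment unterminated, nothing is left to display; with --> restored, only CHAPTER I remains, which is exactly what the comment-hiding trick was meant to achieve.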
Because the contents of a <style> element is #PCDATA, HTML validators will generally not be able to catch this type of error.
It is not! (People should really read the specs!) If it were #PCDATA (*Parsed* Character DATA), the validator would find the error, because it would parse it. It's because it's CDATA that the validator doesn't parse it and consequently cannot find the "error". In CDATA, < has no special meaning at all.
-- 
Marcello Perathoner
webmaster@gutenberg.org