Tidy -c and tables

21 Apr 2010

OK guys, we have a problem.

When one uses the "--clean" option, tidy removes any "<center>" elements 
and replaces them with "<div class='c1'>", and adds "div.c1 {text-align: 
center}" to the internal style sheet. This seems reasonable, because 
according to the HTML spec, "The CENTER element is exactly equivalent to 
specifying the DIV element with the align attribute set to 'center'." In 
a bit of a chained dependency, it turns out the "align" attribute is 
/also/ deprecated in favor of the CSS "text-align" style. So Tidy's 
behavior is completely consistent with the HTML spec, and in theory 
should cause no presentational differences before and after a page is 
Tidy'ed.
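For anyone who wants to experiment with this outside of Tidy, here is a rough Python model of the transformation just described. It is only a sketch, not Tidy's actual C code. It assumes the page is already well-formed XHTML that xml.etree.ElementTree can parse, and the function name clean_center is made up for illustration. The class name "c1" and the style rule are the ones Tidy emits, as described above.

```python
import xml.etree.ElementTree as ET

def clean_center(root: ET.Element) -> list[str]:
    """Rough model of what --clean does to <center> (hypothetical sketch).

    Renames every <center> element to <div class="c1"> in place and
    returns the CSS rules Tidy would add to the internal style sheet.
    Assumes no <center> carries a class attribute of its own.
    """
    # Collect first so we are not renaming elements mid-iteration.
    centers = list(root.iter("center"))
    for elem in centers:
        elem.tag = "div"
        elem.set("class", "c1")
    return ["div.c1 {text-align: center}"] if centers else []

doc = ET.fromstring(
    "<body><center><table><tr><td>line one</td></tr></table></center></body>"
)
rules = clean_center(doc)
print(ET.tostring(doc, encoding="unicode"))
# <body><div class="c1"><table><tr><td>line one</td></tr></table></div></body>
print(rules)
# ['div.c1 {text-align: center}']
```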

In theory, there is no difference between theory and reality; in 
reality, there is.

Consider the following snippet:

<center>
   <table>
     <tr>
       <td>
         line one<br />
         a longer line two<br />
         a very much longer line three
       </td>
     </tr>
   </table>
</center>

In each of my four test browsers (Firefox 3.5, IE 8, Opera 9 and Safari 
4), the above table was centered in the browser window, but the text 
inside the table data element remained left-justified.

When I changed the "<center>" element to "<div style='text-align: 
center'>" the text inside the table data element became centered as 
well. This is the behavior I would expect; the whole notion of 
"Cascading" in CSS indicates that styles continue down the tree until 
changed. But it does illustrate the fact that there is a distinction 
between centering an /element/ (in this case the table), and centering 
the text /inside/ an element. So while, in theory, the "<center>" 
element should be equivalent to "<div style='text-align:center'>", in 
practice it seems that not only are they not equivalent in /some/ 
browsers, they are not equivalent in /any/ browser.

I believe one of our design goals was that Tidy would make no change to 
otherwise valid HTML that would cause it to render differently using 
browser defaults after Tidying. Thus, empty paragraphs, which are 
forbidden, are converted to /two/ "<br />" elements, to match the default 
paragraph presentation in browsers.

Leaving aside the fact that the use of tables to control layout is 
simply morally reprehensible, the fact is that there are many, many pages 
'in the wild' that do so. And Tidy's current behavior will cause those 
pages' presentations to change after running Tidy. I think that in this 
case we have not met our design goal.

Now I can fix the code so that this doesn't happen in the future, if 
only I knew what the right fix /is/. I could simply remove "center" from 
the list of elements that get 'cleaned', and print a warning that the 
resulting output contains elements that are deprecated (this warning probably 
ought to be there whenever deprecated elements remain in the output). Or 
I could focus more directly on this specific issue and whenever a 
"<table>" is a descendant of a "<center>" element I could add 
"style='text-align:left'" to the "<table>" element (assuming a 
"text-align" style is not already attached to that element) /before/ 
cleaning (both styles should then be moved to the internal style sheet). 
Or perhaps there is yet another solution that I haven't thought of? I 
don't think that simply telling the end user "your HTML doesn't follow 
the rules; we could fix it but we won't" is an option; after all, that's 
what Tidy is for, right?
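The second option (pinning tables inside "<center>" to left-aligned text before cleaning) could look roughly like the Python sketch below. Again this is a hypothetical model, not Tidy code: it assumes well-formed XHTML parsed with xml.etree.ElementTree, and the helper names has_text_align and protect_tables_in_center are invented for illustration. It would run before the "<center>" cleaning step, so the existing rule that turns "<center>" into "<div class='c1'>" stays unchanged.

```python
import xml.etree.ElementTree as ET

def has_text_align(style: str) -> bool:
    """True if an inline style string already declares text-align."""
    for decl in style.split(";"):
        prop, _, _ = decl.partition(":")
        if prop.strip().lower() == "text-align":
            return True
    return False

def protect_tables_in_center(root: ET.Element) -> int:
    """Hypothetical pre-pass: pin <table>s inside <center> to text-align:left.

    Meant to run *before* <center> is cleaned into <div class="c1">, so the
    cell text keeps the left alignment browsers gave it under <center>.
    Returns the number of tables modified.
    """
    changed = 0
    for center in root.iter("center"):
        for table in center.iter("table"):
            style = table.get("style", "")
            if has_text_align(style):
                continue  # an explicit alignment already exists; leave it alone
            style = style.strip().rstrip(";").strip()
            table.set("style", f"{style}; text-align:left" if style else "text-align:left")
            changed += 1
    return changed

doc = ET.fromstring(
    "<body>"
    "<center><table><tr><td>a</td></tr></table></center>"
    '<center><table style="color:red;"><tr><td>b</td></tr></table></center>'
    '<center><table style="text-align:right"><tr><td>c</td></tr></table></center>'
    "</body>"
)
print(protect_tables_in_center(doc))  # 2
print([t.get("style") for t in doc.iter("table")])
# ['text-align:left', 'color:red; text-align:left', 'text-align:right']
```

Because a table that already declares "text-align" is skipped, the pass is idempotent: running it a second time changes nothing, and nested "<center>" elements don't stack duplicate declarations on the same table.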

So, what should I do?

P.S. I don't like the fact that the "--drop-font-tags" option also 
drops "<center>" elements; page layout is not in the same classification 
as font appearance, and I can envision situations where I would want to 
drop "<font>" elements but retain "<center>" elements. But that is an 
argument for another day.

Lee Passey