
OK guys, we have a problem.

When one uses the "--clean" option, Tidy removes any "<center>" elements, replaces them with "<div class='c1'>", and adds "div.c1 {text-align: center}" to the internal style sheet. This seems reasonable, because according to the HTML spec, "The CENTER element is exactly equivalent to specifying the DIV element with the align attribute set to 'center'." In a bit of a chained dependency, it turns out the "align" attribute is /also/ deprecated in favor of the CSS "text-align" property. So Tidy's behavior is completely consistent with the HTML spec, and in theory should cause no presentational differences before and after a page is Tidy'ed.

In theory, there is no difference between theory and reality; in reality, there is. Consider the following snippet:

    <center>
      <table>
        <tr>
          <td>
            line one<br />
            a longer line two<br />
            a very much longer line three
          </td>
        </tr>
      </table>
    </center>

Using my four test browsers (Firefox 3.5, IE 8, Opera 9 and Safari 4), in each case the above table was centered in the browser window, but the text inside the table data element remained left-justified. When I changed the "<center>" element to "<div style='text-align: center'>", the text inside the table data element became centered as well. This is the behavior I would expect; the whole notion of "Cascading" in CSS indicates that styles continue down the tree until changed. But it does illustrate the fact that there is a distinction between centering an /element/ (in this case the table) and centering the text /inside/ an element.

So while, in theory, the "<center>" element should be equivalent to "<div style='text-align:center'>", in practice it seems that not only are they not equivalent in /some/ browsers, they are not equivalent in /any/ browser.

I believe one of our design goals was that Tidy would make no change to otherwise valid HTML that would cause it to render differently using browser defaults after Tidying.
Thus, empty paragraphs, which are forbidden, are converted to /two/ "<br />" elements, to match the default paragraph presentation in browsers.

Leaving aside the fact that the use of tables to control layout is simply morally reprehensible, the fact is that there are many, many pages 'in the wild' that do so. And Tidy's current behavior will cause those pages' presentation to change after running Tidy. I think that in this case we have not met our design goal.

Now I can fix the code so that this doesn't happen in the future, if only I knew what the right fix /is/.

I could simply remove "center" from the list of elements that get 'cleaned', and print a warning that the resulting output contains elements that are deprecated (this warning probably ought to be there whenever deprecated elements remain in the output).

Or I could focus more directly on this specific issue: whenever a "<table>" is a descendant of a "<center>" element, I could add "style='text-align:left'" to the "<table>" element (assuming a "text-align" style is not already attached to that element) /before/ cleaning. Both styles should then be moved to the internal style sheet.

Or perhaps there is yet another solution that I haven't thought of?

I don't think that simply telling the end user "your HTML doesn't follow the rules; we could fix it but we won't" is an option; after all, that's what Tidy is for, right?

So, what should I do?

P.S. I don't like the fact that the "--drop-font-tags" option also drops "<center>" elements; page layout is not in the same category as font appearance, and I can envision situations where I would want to drop "<font>" elements but retain "<center>" elements. But that is an argument for another day.
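
To make the second option (pinning "text-align: left" on tables inside "<center>" before cleaning) a bit more concrete, here is a rough stand-alone sketch in Python. It is /not/ Tidy code and does not touch Tidy's parse tree; it only illustrates the rule using the standard library's html.parser. The names CenterTableFixer and fix_centered_tables are made up for this example, and the check for an existing "text-align" is a plain substring test, so treat it as an illustration of the idea rather than a proposed patch.

```python
from html import escape
from html.parser import HTMLParser


class CenterTableFixer(HTMLParser):
    """Re-emit HTML, adding text-align: left to any <table> inside <center>.

    Hypothetical helper for illustration only; not part of Tidy.
    """

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.center_depth = 0  # number of <center> elements currently open

    def _emit_start(self, tag, attrs, closer=">"):
        parts = [tag]
        for name, value in attrs:
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        self.out.append("<" + " ".join(parts) + closer)

    def handle_starttag(self, tag, attrs):
        if tag == "center":
            self.center_depth += 1
        elif tag == "table" and self.center_depth > 0:
            attrs = self._with_left_align(attrs)
        self._emit_start(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs, " />")

    def handle_endtag(self, tag):
        if tag == "center" and self.center_depth > 0:
            self.center_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    @staticmethod
    def _with_left_align(attrs):
        attrs = list(attrs)
        for i, (name, value) in enumerate(attrs):
            if name == "style":
                style = (value or "").strip()
                if "text-align" in style.lower():
                    return attrs  # an alignment is already set; leave it alone
                if style and not style.endswith(";"):
                    style += ";"
                attrs[i] = ("style", (style + " text-align: left").strip())
                return attrs
        attrs.append(("style", "text-align: left"))
        return attrs


def fix_centered_tables(html):
    """Return html with left alignment pinned on tables nested in <center>."""
    fixer = CenterTableFixer()
    fixer.feed(html)
    fixer.close()
    return "".join(fixer.out)
```

Running fix_centered_tables() on the snippet above would leave the "<center>" element alone and only add the style attribute to the "<table>", so a later "--clean" pass could turn both into style sheet rules. Tables outside "<center>", and tables that already carry a "text-align" style, pass through unchanged.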