
A possible argument against using Tidy as you suggest is that it can have side effects the user does not intend. In the example you gave below, it appears to have replaced the numeric character entities the volunteer wanted to put in. Also, with the tidy executable I have, results are not always reliable inside a "pre" element; I have seen Tidy remove blank lines from within them before.

When I'm preparing an HTML and plain text file for PG, I almost always do so in a way that keeps all the line endings in the same place, which makes it much easier for anyone making corrections in the future.

I use Tidy to check HTML, but not to produce a final version. I've just checked, and the file in question, while not necessarily the way I would have marked it up, _is_ valid HTML 4.01 Transitional, which matches what is required to add it to PG.

Andrew

On Mon, 1 Nov 2004, Brad Collins wrote:
I took a look at the source for the recent handsome re-release of PG's edition of A Christmas Carol (46-h).
The code is a bit old: <p> tags are not terminated, and the formatting could be improved to make it more readable.
For example, the first paragraph looked like this:
<p> <span class="caps">Marley</span> was dead: to begin with. There is no doubt whatever about that. The register of his burial was signed by the clergyman, the clerk, the undertaker, and the chief mourner. Scrooge signed it: and Scrooge’s name was good upon ’Change, for anything he chose to put his hand to. Old Marley was as dead as a door-nail.
I ran the file through HTML-Tidy which turned it into this:
<p><span class="caps">Marley</span> was dead: to begin with. There is no doubt whatever about that. The register of his burial was signed by the clergyman, the clerk, the undertaker, and the chief mourner. Scrooge signed it: and Scrooge's name was good upon 'Change, for anything he chose to put his hand to. Old Marley was as dead as a door-nail.</p>
It took about ten seconds to open the file, run it through Tidy, and save it. This resulted in a file which is consistent, standards compliant and far easier to read and process.
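One way to confirm that a cleanup pass like this changed only the markup and not the text is to compare the text content of the two versions. The following is a minimal sketch using Python's standard html.parser module; the names TextExtractor, visible_text and text_unchanged are made up for illustration and are not part of Tidy or any PG tooling.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of a document, ignoring all tags."""

    def __init__(self):
        # convert_charrefs=True decodes references such as &#8217; into
        # characters, so an entity and its literal character compare equal.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html):
    """Return the document text with runs of whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


def text_unchanged(before, after):
    """True if both versions carry the same text once tags are ignored."""
    return visible_text(before) == visible_text(after)
```

Because whitespace is collapsed before comparing, this sketch will not notice blank lines removed from inside a "pre" element; it only flags changed characters, such as a curly apostrophe turned into a straight one.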
Open tags in HTML are an artifact of SGML which can confuse some browsers and processing software, and limit what you can do with CSS.
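To get a rough idea of how many paragraphs in a file are left open, without rewriting the file, the start and end tags can simply be counted. This is only a sketch using Python's standard html.parser module; ParagraphCounter and unclosed_paragraphs are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class ParagraphCounter(HTMLParser):
    """Count <p> start tags and </p> end tags in a document."""

    def __init__(self):
        super().__init__()
        self.opened = 0
        self.closed = 0

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <P> is counted too.
        if tag == "p":
            self.opened += 1

    def handle_endtag(self, tag):
        if tag == "p":
            self.closed += 1


def unclosed_paragraphs(html):
    """Return the number of <p> tags that have no matching </p>."""
    counter = ParagraphCounter()
    counter.feed(html)
    counter.close()
    return counter.opened - counter.closed
```

A nonzero result does not by itself mean the file is invalid, since it only reports how many paragraphs rely on an implied end tag.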
I suggest that all PG html files be run through Tidy before being released.