
On Mon, Nov 01, 2004 at 12:43:41PM +0700, Brad Collins wrote:
I took a look at the source for the recent handsome re-release of PG's edition of A Christmas Carol (46-h).
The code is a bit old: <p> tags are not terminated, and the formatting could be cleaned up to make it more readable.
... Strangely, this title doesn't have the usual filename mask in GUTINDEX.ALL. I'm cc'ing George to see about adding this.
The answer, as you saw, is that the file is old and therefore predates our current procedures. Lacking a </p> doesn't prevent a file from passing the validator at w3c, except for the most recent HTML versions, so this file could probably still pass today.
Anyway: cleaning up HTML is definitely welcome. When we update a file these days, we also move it into the new directory structure (the post-10K naming scheme), so this would be /4/46/46h.htm rather than /etext91/xmas10h.htm or whatever. We also add a new header, and apply it to all other files for this eBook. In short, it's more involved than just fixing the file.
David Widger has updated hundreds of titles, and we would welcome anyone else who wants to work on this task. Personally, I would not mind waiting until we also have good XML procedures in place, so that we could kill two birds with one stone (actually, more than one stone, since it's more work).
Finally, let me mention that we usually also run gutcheck and find/fix many other errors in a typical older title.
I hope this helps explain. I didn't mention any limitations of Tidy, but of course, as with any tool, you need to make sure it doesn't accidentally do more harm than good.
Really finally: send updated files (or URLs) to errata AT pglaf.org, even if you didn't do all of the above.
Thanks! -- Greg
For example, the first paragraph looked like this:
<p> <span class="caps">Marley</span> was dead: to begin with. There is no doubt whatever about that. The register of his burial was signed by the clergyman, the clerk, the undertaker, and the chief mourner. Scrooge signed it: and Scrooge’s name was good upon ’Change, for anything he chose to put his hand to. Old Marley was as dead as a door-nail.
I ran the file through HTML-Tidy which turned it into this:
<p><span class="caps">Marley</span> was dead: to begin with. There is no doubt whatever about that. The register of his burial was signed by the clergyman, the clerk, the undertaker, and the chief mourner. Scrooge signed it: and Scrooge's name was good upon 'Change, for anything he chose to put his hand to. Old Marley was as dead as a door-nail.</p>
It took about ten seconds to open the file, run it through Tidy, and save it. The result is a file that is consistent, standards-compliant, and far easier to read and process.
Unclosed tags in HTML are an artifact of SGML; they can confuse some browsers and processing software, and they limit what you can do with CSS.
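As a rough illustration of how one might spot files with this problem before cleaning them, here is a minimal sketch using only Python's standard html.parser module. It is not Tidy and does not repair anything; it only counts <p> tags that are never explicitly closed. The class and function names are made up for this example, and it assumes a paragraph ends only at a </p>, at the next <p>, or at the end of the input, which is simpler than the real HTML rules.

```python
from html.parser import HTMLParser


class OpenParagraphCounter(HTMLParser):
    """Count <p> start tags that have no explicit </p> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_p = False   # True while a <p> is waiting for its </p>
        self.unclosed = 0   # <p> tags cut off by the next <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self.in_p:
                # A new paragraph started before the previous one was closed.
                self.unclosed += 1
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False


def count_unclosed_paragraphs(html_text):
    """Return how many <p> tags in html_text lack an explicit </p>."""
    parser = OpenParagraphCounter()
    parser.feed(html_text)
    parser.close()
    # A paragraph still open at the end of the input was never closed either.
    return parser.unclosed + (1 if parser.in_p else 0)
```

A non-zero count for a file suggests it is a candidate for cleanup; a zero count says nothing about other markup problems.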
I suggest that all PG HTML files be run through Tidy before being released.
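If this were done in bulk, it might look something like the sketch below. It assumes the tidy command-line program is installed and on the PATH, and it uses only tidy's -q (quiet) and -o (output file) options. The function names and the directory layout are hypothetical, not an existing PG procedure, and the output would still need to be checked by hand.

```python
import subprocess
from pathlib import Path


def tidy_command(src, dst):
    """Build the tidy invocation for one file (hypothetical helper)."""
    # -q suppresses nonessential output; -o names the output file.
    return ["tidy", "-q", "-o", str(dst), str(src)]


def tidy_directory(src_dir, dst_dir):
    """Run tidy on each .htm/.html file in src_dir, writing results to dst_dir.

    Returns the names of files for which tidy reported errors.
    """
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for src in sorted(Path(src_dir).iterdir()):
        if src.suffix.lower() not in {".htm", ".html"}:
            continue
        result = subprocess.run(tidy_command(src, dst_dir / src.name))
        # tidy exits with 0 (no problems), 1 (warnings) or 2 (errors).
        if result.returncode >= 2:
            failed.append(src.name)
    return failed
```

Writing to a separate directory rather than modifying files in place leaves the originals untouched, so the before and after versions can be compared.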
If anyone wants the tidied version, let me know.
b/
-- Brad Collins <brad@chenla.org>, Bangkok, Thailand
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d