
Marcello>- export html from msword
Marcello>- run html thru tidy with the msword clean option

I'm not having much luck getting tidy to work wonders with the msword clean option on Word 2010. Is that because the "tidy --msword yes" option was designed for Word 2000?

I do have pretty good luck with the "filtered" HTML option in Word 2010, without using tidy. For example, I take a "clean" HTML book file I am working on:

720kB utf-8 html

I import it into Word 2010 and save it in docx format:

380kB docx

I reopen that (to make sure Word isn't "cheating") and then save it as unfiltered HTML:

1181kB html

I run "tidy -utf8 --msword yes" on that and I get:

1156kB of hot mess (which still renders fine in a browser)

Compare that to saving as filtered HTML:

1696kB unicode html

which sounds horrible, until one realizes that Word has expanded my utf-8 into "Unicode" (presumably UTF-16), which is easily fixed, say in Notepad++:

853kB utf-8 html

And after a half-hour of manual (regex) cleanup I am back full circle [almost] to the "clean" HTML I started from:

716kB utf-8 html

Now I think it is fair to argue that this is not a "fair test", in that I started with clean hand-written HTML before turning it into a Word doc[x]. If you take something written in Dreamweaver, export it to doc, and from there to filtered HTML, it seems more than likely that some parts of the Dreamweaver output won't correspond happily to the HTML, in which case greater effort will be required on those parts. And one still has to consider that kindlegen can mess up even well-written HTML.
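For anyone who would rather script the re-encoding and regex steps than do them by hand in Notepad++, here is a rough Python sketch. It rests on assumptions: that Word's "Unicode" output is UTF-16 with a byte-order mark, that the file declares charset=unicode, and that the leftover residue looks like class=Mso... attributes and empty spans. The function names and the regex patterns are hypothetical illustrations, not the actual cleanup I did.

```python
import re


def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes assumed to be UTF-16 (with a BOM) as UTF-8."""
    text = data.decode("utf-16")
    # Assumption: the file declares charset=unicode; point it at utf-8 instead.
    text = re.sub(r"charset=unicode", "charset=utf-8", text, flags=re.I)
    return text.encode("utf-8")


def strip_word_residue(html: str) -> str:
    """Remove two kinds of Word leftovers (hypothetical example patterns)."""
    # Drop class attributes such as class=MsoNormal or class="MsoBodyText".
    html = re.sub(r'\s+class="?Mso\w+"?', "", html)
    # Drop spans that contain nothing but whitespace.
    html = re.sub(r"<span\b[^>]*>\s*</span>", "", html)
    return html
```

Reading the filtered file in binary mode, passing the bytes through utf16_to_utf8, and then running strip_word_residue on the decoded text would cover the Notepad++ step and a small part of the regex pass. A real file will certainly need more patterns than these two.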