
On 1/1/2012 1:26 PM, Karen Lofstrom wrote:
On Sun, Jan 1, 2012 at 7:28 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
- export html from msword
- run html thru tidy with the msword clean option
- convert to epub and kindle with calibre.
Some hand cleaning of the html may also be necessary. YMMV
I thought that msword generated sloppy html?
MSWord doesn't produce /sloppy/ HTML; it produces /excessive/ HTML. MSWord is designed to write stuff which will then be printed. When you save HTML from MSWord, it assumes you're going to want to load it back into MSWord at some point to be printed, so it saves out /everything/ it needs to convert from MSHTML back to DOCX. This is why you run Tidy immediately after exporting from Word. Don't use the "Compact" or "Filtered" HTML format (or whatever they call it these days), because that removes the hints Tidy uses to recognize that special Microsoft processing is required. Once you get this far, never look back: this process is a one-way street.
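The export, Tidy, calibre workflow can be scripted so the Tidy step always runs straight after the Word export. This is a hypothetical Python wrapper, not part of either tool. It assumes HTML Tidy (`tidy`) and calibre's `ebook-convert` are on the PATH, and that the "msword clean option" refers to Tidy's `word-2000` setting. The output file names (`.clean.html`, `.epub`, `.mobi`) are illustrative only.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(word_html: Path, outdir: Path) -> list[list[str]]:
    """Return the Tidy and calibre command lines for one Word HTML export."""
    clean = outdir / (word_html.stem + ".clean.html")
    return [
        # Tidy: --word-2000 strips the surplus markup Word adds to
        # "Save as Web Page" output; -clean replaces FONT/CENTER with CSS.
        ["tidy", "-quiet", "-clean", "--word-2000", "yes",
         "-output", str(clean), str(word_html)],
        # calibre's ebook-convert picks the output format from the extension.
        ["ebook-convert", str(clean), str(outdir / (word_html.stem + ".epub"))],
        ["ebook-convert", str(clean), str(outdir / (word_html.stem + ".mobi"))],
    ]


def run_pipeline(word_html: Path, outdir: Path) -> None:
    """Run each step in order, stopping at the first real failure."""
    for tool in ("tidy", "ebook-convert"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    outdir.mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(word_html, outdir):
        result = subprocess.run(cmd)
        # Tidy exits with 1 when it only emitted warnings; 2 means errors.
        limit = 1 if cmd[0] == "tidy" else 0
        if result.returncode > limit:
            raise RuntimeError(f"{cmd[0]} failed with exit code {result.returncode}")
```

For example, `run_pipeline(Path("book.htm"), Path("out"))` would write `out/book.clean.html`, `out/book.epub` and `out/book.mobi`. If the Tidy output still needs hand cleaning, you can run the Tidy command alone, edit `book.clean.html`, and then run only the two calibre commands.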
What I've been reading online is that you get better results using something like Dreamweaver to do the reformatting.
Forget about Dreamweaver. It has the same fundamentally flawed design philosophy that MSWord has: that the output must look exactly the same on every machine in the world. Dreamweaver has only two things working in its favor: it tends to be ubiquitous, and it's better than MSWord. This is what Alexander Pope meant by "damn with faint praise." If you have even passing familiarity with HTML, you can do any necessary cleanup by hand using a simple text editor. (On Windows, TextPad and the free Microsoft Visual Web Developer Express are among my favorites; for a more cross-platform solution, the Eclipse HTML editor is not bad.)
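When cleaning by hand after Tidy, it helps to know which lines still carry Word leftovers. This hypothetical helper (not a Tidy feature) scans the HTML with a few regular expressions for common Word residue: `mso-` style properties, `Mso*` class names, Office-namespace tags such as `<o:p>`, and conditional comments. The pattern list is an assumption and will miss things. It only points you at lines to inspect in your text editor; it does not change the file.

```python
import re
import sys

# Heuristic patterns for markup Word commonly leaves behind. This list is an
# assumption for illustration, not an exhaustive catalogue.
WORD_RESIDUE = {
    "mso- style property": re.compile(r"mso-[a-z-]+\s*:", re.IGNORECASE),
    "Mso* class": re.compile(r"""class\s*=\s*["']?Mso""", re.IGNORECASE),
    "Office namespace tag": re.compile(r"</?[ovwx]:[a-z]+", re.IGNORECASE),
    "conditional comment": re.compile(r"<!--\[if", re.IGNORECASE),
}


def find_residue(html: str) -> list[tuple[int, str]]:
    """Return (line number, residue label) pairs for lines worth inspecting."""
    hits = []
    for lineno, line in enumerate(html.splitlines(), start=1):
        for label, pattern in WORD_RESIDUE.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # errors="replace" because Word exports are not always UTF-8.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for lineno, label in find_residue(f.read()):
            print(f"line {lineno}: {label}")
```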
Also, will msword handle footnotes and indices?
It should, although I couldn't say for sure without seeing the document. And don't omit any indices; the notion that full-text search is a replacement for indices or a table of contents is a myth promulgated by those who are too lazy to create them.
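Whatever Word does with footnotes, it's worth confirming that the internal links still resolve after Tidy and the conversion. This sketch uses Python's standard `html.parser` to list `href="#..."` links whose target `id` (or `<a name>`) is missing. Word has typically exported footnotes with anchor names like `_ftn1` and `_ftnref1`, but treat that naming as an assumption and check it against your own export.

```python
import sys
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Collects in-document link targets and same-document fragment links."""

    def __init__(self) -> None:
        super().__init__()
        self.targets: set[str] = set()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Any element's id is a valid target; Word's export also uses <a name="...">.
        if attrs.get("id"):
            self.targets.add(attrs["id"])
        if tag == "a" and attrs.get("name"):
            self.targets.add(attrs["name"])
        href = attrs.get("href")
        # Only "#fragment" links are checked; percent-encoding is not decoded.
        if href and href.startswith("#") and len(href) > 1:
            self.links.append(href[1:])


def dangling_links(html: str) -> list[str]:
    """Return fragment links whose target anchor is missing, in document order."""
    audit = LinkAudit()
    audit.feed(html)
    audit.close()
    return [frag for frag in audit.links if frag not in audit.targets]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for frag in dangling_links(f.read()):
            print(f"missing target: #{frag}")
```

An empty result only means every internal link has somewhere to land; it says nothing about whether an index's entries still make sense in a reflowable ebook.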