html2txt is a command-line program that converts HTML to plain text.

Syntax:

    html2txt [options] htmlfile [htmlfile] [htmlfile] . . .

If no HTML file name is provided, the program reads from stdin (usually
the keyboard) until an end-of-file mark is encountered (^Z for
DOS/Windows, ^D for *nix, and I haven't a clue for the Macintosh).

Text appearing in header tags such as <h1> is centered to a line length
of 80 characters by adding spaces to the beginning of the line. If the
line is longer than the centering length, it will not be centered. The
-head option can be used to add additional indentation to header text.

Text appearing in <p>
tags is output as paragraphs, appropriately indented. Text within a
paragraph is output as a single line, followed by a single newline or
CRLF pair, depending on the OS. Word wrapping is not performed.

By default, output is single spaced between paragraphs, paragraphs are
indented by 5 spaces, the underscore character indicates italicised or
emphasized text, the asterisk character indicates bold or strongly
emphasized text, and headers are centered to a line length of 80
characters.

Options include the following:

    -i <word> or -italic <word>
        use <word> as the italic indicator. The word 'none' will
        suppress italic presentation
    -b <word> or -bold <word>
        use <word> as the bold-face indicator. The word 'none' will
        suppress bold presentation
    -t <number> or -tab <number>
        use <number> spaces to indent paragraphs
    -l <number> or -line <number>
        set the line length to <number> for centering
    -s <number> or -spacing <number>
        put <number> lines of blank text between paragraphs
    -v or -verbose
        output warning messages about badly-formed html
    -f <file> or -file <file>
        write error and warning messages to the specified file rather
        than stderr
    -raw
        leave chars > 128 unchanged upon output
    -ascii
        use ASCII for output, Latin-1 for input
    -latin1
        use Latin-1 for both input and output
    -iso2022
        use ISO2022 for both input and output
    -utf8
        use UTF-8 for both input and output
    -mac
        use the Apple MacRoman character set

Because it reads from stdin, html2txt can be used to convert html that
is being output from another application. For example, to use the unix
stream editor, sed, to replace the <br>
tag with ending and beginning paragraph tags, then convert the resulting
file to text, you could type:

    sed 's/<br>/<\/p><p>/g' bad.html | html2txt > good.txt

To turn off centering, set the line length to 0:

    html2txt -l 0 in.html > out.txt

Another example:

    html2txt -l 0 -h 15 in.html > out.txt

This would uncenter headings and indent them by 15 spaces.

To remove almost all markup, use the following:

    html2txt -i none -b none -t 0 -l 0 in.html > out.txt

This program was recently modified to accept, without complaint,
Microsoft pseudo-html as produced by Microsoft Word 2000.

html2txt is available from
http://www.dysfunctionals.org/~networker/html2txt.zip.
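
The header centering rule described earlier can be illustrated with a
short Python sketch. This is not html2txt's actual source, only a guess
at the behaviour the text describes: the function name center_header,
its parameter names, the choice to round the padding down, and the
choice to add the -head indentation on top of any centering are all
assumptions.

```python
def center_header(text, line_length=80, head_indent=0):
    """Pad header text on the left, following the centering rule described above.

    line_length mirrors the -l option and head_indent mirrors -head;
    both names are hypothetical and chosen only for this sketch.
    """
    text = text.strip()
    pad = 0
    # Center only when the text is shorter than the centering length.
    # A line_length of 0 therefore turns centering off entirely.
    if len(text) < line_length:
        pad = (line_length - len(text)) // 2  # assumption: round down
    # Assumption: the extra header indentation is added to the centering pad.
    return " " * (head_indent + pad) + text


if __name__ == "__main__":
    # Default settings: centered within 80 columns.
    print(repr(center_header("Introduction")))
    # Equivalent in spirit to "-l 0 -h 15": no centering, 15 spaces of indent.
    print(repr(center_header("Introduction", line_length=0, head_indent=15)))
```

With the defaults, a 12-character heading gets (80 - 12) / 2 = 34 leading
spaces; with a line length of 0 and a header indent of 15 it gets exactly
15. A heading longer than the line length is returned with no centering.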