tags is output as paragraphs, appropriately indented.
Text within paragraphs is output as a single line, followed by a single
newline or CRLF pair, depending on OS. Word wrapping is not performed.
By default, output is single spaced between paragraphs, and paragraphs are
indented by 5 spaces. The underscore character marks italicized or
emphasized text, the asterisk character marks bold or strongly emphasized
text, and headers are centered within a line length of 80 characters.
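
The default conventions described above can be illustrated with a short Python sketch. This is not html2txt's actual code: the function names are hypothetical, and the rounding used for centering and the placement of a marker on both sides of the text are assumptions made for illustration.

```python
# Illustrative sketch only; html2txt's real implementation may differ.

def format_paragraph(text: str, indent: int = 5) -> str:
    """Join a paragraph onto a single line and indent it (no word wrapping)."""
    return " " * indent + " ".join(text.split())


def emphasize(text: str, marker: str = "_") -> str:
    """Wrap text in a marker: '_' for italic/emphasis, '*' for bold/strong.

    Assumption: the marker appears on both sides of the text.
    """
    return f"{marker}{text}{marker}"


def center_header(text: str, line_length: int = 80) -> str:
    """Pad a header on the left so it sits in the middle of line_length columns.

    Assumption: the padding is rounded down and never negative.
    """
    pad = max((line_length - len(text)) // 2, 0)
    return " " * pad + text


if __name__ == "__main__":
    print(center_header("Release Notes"))
    print(format_paragraph("This is " + emphasize("important") + " and\n"
                           + emphasize("urgent", "*") + " text."))
```

The paragraph helper joins its input onto one line, matching the statement that word wrapping is not performed.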
options include the following:
-i word or
-italic
tag with ending and beginning
paragraph tags, then convert the resulting file to text you could type:
sed 's/
/
' bad.html | html2txt > good.txt

To turn off centering, set the line length to 0:

html2txt -l 0 in.html > out.txt

Another example:

html2txt -l 0 -h 15 in.html > out.txt

This uncenters headings and indents them by 15 spaces.

To remove almost all markup, use the following:

html2txt -i none -b none -t 0 -l 0 in.html > out.txt

This program was recently modified to accept, without complaint, the
Microsoft pseudo-HTML produced by Microsoft Word 2000.

html2txt is available from
http://www.dysfunctionals.org/~networker/html2txt.zip.