Re: [gutvol-d] Producing epub ready HTML

24 Jan 2012

      [Same message as sent earlier, but without attachments just in case]

On 1/24/2012 1:46 AM, Marcello Perathoner wrote:
...
On 01/24/2012 12:25 AM, Lee Passey wrote:
...
...
On Mon, January 23, 2012 3:48 pm, Marcello Perathoner wrote:
...
Please show me how.
You do it the same way you do TEI, except you map the tags
differently
...
Ok. Now put that into code, runnable on an ubuntu box,
I see. You don't just want me to show you how; you actually want me to 
do all the work. I'm not sure I want to go to that much effort simply to 
demonstrate to you that it's possible.

What I will do for you is send you the C++ code, which you can compile and 
install yourself. Attached are two zip files: one contains an early 
C++ version of Tidy (circa 2002), the other the additional files 
used to create the html2txt executable. html2txt.zip contains a file 
named "filelist.txt" which lists the files from each archive needed 
to build the program. At the time I neglected to add "readme.txt" 
to the zip file, so I am attaching it here separately.

I don't know if the gutvol-d list software strips off attachments or not 
(if it doesn't, it should). If anyone else would like this code and it 
doesn't come through, contact me directly.

The theory of operation of converting HTML to text is really quite 
simple, and there are plenty of ways to skin this particular cat. If I 
were doing this again I would probably use Java, as it has all the DOM 
parsing and manipulating functions necessary, if not built in then 
readily available. With Java it could easily be done in a couple of 
hundred lines of code and would "run everywhere."

The method is so simple and straightforward that probably even 
BowerBird could do it in Python, and I'm sure it's doable as an XSL 
script as well.
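For illustration only, here is a minimal sketch of that approach in Python (not the C++ html2txt code, and not the Java the author would choose), using just the standard library's html.parser. It walks the parse events in document order rather than building a full DOM, which is enough for a simple tag-to-text mapping. The mapping itself (the hypothetical BLANK_LINES_BEFORE table) and the 72-column wrap are assumptions, not the rules html2txt actually applies; <br>, tables, and list bullets are not handled.

```python
from html.parser import HTMLParser
import textwrap

# Hypothetical tag mapping (an assumption, not html2txt's rules):
# how many blank lines to emit before each block-level element.
BLANK_LINES_BEFORE = {
    "h1": 4, "h2": 4, "h3": 2,
    "p": 1, "div": 1, "blockquote": 1, "li": 1,
}
# Elements whose text content is dropped entirely.
SKIP = {"head", "script", "style"}


class HtmlToText(HTMLParser):
    def __init__(self, width=72):
        super().__init__(convert_charrefs=True)  # &amp; etc. arrive decoded
        self.width = width
        self.blocks = []    # list of (blank_lines_before, block_text)
        self.buf = []       # text fragments of the block being collected
        self.pending = 0    # blank lines requested before the next block
        self.skip_depth = 0

    def _flush(self):
        # Collapse all whitespace; source line breaks carry no meaning.
        text = " ".join("".join(self.buf).split())
        if text:
            self.blocks.append((self.pending, text))
            self.pending = 0
        self.buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in BLANK_LINES_BEFORE:
            self._flush()
            self.pending = max(self.pending, BLANK_LINES_BEFORE[tag])

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLANK_LINES_BEFORE:
            self._flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.buf.append(data)

    def close(self):
        super().close()
        self._flush()

    def render(self):
        out = []
        for i, (blanks, text) in enumerate(self.blocks):
            if i > 0:  # no blank lines before the very first block
                out.extend([""] * blanks)
            out.extend(textwrap.wrap(text, self.width))
        return "\n".join(out) + "\n"


def html_to_text(html, width=72):
    parser = HtmlToText(width)
    parser.feed(html)
    parser.close()
    return parser.render()


print(html_to_text("<h2>CHAPTER I</h2><p>It was a dark &amp; stormy night.</p>"))
```

Changing the output format is then mostly a matter of editing the mapping table, which is the sense in which the conversion is "the same as TEI, with the tags mapped differently."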
...
and give it to the WWers to evaluate.
LOL! I'm not convinced that any of the white-washers could even spell 
Ubuntu, let alone compile, install and use a Linux program. For them 
I've attached an MS Windows executable built from the attached code.
...
Take a hundred random samples from the archive and pipe the HTML file
thru your device and see if something very close to the posted txt file
comes out. (You may safely ignore where the lines break, but not the
number of empty lines between blocks.)
This is an exercise left to the reader. Of course, the real test is not 
equivalency, but whether the output is something the white-washers would 
accept; no one can judge whether this requirement is met except the 
white-washers themselves.
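The equivalency criterion quoted above (ignore where lines break, but not the number of empty lines between blocks) is mechanical enough to sketch. The following is a minimal, hypothetical Python check, not part of html2txt: each text is reduced to a list of (blank lines before, block text with whitespace collapsed) pairs, and two texts count as equivalent when the lists match. Leading blank lines before the first block are ignored, which is an assumption; str.splitlines also absorbs the CRLF line endings of the posted txt files.

```python
def blocks_with_gaps(text):
    """Reduce text to [(blank_lines_before, block_text), ...].

    Line breaks inside a block are ignored (whitespace is collapsed),
    but the count of empty lines separating blocks is preserved.
    """
    result = []
    blanks = 0
    current = []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        else:
            if current:
                result.append((blanks, " ".join(" ".join(current).split())))
                current = []
                blanks = 0
            blanks += 1
    if current:
        result.append((blanks, " ".join(" ".join(current).split())))
    if result:
        # Assumption: blank lines before the first block don't matter.
        result[0] = (0, result[0][1])
    return result


def equivalent(a, b):
    """True if a and b differ only in where lines break."""
    return blocks_with_gaps(a) == blocks_with_gaps(b)


print(equivalent("It was a dark\nand stormy night.\n",
                 "It was a dark and\r\nstormy night.\r\n"))
```

Running something like this over a sample of converted files would only measure equivalency; as noted above, whether the white-washers accept the output is a separate question.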