Re: [gutvol-d] Thanks and a request for code

On Mon, March 5, 2012 7:56 am, Robert Gibbins wrote:
> Might I ask you to send me the code please?

It's hidden away on another machine; I'll try to dig it out in the next day or two... but you may not want it, as explained below.

> PS The rough idea I had was to write a Java version as you suggest in your append, though I had thought of using JDOM rather than DOM. I haven't spent any time thinking about JDOM vs DOM yet, but if you have I'd be interested in your view.

IIRC, JDOM is an open-source, Java-specific alternative to the W3C DOM rather than an implementation of it. From what I've heard, it is very well regarded -- but it is not standard. I have chosen to use org.w3c.dom because (1) it is the standard and (2) it is included in the standard Sun/Oracle distribution, so I can be assured that it will be available without having to reference an external jar.

I'm not sure that my code will be of much value at this point; it relies on Tidy to build a non-standard DOM. Tidy is getting a bit long in the tooth, and there are better ways to build a DOM these days (for C/C++ I tend to use Expat + DOMCAPI).

The processing logic is extremely simple. You do a depth-first traversal of the DOM tree, and for each node you do pre-child processing, then process each child, then do post-child processing.

For example, suppose you hit a <p> node, and your rule is that a new paragraph starts on a new line. Before processing the <p>'s children you want to emit a newline character (two if you're not already at the start of a line). Then you process all the children of the <p>, applying whatever rules are appropriate. Lastly, before processing any nodes following the <p>, you emit another newline.

Now suppose you want to indicate /italics/ with _underscores_. When you encounter an <i> element you emit a '_' character, process the child nodes, then emit another '_'. A text node has no children, so you just emit the contents of the text node.

If I understood the rules of reStructuredText, or s.m.l., I'm sure I could use this general algorithm to generate either of them from HTML. You could probably follow this same general approach using stream-oriented processing; I just found it easier and quicker to use a DOM.
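
For what it's worth, here is a minimal sketch of that traversal in Java using org.w3c.dom. It is not the code I mentioned above, just an illustration written for this note. The class name DomTextEmitter and its methods are made up, and it assumes the input is already well-formed XML (real-world HTML would first need to go through Tidy or something similar). It applies only the two rules described: newlines around <p> and underscores around <i>.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration class; the names are not from any existing code.
final class DomTextEmitter {
    private final StringBuilder out = new StringBuilder();

    /** Parses well-formed markup (an assumption) and returns the emitted text. */
    static String convert(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        DomTextEmitter emitter = new DomTextEmitter();
        emitter.walk(doc.getDocumentElement());
        return emitter.out.toString();
    }

    /** Pre-child processing, then each child in order, then post-child processing. */
    private void walk(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            // A text node has no children: just emit its contents.
            out.append(node.getNodeValue());
            return;
        }
        String name = node.getNodeName();
        preChild(name);
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c);
        }
        postChild(name);
    }

    private void preChild(String name) {
        if (name.equalsIgnoreCase("p")) {
            // One newline, or two if we are not already at the start of a line.
            out.append(atLineStart() ? "\n" : "\n\n");
        } else if (name.equalsIgnoreCase("i")) {
            out.append('_');
        }
    }

    private void postChild(String name) {
        if (name.equalsIgnoreCase("p")) {
            out.append('\n');
        } else if (name.equalsIgnoreCase("i")) {
            out.append('_');
        }
    }

    private boolean atLineStart() {
        return out.length() == 0 || out.charAt(out.length() - 1) == '\n';
    }
}
```

Calling DomTextEmitter.convert("<body><p>Hello <i>world</i></p></body>") should walk body, then p, then the text and the <i>, emitting each piece as it goes. Supporting another element is a matter of adding a branch to preChild and postChild; elements without a rule simply pass their children through.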
participants (1): Lee Passey