
9 Sep 2009, 1:03 p.m.
On Wed, Sep 9, 2009 at 8:12 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
ROTFL! Apply that algorithm to Hamlet and see.
See if you can come up with an algorithm that doesn't make mincemeat of the following small excerpt. The algorithm should at least:
As you already know, parsing HTML is a much easier matter than parsing semi-freeflow text (which was the original poster's request). Also remember, I do this all the time for the spiders we write for Plucker. I slice, I dice, and I make beautiful, automated works of art from the worst, most semantically incorrect HTML out there. See some examples here: http://projects.plkr.org/