
On Wed, Sep 9, 2009 at 9:45 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
Do you read a post before replying?
Of course... do you?
That's exactly what I asked you to do: parse a plain text version of Hamlet into wrapped and non-wrapped paragraphs.
You did? The following looks pretty much like HTML to me, not plain ASCII text that wraps at 70 columns (as the original poster who started this thread requested).
See if you can come up with an algorithm that doesn't make mincemeat of the following small excerpt. The algorithm should at least:
1. Recognize that "HAMLET, PRINCE OF DENMARK by William Shakespeare" is the title statement of the work. This should be marked up like:
<h1>Hamlet, Prince of Denmark<br/><br/> by William Shakespeare</h1>
and NOT:
<h1>Hamlet, Prince of Denmark</h1> <h2>by William Shakespeare</h2>
2. Not wrap the list of persons proper, BUT wrap <p>Lords, Ladies, Officers, Soldiers, Sailors, Messengers, and other Attendants.</p>
3. Recognize that <p>SCENE. Elsinore</p> is a stage direction, not the start of scene 1.
4. Recognize <h2>ACT I.</h2>
5. Recognize <h3>Scene I. Elsinore. A platform before the Castle.</h3> (Even if it lacks spacing.)
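To make requirements 3 to 5 concrete, here is a rough Python sketch of a per-line classifier. It is only an illustration under stated assumptions, not anyone's actual conversion tool: the names ACT_RE, SCENE_RE, SETTING_RE, classify and markup are made up for this example, and it assumes each heading sits on a line of its own and uses Roman numerals. It does not attempt the title statement (1) or the list of persons (2), which need context beyond a single line.

```python
import re
from html import escape

# Hypothetical patterns, assumed for this sketch only.
# "ACT I." : the word ACT, a Roman numeral, an optional full stop.
ACT_RE = re.compile(r"^ACT\s+[IVXLC]+\.?$")
# "Scene I. Elsinore..." : the word Scene followed by a Roman numeral.
SCENE_RE = re.compile(r"^SCENE\s+[IVXLC]+\.", re.IGNORECASE)
# "SCENE. Elsinore" : no numeral, so it names the setting, not a scene.
SETTING_RE = re.compile(r"^SCENE\.", re.IGNORECASE)


def classify(line):
    """Return 'act', 'scene', 'stage_direction' or 'other' for one line."""
    text = line.strip()
    if ACT_RE.match(text):
        return "act"
    if SCENE_RE.match(text):
        return "scene"
    if SETTING_RE.match(text):
        return "stage_direction"
    return "other"


def markup(line):
    """Wrap one line in <h2>, <h3> or <p> according to classify()."""
    tag = {"act": "h2", "scene": "h3"}.get(classify(line), "p")
    return f"<{tag}>{escape(line.strip())}</{tag}>"


if __name__ == "__main__":
    for sample in (
        "SCENE. Elsinore",
        "ACT I.",
        "Scene I. Elsinore. A platform before the Castle.",
    ):
        print(markup(sample))
```

Because the decision is made on the text of the line alone, it does not depend on blank lines around the heading, which is the "lacks spacing" case in point 5. The price is that any ordinary line that happens to begin with "Scene" plus a numeral would be misread as a heading.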