
john, thanks for joining the betty lee fun parade! :+) can we retire the "mount high horse" header, though? it was an honorific for 2011, intended to end an era...

i've looked at your work, and have a few notes for you.

first of all, those hyphens are not "spurious", they're the hyphens which actually exist in the printed-book. but yes, you want to get rid of them, and here's how...

(first i'll tell you that i will make web-apps available to perform these routines for people automatically, so it's not like you really _need_ this documentation, but i happily tell you this information if you want it.)

here's how you might wanna rework a .zml text-file...

1. a hyphen at the very end of the line is a "soft" one, meaning it can -- should -- be eliminated on unwrap. you can do this programmatically. in python, it'd be:
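import re
thebook=re.sub("-\n","",thebook)
but note that you _must_ attend to em-dashes _first_, meaning the line below has to run _before_ the line above, or the soft-hyphen rule will eat the second hyphen of each line-ending "--":
thebook=re.sub("--\n","—",thebook)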
2. if there's a hyphen-tilde combo at the end of a line, this indicates a _hard_ hyphen that should be retained. the tilde is eliminated. (think of it as "sacrificing itself", with a heroic gesture, so that the hyphen can be saved.)
thebook=re.sub("-~\n","-",thebook)
3. you'll also find cases of a tilde-tilde pair at line-end, or -- to be more accurate -- at the end of a _paragraph_. this tilde-tilde pair indicates a doublequote mark which was _dropped_ because the speaker's dialog _continues_ in the next paragraph. (this will enable you to perform a check on the balancing of doublequotes, but otherwise, the only thing to do with the tilde-tilde pair is delete it.)

4. you'll need to do the other lines to finish an unwrap. all lines should be unwrapped _except_ blank lines and any lines which start with a space as their first character. i do this as a multi-step procedure in my wordprocessor, but i'm sure some reg-ex person can "show us the way". any linebreak with whitespace on either side is retained; all other linebreaks are deleted. shouldn't be that hard. (but if no one spills, i will give you my python routines.)

5. "{{" lines give the filename of the _scan_ for that page. "[[" lines are the pagenumber of the page, printed or not. the "[[" lines can be preceded by up to 3 blank lines, and _all_ of those blank lines must be deleted on an unwrap, as should the "[[" lines themselves (although you _might_ wanna save some kind of reference to the pagenumber). "{{" lines should be deleted as well, of course, and they are always followed by at least _one_ blank line, which must also _always_ be deleted. sometimes there'll be _more_ than one blank line following a "{{" line, _but_ additional lines after the first one _must_ be retained. you'll understand these rules _implicitly_ once you've been told the _reasons_ for them, but that's for later.

6. looks like you figured out everything else you need. i especially like how you handled the letter from rose... the p-book didn't really set it off much, so neither did i, but i generally think that such material should be set off. as criticism, i'd say that your table of contents is skimpy.
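
for anyone who'd rather see steps 1 through 5 in one place, here is a minimal python sketch. it is _not_ the set of routines promised above, just one reading of the rules as stated. the assumptions: the function name `unwrap_zml` is made up for illustration; blank lines are taken to be truly empty (no stray spaces); the "{{" and "[[" lines are stripped _before_ the hyphen rules, so that a word split across a page break still gets rejoined; and an ordinary linebreak is replaced by a single space rather than deleted outright, so that words don't run together.

```python
import re

def unwrap_zml(thebook):
    """Unwrap a .zml text following steps 1-5 (a sketch, not the official routine)."""
    # step 5: drop each "{{" scan-filename line plus the ONE blank line after it;
    # any further blank lines after that one are kept.
    thebook = re.sub(r"^\{\{.*\n\n?", "", thebook, flags=re.MULTILINE)
    # step 5: drop each "[[" pagenumber line plus up to 3 blank lines before it.
    thebook = re.sub(r"^\n{0,3}\[\[.*\n", "", thebook, flags=re.MULTILINE)

    # step 1: em-dashes first, otherwise the soft-hyphen rule mangles them.
    thebook = re.sub("--\n", "\u2014", thebook)
    # step 2: hyphen-tilde at line-end is a hard hyphen; keep it, drop the tilde.
    thebook = re.sub("-~\n", "-", thebook)
    # step 1: any remaining hyphen at line-end is soft; drop it and the linebreak.
    thebook = re.sub("-\n", "", thebook)

    # step 3: delete the tilde-tilde pair at the end of a paragraph.
    thebook = re.sub(r"~~$", "", thebook, flags=re.MULTILINE)

    # step 4: a linebreak with non-whitespace on BOTH sides is a wrap point;
    # blank lines and lines starting with a space are left alone.
    # (assumption: join with a single space.)
    thebook = re.sub(r"(?<=\S)\n(?=\S)", " ", thebook)
    return thebook
```

you'd call it as `thebook = unwrap_zml(open("book.zml", encoding="utf-8").read())`, where "book.zml" is a placeholder filename. if you want to keep a reference to the pagenumbers, or run the doublequote balance check from step 3, do that _before_ the call, since the "[[" lines and the tilde-tilde pairs are gone afterward.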
The conversions are with a python script, as you will have guessed. If you want it, you can have it---Now!
you guys should take up john on his open-source offer.
The HTML looks generally acceptable to me, but the images are missing.
the scans are in the same directory as the .zml file.
i leave my book subdirectories (under "go") wide open, and name my files intelligently, so it should all be clear.
BB, could you be persuaded to add the HTML to your treasury, so that we can see the result in context?
do you want me to add _my_ .html file to my site? if so, then that will be coming very soon, john, yes. or do you want me to add _your_ .html to my site? i'm willing to do that, if you'd consider it an honor, or something, but i'll assure you that it's really not. your .html version was serviceable, but i believe you could probably do a better job on a conversion now, by using the information which i've given you above. a book this simple isn't really much of a test, however. try working on the test-suite which you can find here:
-bowerbird