roger said:
> I actually use a tool I've written (ppsmq) to
> turn straight quotes in a text into curly quotes.
it seems like you don't remember that i was the one who
scoffed when you insisted this couldn't be done accurately.
so i explained it very carefully.
it was either here on this listserv, or in the d.p. forums,
or probably both places, if you want me to locate it.
anyway, i'm glad you finally saw the light.
don said:
> Double-quotes are an interesting case.
> The editor WordPress interestingly automatically
> converts them to curly-quotes without even asking,
> and with great accuracy.
gee, don, it seems like you don't remember this either.
and yes, i had to explain all of this to you as well, don...
and i did. so the "great accuracy" shouldn't surprise you.
it is a very straightforward -- and simple -- task. really.
> It even does a great job on single-quotes.
that's a little bit more difficult. but only a very little bit.
it's generally the same process.
> I need to look at the algorithms when I get some time.
so yes, you did forget. so yes, i will explain it once again...
quotemarks on the left side of a line are opening quotemarks.
quotemarks on the right side of a line are closing quotemarks.
more generally, a quotemark which has whitespace to its _left_
is an opening quotemark. a closing quotemark has whitespace
to its _right_. brackets, braces, and parens should be ignored.
(note that any markup is another thing that should be ignored.)
within a paragraph, the flow goes open/close/open/close, etc.
so the role of any "spacey" quote within such a sequence is predictable.
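here is a minimal python sketch of the whitespace rule and the open/close flow, for doublequotes only. it assumes the paragraph text has already had its markup stripped, and it reads "brackets, braces, and parens should be ignored" as "treat them like whitespace". the names classify_quotes and resolve_spacey are made up for illustration. this is not ppsmq's code, or anybody else's.

```python
# Brackets, braces, and parens are treated like whitespace here, which is
# one reading of "should be ignored" (an assumption of this sketch).
TRANSPARENT = set("()[]{}")

def is_spacey(ch):
    """True if a neighboring character counts as whitespace (or is absent)."""
    return ch is None or ch.isspace() or ch in TRANSPARENT

def classify_quotes(paragraph, mark='"'):
    """Label each quotemark in a paragraph as open, close, or spacey."""
    roles = []
    for i, ch in enumerate(paragraph):
        if ch != mark:
            continue
        left = paragraph[i - 1] if i > 0 else None
        right = paragraph[i + 1] if i + 1 < len(paragraph) else None
        if is_spacey(left) and not is_spacey(right):
            roles.append("open")    # whitespace to its left
        elif is_spacey(right) and not is_spacey(left):
            roles.append("close")   # whitespace to its right
        else:
            roles.append("spacey")  # ambiguous: decided by its position
    return roles

def resolve_spacey(roles):
    """Fill in spacey quotes from the open/close/open/close flow."""
    resolved, expected = [], "open"
    for role in roles:
        if role == "spacey":
            role = expected
        resolved.append(role)
        expected = "close" if role == "open" else "open"
    return resolved
```

so `classify_quotes('she said " yes " ok')` gives two spacey quotes, and resolve_spacey turns them into an open followed by a close.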
paragraphs span pagebreaks, so do not let that confuse you.
mismatch means there's an error _somewhere_... could be
a dropped or improperly-inserted quote, bad paragraphing,
a doublequote that o.c.r. misrecognized as a singlequote (or a pair),
a doublequote misrecognized as a number (e.g., 11, 44, 77),
a mistake in the p-book, or some other freakish occurrence,
but you can be sure there's some kind of error somewhere,
and that you're gonna have to bring in a human to resolve it.
a missing close-quote at the end of a paragraph _can_ be ok,
but only if the following paragraph begins with a quotemark.
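a sketch of that per-paragraph check, assuming each quotemark has already been resolved to "open" or "close" (check_paragraph is a hypothetical name). it returns None when the paragraph is fine and a short message otherwise:

```python
def check_paragraph(roles, next_starts_with_quote):
    """Check one paragraph's resolved quotemarks.

    roles: list of "open"/"close", in order of appearance.
    next_starts_with_quote: True if the following paragraph begins
    with a quotemark (i.e. the quotation continues).
    """
    expected = "open"
    for k, role in enumerate(roles):
        if role != expected:
            return f"quotemark #{k + 1}: expected {expected}, found {role}"
        expected = "close" if role == "open" else "open"
    # Ending "inside" a quote is only acceptable when the next
    # paragraph picks the quotation back up with its own open-quote.
    if expected == "close" and not next_starts_with_quote:
        return "paragraph ends inside a quote"
    return None
```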
in the case of singlequotes, the apostrophes in contractions
must be dropped from the analysis, which is usually simple,
as those apostrophes do not have whitespace on either side.
however, the apostrophes which indicate _possessives_ are
_sometimes_ at the end of a word, so that's a complication.
you will also find that sometimes there is an apostrophe
appearing at the beginning of a word which nonetheless
should display with a closing-curl, a la "the roaring '20s".
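a minimal sketch of the easy part of the singlequote case: curl the apostrophes that are clearly apostrophes so they drop out of the open/close analysis. it only handles contractions (a letter on both sides) and a leading apostrophe before a number; possessives like "dogs'" and slang like "'em" are deliberately left straight for the pairing logic, or a human, to sort out. mark_apostrophes is a hypothetical name.

```python
CLOSE_CURL = "\u2019"  # right single quotation mark, also used as apostrophe

def mark_apostrophes(text):
    """Curl singlequotes that are clearly apostrophes; leave the rest straight."""
    out = list(text)
    for i, ch in enumerate(text):
        if ch != "'":
            continue
        left = text[i - 1] if i > 0 else ""
        right = text[i + 1] if i + 1 < len(text) else ""
        if left.isalpha() and right.isalpha():
            out[i] = CLOSE_CURL  # contraction: don't, it's
        elif not left.strip() and right.isdigit():
            out[i] = CLOSE_CURL  # leading apostrophe before a number: '20s
    return "".join(out)
```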
slang typography (say, 'em or goin') can also produce a myriad of wrinkles...
keep in mind that all of these finer points i just discussed
pale in significance to the fact that the big stuff is simple.
most of the time, what you do will end up being error-free.
and if there's a glitch, it will draw your attention naturally.
the structures here follow fairly strict rules, so any collapse
causes something else to fail, and that causes more failure,
and so on, to the point where the flaw will become apparent.
and your beta-readers, absorbing the content, are your last line of defense.
make sure they know that they're looking for this "petty" stuff,
because you repaired all the "obvious" stuff a long time back...
> one I'll solicit community improvement suggestions for.
> It's the regex I use to find problematic quotes.
...
> /([^\s\(-]?)"(\s*)([^\\]*?(\\.[^\\]*)*)(\s*)("|^\r)([^\s\)\.\,\?;:-]?)/gim
reg-ex is the wrong way to go on this... just write the code...
i've told you that before. and here i am, saying it once again.
no wonder i'm getting bored repeating myself over and over.
you should have learned this lesson 5 years ago. 5 years ago!
if you do it the wrong way, of course it's gonna seem difficult.
just do it the right way, and you'll discover that it's very easy...
here's the python code to split the book into paragraphs:
> theparagraphs=thebook.split("\n\n")
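the one-liner assumes blank lines are exactly "\n\n". a slightly sturdier variant (a sketch; split_paragraphs is a hypothetical name) normalizes windows/mac line-endings first and treats whitespace-only lines as blank too. it does not handle pagebreak markers; those would need to be removed beforehand so paragraphs that span them stay whole.

```python
import re

def split_paragraphs(thebook):
    """Split a book into paragraphs on blank (or whitespace-only) lines."""
    text = thebook.replace("\r\n", "\n").replace("\r", "\n")
    parts = re.split(r"\n\s*\n", text)
    # Drop empty pieces (e.g. from leading or trailing blank lines).
    return [p.strip("\n") for p in parts if p.strip()]
```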
then walk through the array of paragraphs, examining each.
you just classify each quotemark as open, close, or spacey...
if something doesn't fit -- e.g., it's supposed to be a "close",
but it appears at a line-start, or has whitespace to its left --
you know that you have a glitch, which will need to be fixed.
otherwise, you're good to go. do it, and stop wasting time...
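putting the pieces together, here's one way that walk might look: a python sketch that assumes plain text with no markup or pagebreak lines and checks doublequotes only. check_book is a hypothetical name; it yields (paragraph number, message) for each glitch, so a human can go look at it.

```python
import re

TRANSPARENT = set("()[]{}")  # treated like whitespace (an assumption)

def _spacey(ch):
    return ch is None or ch.isspace() or ch in TRANSPARENT

def check_book(thebook, mark='"'):
    """Yield (paragraph_number, message) for each suspected quote glitch."""
    text = thebook.replace("\r\n", "\n")
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    for n, para in enumerate(paragraphs):
        expected = "open"
        for i, ch in enumerate(para):
            if ch != mark:
                continue
            left = para[i - 1] if i > 0 else None
            right = para[i + 1] if i + 1 < len(para) else None
            if _spacey(left) and not _spacey(right):
                role = "open"
            elif _spacey(right) and not _spacey(left):
                role = "close"
            else:
                role = expected  # spacey: take what the sequence predicts
            if role != expected:
                yield n, f"char {i}: expected {expected}, looks like {role}"
            expected = "close" if role == "open" else "open"
        # A missing final close-quote is ok only if the next paragraph
        # begins with a quotemark (a multi-paragraph quotation).
        if expected == "close":
            nxt = paragraphs[n + 1].lstrip() if n + 1 < len(paragraphs) else ""
            if not nxt.startswith(mark):
                yield n, "paragraph ends inside a quote"
```

a close-quote stranded at the start of a line has whitespace to its left, so it gets flagged as an unexpected open.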
-bowerbird