
as the merry-go-round spins faster and faster, it takes less time to return to a specific point... now we're talking (again) about doublequotes? it was less than one month ago, december 28th, when i reminded don that i had explained them to him many years back, and i retold the story... check the archives, it was pontification #1495... oh heck, here's the link, to make it easy for you:
http://lists.pglaf.org/mailman/private/gutvol-d/2011-December/008690.html
-bowerbird
p.s. no matter whether you curl quotes or not, you're gonna make _some_ end-users unhappy if they can't reverse your decision to their liking. so, which of the file-formats you like allow that?

On Wed, January 25, 2012 8:00 pm, Bowerbird@aol.com wrote:
no matter whether you curl quotes or not, you're gonna make _some_ end-users unhappy if they can't reverse your decision to their liking. so, which of the file-formats you like allow that?
Theoretically, any of the XML-based formats, if you use named entities. For review (and because I am a pedant), the pertinent named entities are:
&lsquo; - left single quotation mark
&rsquo; - right single quotation mark
&ldquo; - left double quotation mark
&rdquo; - right double quotation mark
&apos; - apostrophe (valid XML character entity, but /not/ valid in HTML 4)
If you use one of the first four named entities in place of a "curly" quote mark (whether that mark would otherwise be stored as the three-byte UTF-8 sequence or as a numeric character reference), the display should always be the same. BUT...
If you have a user agent that uses an external DTD, or uses an internal DTD that you can hack, you should be able to replace the values for these named entities to be straight quotes. (Remember, I'm talking theory, not necessarily practice.) In this case, you are not simply using a font that makes the glyphs /look/ like straight quotes, you would actually be /using/ straight quotes. You could even swap the values for single and double quotes to make British publications look like American ones, or vice-versa.
It should be noted, as Mr. Adcock did, that "The apostrophe is different from the closing single quotation mark (usually rendered identically but serving a different purpose)." For the greatest flexibility, apostrophes should be encoded as &apos; and not &rsquo;. Examples might be: "Texas hold &apos;em" or "that can&apos;t be done." Double/single swapping will look really weird if you haven't encoded the apostrophe correctly.
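A minimal sketch of the entity-remapping idea described above, assuming a user agent that lets you supply the internal DTD subset. It uses Python's standard xml.etree.ElementTree, whose expat parser expands entities declared in the internal subset. The render helper, the CURLY and STRAIGHT tables, and the sample sentence are illustrative assumptions, not part of any existing PG tool.

```python
import xml.etree.ElementTree as ET

# Two hypothetical sets of entity definitions for the same document body.
CURLY = {
    "lsquo": "&#x2018;",
    "rsquo": "&#x2019;",
    "ldquo": "&#x201C;",
    "rdquo": "&#x201D;",
}
STRAIGHT = {
    "lsquo": "&#x27;",
    "rsquo": "&#x27;",
    "ldquo": "&#x22;",
    "rdquo": "&#x22;",
}

# &apos; is predefined by XML, so it needs no declaration here.
BODY = "<p>&ldquo;Texas hold &apos;em,&rdquo; he said.</p>"


def render(body, entities):
    """Parse body with the given entity definitions and return its text."""
    decls = "".join(
        '<!ENTITY {} "{}">'.format(name, value)
        for name, value in entities.items()
    )
    document = "<!DOCTYPE p [{}]>{}".format(decls, body)
    return ET.fromstring(document).text


# Same body, two renderings: only the DTD subset differs.
curly_text = render(BODY, CURLY)
straight_text = render(BODY, STRAIGHT)
```

The body is never rewritten; curly_text carries U+201C/U+201D around the sentence while straight_text carries U+0022, and the apostrophe stays U+0027 in both because it was encoded separately from the quotation marks.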

Lee Passey wrote:
For review (and because I am a pedant), the pertinent named entities are:
&lsquo; - left single quotation mark
&rsquo; - right single quotation mark
&ldquo; - left double quotation mark
&rdquo; - right double quotation mark
&apos; - apostrophe (valid XML character entity, but /not/ valid in HTML 4)
It should be noted, as Mr. Adcock did, that "The apostrophe is different from the closing single quotation mark (usually rendered identically but serving a different purpose)." For the greatest flexibility, apostrophes should be encoded as &apos; and not &rsquo;. Examples might be: "Texas hold &apos;em" or "that can&apos;t be done." Double/single swapping will look really weird if you haven't encoded the apostrophe correctly.
I'm not sure I see the value in marking up apostrophe as &apos; if you're a fan of curly quotes. You'd still have to go through and replace them with right single quotation marks somewhere in your workflow, as &apos; is the vertical single quote at U+0027.

On Thu, January 26, 2012 11:17 am, Paul Flo Williams wrote:
I'm not sure I see the value in marking up apostrophe as &apos; if you're a fan of curly quotes. You'd still have to go through and replace them with right single quotation marks somewhere in your workflow, as &apos; is the vertical single quote at U+0027.
&apos; is not necessarily ASCII 27, that's simply the default XML mapping. If you're going to hack the DTD like I was talking about, &apos; can be anything you want it to be. It is true that if you stick with the default mappings &apos; can be a little jarring because all the quote marks will be curled but the apostrophes will be straight. However, I do see some value in maintaining a distinction between the right single quote and the apostrophe.
This is, indeed, a thorny issue if you want to create a system where the user gets to choose between curly and straight punctuation. One solution is that which I proposed, which is to use &apos; and rely on entity definitions to "make it look good." Another might be to use &rsquo; for quotation marks and ’ (U+2019) for apostrophes (in this case, swapping quotation styles would not affect apostrophes, but you couldn't "uncurl" the apostrophes if you wanted). Of course, the notion of swapping quotation styles really is a far edge case, so the best choice may be to not use &apos; at all and just tell people "if you swap quotation styles you will probably end up with a mess, so be prepared."
I would still like to be able to maintain the apostrophe/quote distinction, but I'm old enough to know that I don't always get what I want. The main point of my response to the Bower Bird was that it could be done, not that it should.

Lee Passey wrote:
On Thu, January 26, 2012 11:17 am, Paul Flo Williams wrote:
I'm not sure I see the value in marking up apostrophe as &apos; if you're a fan of curly quotes. You'd still have to go through and replace them with right single quotation marks somewhere in your workflow, as &apos; is the vertical single quote at U+0027.
&apos; is not necessarily ASCII 27, that's simply the default XML mapping. If you're going to hack the DTD like I was talking about, &apos; can be anything you want it to be.
Not if you want a valid XML document; the replacement text for apos must be ASCII 0x27 if it is declared at all.

On 01/26/2012 07:48 PM, Lee Passey wrote:
This is, indeed, a thorny issue if you want to create a system where the user gets to choose between curly and straight punctuation. One solution is that which I proposed, which is to use &apos; and rely on entity definitions to "make it look good."
A better solution, which is already implemented in epubmaker, is that you use apostrophe in the master file and the converter substitutes it with right curly quote.
-- Marcello Perathoner webmaster@gutenberg.org

On Thu, January 26, 2012 12:35 pm, Marcello Perathoner wrote:
On 01/26/2012 07:48 PM, Lee Passey wrote:
This is, indeed, a thorny issue if you want to create a system where the user gets to choose between curly and straight punctuation. One solution is that which I proposed, which is to use &apos; and rely on entity definitions to "make it look good."
A better solution, which is already implemented in epubmaker, is that you use apostrophe in the master file and the converter substitutes it with right curly quote.
Sure, it's always possible to solve the problem by re-writing the file; you could do it with regex and xsl among other tools. It's so simple that even BowerBird could do it in Python (although for this particular task maybe Perl would be better). The challenge is to produce a file that can be displayed differently according to the end user's preferences /without/ re-writing the file...

On 01/26/2012 08:49 PM, Lee Passey wrote:
The challenge is to produce a file that can be displayed differently according to the end user's preferences /without/ re-writing the file...
That is the *last* of my concerns. The application should allow the user to change fonts, colors, borders, etc.
I meant that you can do a lot of automatic proofing if you keep quotes and apostrophe as separate characters in the master file. The apostrophe then gets substituted into the typographically correct right curly quote in the output files.
-- Marcello Perathoner webmaster@gutenberg.org

On Thu, January 26, 2012 1:27 pm, Marcello Perathoner wrote:
On 01/26/2012 08:49 PM, Lee Passey wrote:
The challenge is to produce a file that can be displayed differently according to the end user's preferences /without/ re-writing the file...
[snip]
I meant that you can do a lot of automatic proofing if you keep quotes and apostrophe as separate characters in the master file.
This is indisputably true. I would like to find a way to do this without having to re-write the master file before using it. I understand that /you/ are not interested in solving that particular problem, but everyone knows that the world revolves around /me/ and not you.

A better solution, which is already implemented in epubmaker, is that you use apostrophe in the master file and the converter substitutes it with right curly quote.
Sorry, how is it better to take an ambiguous encoding and make a possibly false assumption of how it should be rendered? (As opposed to just staying with an ambiguous rendering)

On 01/26/2012 11:45 PM, Jim Adcock wrote:
A better solution, which is already implemented in epubmaker, is that you use apostrophe in the master file and the converter substitutes it with right curly quote.
Sorry, how is it better to take an ambiguous encoding and make a possibly false assumption of how it should be rendered?
It is not ambiguous and a short look into the Unicode standard would tell you that right curly quote is the preferred glyph for apostrophe.
-- Marcello Perathoner webmaster@gutenberg.org

It is not ambiguous and a short look into the unicode standard would tell you that right curly quote is the preferred glyph for apostrophe.
If you have managed to convince PG submitters to actually follow this preference, then you have accomplished something. If not, you are simply automagically compounding errors.
"Apostrophe," ASCII U+0027, a neutral, highly overloaded, and ambiguous encoding which has been used in encoding a wide variety of meanings since before I was born -- and I wasn't born yesterday -- and which could mean a lot of things in a submitted text.
Right Single Quotation Mark, UNICODE U+2019, an unambiguous encoding which can only mean right curly single quotation mark.
If indeed, as you seem to be suggesting, that you are "automagically" changing U+0027 to U+2019, then all you have accomplished is implementing yet-another naive algorithm for (incorrectly) changing straights to curlies.
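To make the objection concrete, here is a small sketch of what a blanket U+0027 to U+2019 substitution does to legacy text in which the straight mark was also used as a quotation mark. The function name and the sample sentence are made up for illustration; this is not epubmaker's code.

```python
APOSTROPHE = "\u0027"
RIGHT_SINGLE_QUOTE = "\u2019"


def naive_curl(text):
    """Replace every U+0027 with U+2019, with no regard to its role."""
    return text.replace(APOSTROPHE, RIGHT_SINGLE_QUOTE)


# Legacy-style text: the same straight mark opens the quotation,
# closes it, and serves as the apostrophe in "can't".
legacy = "'That can't be done,' he said."
converted = naive_curl(legacy)
```

After conversion the opening quotation mark is also U+2019, where a left single quotation mark (U+2018) would be expected. The substitution is only safe if the input really contains U+0027 for apostrophes alone.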

How does it handle the other left one when they come in pairs?
On Fri, Jan 27, 2012 at 10:01 AM, James Adcock <jimad@msn.com> wrote:
It is not ambiguous and a short look into the unicode standard would tell you that right curly quote is the preferred glyph for apostrophe.
If you have managed to convince PG submitters to actually follow this preference, then you have accomplished something. If not, you are simply automagically compounding errors.
"Apostrophe," ASCII U+0027, a neutral, highly overloaded, and ambiguous encoding which has been used in encoding a wide variety of meanings since before I was born -- and I wasn't born yesterday -- and which could mean a lot of things in a submitted text.
Right Single Quotation Mark, UNICODE U+2019, an unambiguous encoding which can only mean right curly single quotation mark.
If indeed, as you seem to be suggesting, that you are "automagically" changing U+0027 to U+2019, then all you have accomplished is implementing yet-another naive algorithm for (incorrectly) changing straights to curlies.

On Fri, January 27, 2012 12:57 pm, don kretz wrote:
How does it handle the other left one when they come in pairs?
They should never come in pairs if one has managed to convince PG submitters to follow the rule of encoding quotes as quotes and apostrophes as apostrophes. (Note, an apostrophe is an apostrophe, not the number 27. 27 is the number used by some systems to represent apostrophes, but it represents only an encoding, not a concept). Or is there some situation in which apostrophes are paired? And if so, wouldn't they both be encoded as apostrophes?
On Fri, Jan 27, 2012 at 10:01 AM, James Adcock <jimad@msn.com> wrote:
It is not ambiguous and a short look into the unicode standard would tell you that right curly quote is the preferred glyph for apostrophe.
If you have managed to convince PG submitters to actually follow this preference, then you have accomplished something. If not, you are simply automagically compounding errors.
"Apostrophe," ASCII U+0027, a neutral, highly overloaded, and ambiguous encoding which has been used in encoding a wide variety of meanings since before I was born -- and I wasn't born yesterday -- and which could mean a lot of things in a submitted text.
Right Single Quotation Mark, UNICODE U+2019, an unambiguous encoding which can only mean right curly single quotation mark.
If indeed, as you seem to be suggesting, that you are "automagically" changing U+0027 to U+2019, then all you have accomplished is implementing yet-another naive algorithm for (incorrectly) changing straights to curlies.

Lee>27 is the number used by some systems to represent apostrophes, but it represents only an encoding, not a concept. Let's try this again. U+0027 shows up in some PG files, like say, almost all of the PG files. What, if anything, should PG do about the fact that U+0027 shows up in these PG files?

I apologize, I was the victim of a rogue proxy server... I rebooted it, and we should be fine now. On Fri, January 27, 2012 1:33 pm, Lee Passey wrote:...

On 01/27/2012 08:57 PM, don kretz wrote:
How does it handle the other left one when they come in pairs?
This is how we do it now. This is very hard for any automatic quote-checker to scan:
'It's Danny fightin' 'ard for life', the Colour-Sergeant said.
This is how we should do the master file. This would be very easy for a quote-checker to scan:
‘It's Danny fightin' 'ard for life’, the Colour-Sergeant said.
After conversion, on the user's device, it will look like this:
‘It’s Danny fightin’ ’ard for life’, the Colour-Sergeant said.
-- Marcello Perathoner webmaster@gutenberg.org
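The workflow in the message above can be sketched roughly as follows: check quote pairing on the master file, where apostrophes are still U+0027, and only then substitute U+2019 for output. The helper names are hypothetical and the checker is deliberately simplistic; this is an illustration under those assumptions, not the actual epubmaker implementation.

```python
LEFT = "\u2018"   # left single quotation mark
RIGHT = "\u2019"  # right single quotation mark
APOS = "\u0027"   # apostrophe as kept in the master file


def single_quotes_balanced(text):
    """Return True if every U+2018 is closed by a later U+2019."""
    depth = 0
    for ch in text:
        if ch == LEFT:
            depth += 1
        elif ch == RIGHT:
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def convert_for_output(master):
    """Substitute the right curly quote for each master-file apostrophe."""
    return master.replace(APOS, RIGHT)


master = "\u2018It's Danny fightin' 'ard for life\u2019, the Colour-Sergeant said."
output = convert_for_output(master)
```

The master line passes the pairing check because its three apostrophes are invisible to it. The converted line no longer does, which is why the proofing has to happen before the substitution and not after.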

On 1/27/2012 3:34 PM, Marcello Perathoner wrote:
On 01/27/2012 08:57 PM, don kretz wrote:
How does it handle the other left one when they come in pairs?
[snip]
This is how we should do the master file. This would be very easy for a quote-checker to scan:
‘It's Danny fightin' 'ard for life’, the Colour-Sergeant said.
Agreed. I just thought that Mr. Kretz was talking about pairs of apostrophes, not just two apostrophes that coincidentally happen to appear near one another in a file. Those are easy to deal with because, after all, an apostrophe is an apostrophe is an apostrophe. Are there any cases in which apostrophes are actually /intended/ to come as a pair?