On quote-like marks...

30 Nov 2004

      Regarding the recent discussion about ASCII and the single/double
quote marks (and what to use), I have my two cents to add (and those
here who are much more expert at character sets and Unicode than I am
will undoubtedly be able to add to this).

The situation regarding single and double quote-like marks is even
more complicated than has been presented so far. It has an impact on
the future expanded use of PG texts as envisioned by Michael Hart and
others, such as text-to-speech and language conversion. So I believe
it needs to be dealt with in a more standardized fashion (that is,
don't simply use the straight keyboard ' and " for everything under
the sun).

Quote-like marks are used for multiple purposes in texts -- especially
single quote-like marks. And then there are the "curly" types of marks
used in typographical presentation.

Here's a (probably) partial list of their multiple uses:

1) For marking up quotations (other conventions are also used)
2) Word contractions (e.g., "we're" for "we are")
3) Possessives ("the Emperor's crown")
4) Non-breaking character modifiers (see below)
5) Minutes and seconds of time and arc (50d3'25")
6) Feet and inches unit indicator (She is 5'7" tall)
7) Other mathematical symbol and unit measurement uses.

Item (4) is particularly interesting since I'm working on cleaning up
Burton's "1001 Arabian Nights Tales", and in it there are many Arabic
names where, when Burton converted to Latin script, single quote-like
marks were inserted to indicate a type of non-breaking character
modifier for pronunciation purposes. For example: Ja'afar. This
differs semantically from the apostrophes used for contractions and
possessives -- or at least differs enough (imho) to warrant
differentiation in character encoding/entities.

In the XML markup of the Arabian Nights, I've chosen to use the
following Unicode character conventions to keep everything straight.
It's not what I necessarily propose PG/DP do, but it indicates one
possible approach. Since at present I do not enclose quotations in <q>
(for example), I keep the quotation marks (double and single) in the
text to identify quotations. In the Arabian Nights I find some odd
quotation passages, a couple of which start in the middle of one
paragraph and end in the middle of another paragraph later within a
story, so adding <q>...</q> would result in non-well-formed XML. (I
could use the "milestone" approach as defined in TEI, but for the
Arabian Nights have chosen not to.)

1) For quotations using double quote marks, I use the Unicode
   left-double quote mark for the beginning, and the right-double
   quote mark for the ending: “ and ”, respectively.

   (The "curly" quotes -- for those who don't like curly quotes for
   reading, it is trivial to convert them to straight keyboard quote
   marks, but going the other way is more difficult to do.)

2) For quotations using single quote marks, I use the Unicode
   left-single quote mark for the beginning, and the right-single
   quote mark for the ending: ‘ and ’, respectively.

3) For the non-breaking character modifier as described above, e.g.
   for "Ja'afar", I use the Unicode character specific for this
   purpose: ʼ

4) For word contractions and possessives I use the ordinary lower-
   ASCII single straight quote mark: '  (For later presentational
   purposes this character can always be converted to the right-single
   "curly" quote mark.)
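
To make the four conventions above concrete, here is a rough Python
sketch. The helper names (to_straight, curl_apostrophes) are made up
for illustration and are not part of any PG/DP tool, and I am assuming
the modifier character in (3) is U+02BC, MODIFIER LETTER APOSTROPHE.

```python
# Code points assumed for conventions (1) to (3) above.
LEFT_DOUBLE = "\u201C"          # LEFT DOUBLE QUOTATION MARK
RIGHT_DOUBLE = "\u201D"         # RIGHT DOUBLE QUOTATION MARK
LEFT_SINGLE = "\u2018"          # LEFT SINGLE QUOTATION MARK
RIGHT_SINGLE = "\u2019"         # RIGHT SINGLE QUOTATION MARK
MODIFIER_APOSTROPHE = "\u02BC"  # MODIFIER LETTER APOSTROPHE (assumed)

# Translation table that flattens every mark to its keyboard equivalent.
_TO_STRAIGHT = str.maketrans({
    LEFT_DOUBLE: '"',
    RIGHT_DOUBLE: '"',
    LEFT_SINGLE: "'",
    RIGHT_SINGLE: "'",
    MODIFIER_APOSTROPHE: "'",
})


def to_straight(text):
    """Hypothetical helper: replace the marks above with plain ' and "."""
    return text.translate(_TO_STRAIGHT)


def curl_apostrophes(text):
    """Hypothetical helper: for presentation, turn each ASCII apostrophe
    (convention 4) into the right-single curly quote."""
    return text.replace("'", RIGHT_SINGLE)


sample = "\u201CJa\u02BCafar's crown\u201D"
print(to_straight(sample))
```

Note that to_straight() throws information away: after it runs, the
modifier in "Ja'afar" and the possessive apostrophe are the same
character, which is why going back the other way is the hard
direction.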

For use of ' and " for minutes and seconds of arc, and feet/inches,
there are special Unicode code points: the prime (U+2032) and the
double prime (U+2033). (I don't see this usage in the Arabian Nights
footnotes, but maybe I'll encounter it somewhere, not having finished
the 5000+ footnotes.)
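
As a small illustration of the measurement case, the sketch below
looks up the prime and double prime characters and uses them for a
feet/inches value. The feet_inches function is a hypothetical helper,
not something from an existing tool.

```python
import unicodedata

PRIME = "\u2032"         # for feet, or minutes of arc
DOUBLE_PRIME = "\u2033"  # for inches, or seconds of arc


def feet_inches(feet, inches):
    """Hypothetical helper: format a height with prime marks
    instead of the keyboard ' and "."""
    return f"{feet}{PRIME}{inches}{DOUBLE_PRIME}"


# Print each code point with its official Unicode character name.
for ch in (PRIME, DOUBLE_PRIME):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print("She is", feet_inches(5, 7), "tall")
```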

If one is working with plain text files (not XML), the above Unicode
characters can be encoded at the byte level using UTF-8 or UTF-16
encoding.

Jon Noring