Re: [gutvol-d] Re: German texts and the m-dash

Inka (I'm assuming that's who did the HTML version) must not have used any of the GutCutter-style tools that Bill Flis over at DP created! ;)

Converting -- to &mdash; is not a requirement, as far as I know, but it is something that I see most people do. I see no problem with replacing -- with <space>&mdash;<space> in the HTML, though. It would be a very simple find/replace to do so, too.

Josh

----- Original Message -----
From: "Karl Eichwalder" <ke@gnu.franken.de>
To: gutvol-d@lists.pglaf.org
Subject: [gutvol-d] Re: German texts and the m-dash
Date: Thu, 06 Jan 2005 16:25:40 +0100
"Joshua Hutchinson" <joshua@hutchinson.net> writes:
In my experience, HTML files DO currently switch -- to &mdash;
Under those circumstances something went wrong with http://www.gutenberg.org/dirs/1/4/3/4/14340/14340-h/14340-h.htm .
However, the text files use -- because the em-dash has no equivalent in 7-bit ASCII.
That's okay.
I think I've seen this discussion before on DP forums. If I remember correctly, it was decided to stick to the xyz--xyz standard simply to avoid confusion and complication.
I'm not sure whether the German reading community will get used to it ;-)
-- 
http://www.gnu.franken.de/ke/   |    ,__o
                                |  _-\_<,
                                | (*)/'(*)
Key fingerprint = F138 B28F B7ED E0AC 1AB4 AA7F C90A 35C3 E9D0 5D1C
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d

Just to avoid complications ;-)

a) \227 for em-dash is neither ASCII nor ISO-Latin nor Unicode: it is the Windows code page (Windows-1252). The em-dash is U+2014 (&mdash;).

b) Unicode has two slightly different characters, the em-dash (U+2014) and the horizontal bar (U+2015, ―); the latter is explicitly indicated for dialogue, which is where it is mostly used in German, French, Italian and other languages. It would make a lot of sense to use the horizontal bar (which has some space around it) and the em-dash (without) where each is indicated. Not to mention the en-dash (–), the figure dash (‒), etc.

But, as I said, this is a further level of complication (i.e. typographical precision) that is probably beyond our (present) reach.

Carlo Traverso
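To make points a) and b) concrete, here is a small Python sketch using only the standard library (`bytes.decode` and `unicodedata.name`). It shows that the byte written as \227 (octal, i.e. 0x97) decodes to the em-dash only under Windows-1252 ("cp1252"), is a control code under ISO-8859-1, and is not ASCII at all; it also lists the code points of the dashes mentioned. The helper name `describe` is illustrative only.

```python
import unicodedata

# The dash characters discussed above, by Unicode code point.
DASHES = {
    "figure dash": "\u2012",
    "en-dash": "\u2013",
    "em-dash": "\u2014",
    "horizontal bar": "\u2015",
}

RAW = bytes([0o227])  # the byte written as \227 (octal), i.e. 0x97


def describe(ch: str) -> str:
    """Return the code point, official Unicode name and numeric HTML reference."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} &#{ord(ch)};"


if __name__ == "__main__":
    # Under Windows-1252, 0x97 is the em-dash...
    print(repr(RAW.decode("cp1252")))
    # ...under ISO-8859-1 (Latin-1) it is an unprintable C1 control code...
    print(repr(RAW.decode("latin-1")))
    # ...and it is not valid 7-bit ASCII at all.
    try:
        RAW.decode("ascii")
    except UnicodeDecodeError:
        print("not ASCII")
    for label, ch in DASHES.items():
        print(f"{label}: {describe(ch)}")
```

Emitting the numeric references (or the characters themselves in a UTF-8 file) avoids depending on any one code page.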

On Thu, 6 Jan 2005, Joshua Hutchinson wrote:
Inka (I'm assuming that's who did the HTML version) must not have used any of the GutCutter-style tools that Bill Flis over at DP created! ;)
Converting -- to &mdash; is not a requirement, as far as I know, but it is something that I see most people do. I see no problem with replacing -- with <space>&mdash;<space> in the HTML, though. It would be a very simple find/replace to do so, too.
However, I do see a problem. Any "simple" global search/replace such as that has its risks. You cannot assume that every instance of "--" is an em-dash. For instance, what would happen to the following (from Roughing it in the Bush, PG#4389):

"You were fortunate, C---, to escape," said a backwood settler,

Andrew
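The risk is easy to reproduce. This Python sketch (the function name `naive_mdash` is hypothetical) applies the blind `--` to ` &mdash; ` replacement suggested above and shows what it does to the Roughing it in the Bush example: the three-hyphen elision is split into an em-dash plus a stray hyphen, while an ordinary `word--word` dash comes out fine.

```python
def naive_mdash(text: str) -> str:
    """Blindly turn every '--' into ' &mdash; ' (the risky global replace)."""
    return text.replace("--", " &mdash; ")


if __name__ == "__main__":
    sample = '"You were fortunate, C---, to escape," said a backwood settler,'
    print(naive_mdash(sample))                      # elision is garbled
    print(naive_mdash("end of sentence--and then"))  # ordinary dash is fine
```

A blind replacement is only safe once the input has been normalised, or when the replacement looks at the surrounding context.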

For instance, what would happen to the following (from Roughing it in the Bush, PG#4389):
"You were fortunate, C---, to escape," said a backwood settler,
I never use three hyphens. In fact, I search for them and change them to either two or four. I'd have set this example as four hyphens, then in the HTML (automatically) converted each pair into an "&mdash;". Two em-dashes look like one continuous (double-em) dash in the browsers I use (IE and Firefox).

Bill Flis
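The two-step convention above can be sketched in Python as follows, assuming the text uses plain ASCII hyphens. The function names are hypothetical and not part of GutCutter or any DP tool. Runs of exactly three hyphens are only reported, since choosing between two and four is a human decision; after that, every pair becomes one `&mdash;`.

```python
import re

# A run of exactly three hyphens, not part of a longer run.
THREE_HYPHENS = re.compile(r"(?<!-)---(?!-)")


def find_three_hyphen_runs(text: str) -> list[int]:
    """Return the offsets of three-hyphen runs that need a manual decision."""
    return [m.start() for m in THREE_HYPHENS.finditer(text)]


def pairs_to_mdash(text: str) -> str:
    """Convert each pair of hyphens to &mdash;, so '----' becomes two of them."""
    return text.replace("--", "&mdash;")


if __name__ == "__main__":
    raw = '"You were fortunate, C---, to escape,"'
    print(find_three_hyphen_runs(raw))  # offsets for the proofer to fix by hand
    fixed = raw.replace("C---", "C----")  # the proofer's choice: four hyphens
    print(pairs_to_mdash(fixed))
```

Running `pairs_to_mdash` before the three-hyphen check would leave a stray hyphen after the `&mdash;`, which is why the order matters.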

Regarding the markup issue, this is how I feel: the PG TXT format is not meant to be read (it is ugly). It is meant to be "the" reference format, waiting for something spiffier (XML or the like). It is meant to be transformed into other formats, or viewed in nice reading tools (e.g. a PDA with proportional fonts, anti-aliasing, etc.). As such, typography has no place in it: it is the backend's problem, that is, it falls within the bailiwick of the program that will transform this basic interchange format into something else. (LaTeX does it automatically with babel packages, for instance; XHTML could maybe do it with the right stylesheet, and then you wouldn't have to worry about inserting all the paragraph indents, for example.)

When I type e-mails, even in French, I don't bother to include thin or full-width non-breaking spaces in front of ;:!?» and the like, or after «. (By the way, I guess in German quotes work like this: He said: »Hello« and not, as in French: He said: «Hello». I guess you code those quotes as-is in your raw text formats.) E-mails are plain text in a fixed-width font, not a printed book with nice typography. As long as you don't destroy information, you can afterwards translate those things properly, respecting classical typography. I try to do that for the PDF backend in http://www.eleves.ens.fr/home/blondeel/PGDP/ebooksgratuits/

For instance, in a French text:
* any "--" appearing at the beginning of a paragraph is a dialogue dash that should become "&ndash; " or maybe "&mdash; " in HTML.
* any other "--" is an em-dash that should become " &mdash; " in HTML (note the normal spaces: not non-breaking ones!)
* maybe other rules that escape me now (number intervals?)

On Thu, Jan 06, 2005 at 04:06:35PM -0800, Andrew Sly wrote:
However, I do see a problem. Any "simple" global search/replace such as that has its risks. You cannot assume that every instance of "--" is an em-dash.
People who perform such searches and replaces are supposed to know what they are doing. If you want to distinguish between "--" appearing at the beginning of a paragraph and elsewhere, for instance, you will run a contextual search and replace. I understand some people don't know how to do that and don't want to learn. Then they will have to cope with imperfect typography and wait for PG to move to other formats: if/when structured formats appear on PG, life will be much easier. For example, you could go:

User: Hey! Show me book XXX in HTML format.
Server: There you are: [...]
- Nice. Make the font bigger, the margins narrower, the titles bolder, etc. [*]
Server (compiling this format on the fly): There you are: [...]
- Man! I like that book. Give it to me in PDF format.
- There you are: [...]
- Right. Give me both a portrait format so I can print it, and a landscape format with a bigger font so I can read a little of it on the screen.
- There you are: [...]

[*] Note: this you could do on your own, just by changing the stylesheet of the XHTML file (see examples at the URL above). But the website/layout engine could do it for you.

I can already do all of the above with the ebooksgratuits experiment I mentioned above (well, of course you would use drop-down menus and not natural language; I mean I could if I took the time to code it, but there is nothing difficult there: the proof of concept is out there. The only slight problem is teaching LaTeX how to hyphenate words, but my program gives me the list of the words LaTeX couldn't hyphenate, with their severity and context, and makes it possible for me to teach it how to hyphenate them).

As for the case mentioned here, maybe it is a PP issue. Of course the HTML version should follow typography more closely.
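The argument earlier in this message is that non-breaking spaces around French punctuation belong in the backend, not in the plain-text master. A minimal Python sketch of such a backend pass follows. The choice of a narrow no-break space (U+202F) before ; ! ? and inside guillemets, and an ordinary no-break space (U+00A0) before a colon, is an assumption following one common French convention; style guides differ. The function name `french_spacing` is hypothetical.

```python
import re

# Assumed convention: narrow no-break space before ; ! ? and inside « »,
# ordinary no-break space before a colon. Style guides vary on this.
NNBSP = "\u202f"
NBSP = "\u00a0"


def french_spacing(text: str) -> str:
    """Replace plain-text spacing around French punctuation (not idempotent)."""
    text = re.sub(r"« ?", "«" + NNBSP, text)           # after an opening «
    text = re.sub(r" ?»", NNBSP + "»", text)           # before a closing »
    text = re.sub(r" ?([;!?]+)", NNBSP + r"\1", text)  # before ; ! ?
    text = re.sub(r" ?:(?=\s|$)", NBSP + ":", text)    # before a colon
    return text


if __name__ == "__main__":
    print(repr(french_spacing("Il a dit : « Quoi ? »")))
    print(repr(french_spacing("Voir http://example.org/a")))  # URL untouched
```

The colon rule only fires when whitespace follows, so `http://` and times such as `12:30` are left alone; a `?` inside a URL query string would still get a space, so a real backend would need to skip URLs. Running the function twice doubles the spaces.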
For instance, what would happen to the following (from Roughing it in the Bush, PG#4389):
"You were fortunate, C---, to escape," said a backwood settler,
This would fail the contextual search and replace. To implement the transformations I detail above, you could do this (sed syntax, but of course you would use an easier programming language; note that "&" must be escaped in a sed replacement, or it inserts the whole match):

    s/^--\([^-]\)/\&ndash; \1/
    s/\([^-]\)--\([^-]\)/\1 \&mdash; \2/g

Then you would check that no "--" remains, check for double spaces you may have introduced with the second transform (in case there were, wrongly, spaces around the "--" in the original text), etc.
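For readers who prefer something other than sed, here is a rough Python equivalent of the two rules above plus the final check for leftover double hyphens. It is a sketch assuming one paragraph per line (wrapped PG text would first need unwrapping, or a wrapped line starting with "--" would be taken for dialogue), and it uses &ndash; for the dialogue dash, the first of the two options given above. Using lookarounds instead of captured neighbours lets runs such as "I--I--I" be handled in one pass, and optional spaces around "--" are absorbed so no double spaces appear. As noted above, "C---" matches neither rule and is left for a human.

```python
import re

# Rule 1: "--" at the start of a paragraph (here: a line) is a dialogue dash.
DIALOGUE = re.compile(r"^--[ ]?(?=[^-])", re.MULTILINE)
# Rule 2: any other "--" between two non-hyphens is an em-dash; an optional
# space on either side is absorbed so that none gets doubled.
INNER = re.compile(r"(?<=[^-])[ ]?--[ ]?(?=[^-])")


def french_dashes(text: str) -> str:
    """Apply both rules; paragraphs are assumed to be one per line."""
    text = DIALOGUE.sub("&ndash; ", text)
    return INNER.sub(" &mdash; ", text)


def leftover_count(text: str) -> int:
    """Count '--' sequences the rules could not classify (e.g. 'C---')."""
    return text.count("--")


if __name__ == "__main__":
    print(french_dashes("--Bonjour, dit-il--enfin."))
    result = french_dashes('"You were fortunate, C---, to escape,"')
    print(result, leftover_count(result))  # still contains '---': check by hand
```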

I don't see the usefulness of comparing our growth to Moore's Law, as even recent advances in computing are rendering this Law obsolete. Take my world of geological visualization, for instance. As explained by one of the specialists at Landmark Graphics, "With Linux 64-bit, you can have an unlimited volume of RAM on the system. The historical limit for the 32-bit system was around 2GB. Now, we're seeing systems that have 16GB of memory. No longer does compute power double every 18 months at the same price ... when things scale by 10 times, it's no longer just faster - you're in a different world." Granted, my system is not going to run at full spec speed at any given time owing to bandwidth quirks, etc., but it sure outdoes Moore's Law nowadays. If we need something to compare our growth rates to, we should define our own new baseline curve and see how our growth proceeds from there. Since I personally don't know how to create one of these benchmarks, I will stop here and let someone else make a recommendation.

Maitri
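If one did want a baseline, the usual arithmetic is a doubling time: assuming steady exponential growth from a to b over t months, the quantity doubles every t·ln 2 / ln(b/a) months, a figure that can be set against the 18 months quoted above. A minimal Python sketch; the function name is hypothetical and the numbers in the usage line are made up for illustration, not PG or DP figures.

```python
import math


def doubling_time(months: float, start: float, end: float) -> float:
    """Months needed to double, assuming steady exponential growth."""
    if start <= 0 or end <= start:
        raise ValueError("need 0 < start < end for a finite doubling time")
    return months * math.log(2) / math.log(end / start)


if __name__ == "__main__":
    # Illustrative numbers only: 1,000 items growing to 4,000 in 24 months.
    print(doubling_time(24, 1_000, 4_000))
```

Two quadruplings' worth of data is not a trend, of course; with more data points one would fit a line to the logarithm of the counts instead.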
participants (6)
- Andrew Sly
- Carlo Traverso
- Joshua Hutchinson
- maitri venkat-ramani
- Sebastien Blondeel
- William Flis