
Regarding the markup issue, this is how I feel:

PG TXT format is not meant to be read (it is ugly). It is meant to be "the" reference format, waiting for something spiffier (XML or the like). It is meant to be transformed into other formats, or viewed in nice reading tools (e.g. a PDA with proportional fonts, anti-aliasing, etc.). As such, typography has no place in it: it is the backend's problem, that is to say it falls in the bailiwick of the program that will transform this basic interchange format into something else. (LaTeX does it automatically with babel packages for instance; XHTML could maybe do that with the right stylesheet, and then you won't have to worry about inserting all paragraph indents, for example.)

When I type e-mails, even in French, I don't take the trouble to include half- or full-width non-breaking spaces in front of ;:!?» and the like, or after «. (By the way, I guess in German quotes work like this: He said: »Hello« and not, like in French: He said: «Hello». I guess you code those quotes just as they are in your raw text formats.)

E-mails are plain text in a fixed-width font, not a printed book with nice typography. As long as you don't destroy information, you can afterwards translate those things properly, respecting classical typography. I try to do that for the PDF backend in http://www.eleves.ens.fr/home/blondeel/PGDP/ebooksgratuits/

For instance, in a French text:

* any "--" appearing at the beginning of a paragraph is a dialog dash that should become "&ndash; " or maybe "&mdash; " in HTML.
* any other "--" is an em-dash that should become " &mdash; " in HTML (note the normal spaces: not non-breaking ones!)
* maybe other rules that escape me now (number intervals?)

On Thu, Jan 06, 2005 at 04:06:35PM -0800, Andrew Sly wrote:
However, I do see a problem. Any "simple" global search/replace such as that has its risks. You cannot assume that every instance of "--" is an emdash.
People who perform such search-and-replace operations are supposed to know what they are doing. If you want to distinguish between "--" appearing at the beginning of a paragraph and elsewhere, for instance, you will run a contextual search and replace. I understand some people don't know how to do that and don't want to learn. Then they will have to cope with the imperfect typography, and wait for PG to move to other formats: if/when some structured formats appear on PG, life will be much easier. For example you could go:

User: Hey! Show me book XXX in HTML format.
Server: There you are: [...]
User: Nice. Make the font bigger, the margins narrower, the titles bolder, etc. [*]
Server (compiling this format on the fly): There you are: [...]
User: Man! I like that book. Give it to me in PDF format.
Server: There you are: [...]
User: Right. Give me both portrait format so I can print it, and landscape format with a bigger font so I can read it a little on the screen.
Server: There you are: [...]

[*] Note: this you could do on your own, just by changing the stylesheet of the XHTML file (see examples at the URL above). But the website/layout engine could do that for you.

I can already do all of the above with the ebooksgratuits experiment I mentioned above. (Well, of course you would use drop-down menus and not natural language; I mean I could if I took the time to code it, but there is nothing difficult there: the proof of concept is out there. The only slight problem is to teach LaTeX how to hyphenate words, but my program gives me the list of the words LaTeX couldn't hyphenate, with their severity and context, and makes it possible for me to teach it how to hyphenate them.)

As for the case mentioned here, maybe it is a PP issue. Of course the HTML version should respect the typography more.
For instance, what would happen to the following (from Roughing it in the Bush, PG#4389):
"You were fortunate, C---, to escape," said a backwood settler,
This would fail the contextual search and replace. To implement the transformations I detail above, you could do this (sed syntax, but of course you would use an easier programming language):

    s/^--\([^-]\)/\&ndash; \1/
    s/\([^-]\)--\([^-]\)/\1 \&mdash; \2/g

(The "&" is escaped because a bare "&" in a sed replacement stands for the whole matched text.)

Then you would check that no "--" remains, check for double spaces you may have introduced with the second transform (in case there were, wrongly, spaces around the "--" in the original text), etc.
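
As a rough sketch of the same two rules in an easier language (Python is my choice for illustration, and the function names are made up): it assumes each paragraph has already been joined into a single string, and it writes the standard HTML entity names &ndash; and &mdash;. It uses lookarounds instead of consuming the neighbouring characters, so a "--" at the very end of a paragraph is also caught, and it absorbs plain spaces already sitting around the "--" rather than leaving double spaces to clean up afterwards.

```python
import re

# A "--" that is not part of a longer run of hyphens, together with any
# plain spaces already around it. Runs such as "C---" never match.
EM_DASH = re.compile(r" *(?<!-)--(?!-) *")


def convert_dashes(paragraph):
    """Convert "--" in one paragraph of French text to HTML entities."""
    prefix = ""
    # Rule 1: a "--" opening the paragraph is a dialog dash.
    if paragraph.startswith("--") and not paragraph.startswith("---"):
        prefix = "&ndash; "
        paragraph = paragraph[2:].lstrip(" ")
    # Rule 2: every other lone "--" is an em-dash between normal spaces.
    return prefix + EM_DASH.sub(" &mdash; ", paragraph)


def leftover_hyphen_runs(paragraph):
    """Return runs of two or more hyphens still present, for manual review."""
    return re.findall(r"-{2,}", paragraph)


if __name__ == "__main__":
    quote = '"You were fortunate, C---, to escape," said a backwood settler,'
    for sample in ("--Bonjour, dit-il.", "Il vint -- enfin--me voir.", quote):
        converted = convert_dashes(sample)
        print(converted, leftover_hyphen_runs(converted))
```

On the line quoted from Roughing it in the Bush, convert_dashes leaves "C---" alone and leftover_hyphen_runs reports the "---", so a human can decide what it should become.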