
I believe the combined experience of many PG volunteers preparing texts for PG has shown that the best course is: "preserve what is used in the original text." We don't change old spellings of place names in English to "correct" them, so why do so in another language?

Personally, I would leave the marks out of the 8-bit text, as the original characters cannot be reproduced using ISO-Latin-1. You may also want to consider making a Unicode plain text file.

See "Through the Mackenzie Basin: A Narrative of the Athabasca and Peace River Treaty Expedition of 1899" (http://www.gutenberg.org/etext/12569) for a similar example of the author rendering Native North American proper names with accents over some letters (in this case, acute accents over consonants).

Thanks,
Andrew

On Fri, 14 Jan 2005, Phil Hitchcock wrote:
I am currently preparing e-text versions of W. H. Sleeman's "Rambles and Recollections of an Indian Official". The book describes life and customs in India in the 1830s.
Many of the place names, personal names, and various other words have a dash (a macron) placed over an a, e, i, or u to indicate a long vowel.
When I produce the 7-bit ASCII plain text, these marks will be missing; in the 8-bit ASCII version I am planning to use a circumflex accent ^ to replace the diacritical mark.
However, in an HTML version I could use the circumflex accent, or I could use the Unicode series starting with Ā to give a vowel with a dash over it, thus reproducing the original text form. That said, I have seen some present-day publications using the circumflex accent on Indian place names.
Thus, I am wondering what Project Gutenberg's preferred form would be for the diacritical mark in the HTML version.
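The three renderings discussed above (marks dropped for 7-bit text, circumflex substitution for ISO-Latin-1, and the original macron kept for HTML) could be produced from a single Unicode master along the following lines. This is only a rough sketch, not an established Project Gutenberg procedure: the function names and the sample word "Rāja" are made up for illustration, and the mapping covers only the four vowels mentioned in the message.

```python
import unicodedata

# Hypothetical mapping: macron vowels (a, e, i, u, both cases) to the
# circumflex forms that exist in ISO-Latin-1.
MACRON_TO_CIRCUMFLEX = str.maketrans("ĀāĒēĪīŪū", "ÂâÊêÎîÛû")


def to_ascii(text):
    """7-bit version: decompose, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Strict encoding raises UnicodeEncodeError if anything non-ASCII survives.
    return stripped.encode("ascii").decode("ascii")


def to_latin1_circumflex(text):
    """8-bit version: substitute a circumflex for each macron vowel."""
    substituted = text.translate(MACRON_TO_CIRCUMFLEX)
    # Strict encoding raises if a character is still outside ISO-Latin-1.
    return substituted.encode("latin-1").decode("latin-1")


def to_html_entities(text):
    """HTML version: keep the macron as a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


sample = "Rāja"  # illustrative word only, not taken from the book
print(to_ascii(sample))
print(to_latin1_circumflex(sample))
print(to_html_entities(sample))
```

Keeping the macron text as the master copy means the circumflex and plain ASCII forms are derived rather than hand-edited, so the original spelling is never lost.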
Philip Hitchcock
Hertfordshire, UK.