Re: [gutvol-d] Encoding statement in HTML PG Header

14 Sep 2005

      Joshua Hutchinson wrote:
...
My reason for asking this is because currently the TEI->HTML
conversion doesn't list a character set encoding in the PGHeader.
Should it?  How should the automated system determine what to put
there if we have that line?

The encoding is already handled very well by the browser, and we should
not bother the user with things he does not need to know.

But with Unicode we face a problem completely different from the
character set encoding problem: the problem of character set coverage.

With ISO-8859-1 and its beggarly 256 characters we can be pretty sure
the user has at least one font installed which contains all these
characters. The browser will find this font and display the characters,
even if running on a Chinese PC.

With Unicode we can be sure that all the fonts the user has installed, 
taken as a whole, don't cover the whole Unicode character set.

There is no solution to this problem. If you use Unicode characters in
your text you are gambling on the user having an appropriate font installed.

The only hint we could give the user is whether he can reasonably expect
his browser to render this file correctly. As we cannot know which fonts
the user has installed, we can only print a list of the Unicode blocks
used in the file, like this:

Unicode blocks: Basic Latin, Latin-1 Supplement, Bengali,
                 Greek and Coptic, General Punctuation

so a user who has no Bengali fonts will know some characters will not 
display.
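
A minimal sketch of how such a block list could be generated. It is an
illustration only, not part of the existing TEI->HTML conversion. The
names BLOCKS, block_of, blocks_used and header_line are hypothetical, and
the block table is a small hand-written excerpt; a real tool would
presumably load the full Blocks.txt from the Unicode Character Database.

```python
# Abbreviated excerpt of the Unicode block table (assumption: a real tool
# would read the complete Blocks.txt instead of this hand-written list).
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0980, 0x09FF, "Bengali"),
    (0x2000, 0x206F, "General Punctuation"),
]

UNKNOWN = "Unknown block"


def block_of(ch):
    """Return the block name for one character, or UNKNOWN if not in the table."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return UNKNOWN


def blocks_used(text):
    """Return the names of all blocks used in text, in table order."""
    found = {block_of(ch) for ch in text}
    ordered = [name for _, _, name in BLOCKS if name in found]
    if UNKNOWN in found:
        ordered.append(UNKNOWN)
    return ordered


def header_line(text):
    """Format the block list as a single header line."""
    return "Unicode blocks: " + ", ".join(blocks_used(text))


if __name__ == "__main__":
    sample = "caf\u00e9 \u2019 \u03b1\u03b2 \u0995"
    print(header_line(sample))
```

The list is sorted by code point range rather than by order of first
appearance, so the same set of characters always yields the same line.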

I think this will create more confusion than it resolves, so I opt for
leaving any character set encoding line out of the header in non-TXT files.

-- 
Marcello Perathoner
webmaster@gutenberg.org