
This refers to the standard PG header information we include at the beginning of all of our documents. For instance:

Title: The Rejuvenation of Aunt Mary
Author: Anne Warner
Release Date: May 6, 2005 [eBook #15775]
Language: English
Character set encoding: ISO-8859-1
***

The character set encoding line makes sense for text files. For HTML files, however, it makes a little less sense.

First of all, an HTML file usually declares its encoding in the HTML header itself:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

But this declaration only describes how the bytes of the HTML file are encoded, not necessarily which characters are actually displayed in the browser. For instance, an HTML document encoded in ISO-8859-1 can still display all sorts of Unicode characters that ISO-8859-1 itself cannot represent. You just have to escape them as character references (&#xxx;) to get the browser to display them.

So, in that case, if we put a character set encoding line in the PG header, which do we use? The file itself is ISO-8859-1, but the characters displayed in your browser go beyond ISO-8859-1. Or vice versa: if you create an HTML document encoded in UTF-8 that contains nothing but ASCII characters, what do you put in the PG header?

My reason for asking is that the TEI->HTML conversion currently doesn't list a character set encoding in the PG header. Should it? And if we do include that line, how should the automated system determine what to put there?

I'm looking for opinions, and hopefully we can reach a consensus.

Josh
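To make the two candidate answers concrete, here is a minimal sketch of what an automated check could report. It is not part of the TEI->HTML converter; the helper names `declared_charset` and `displayed_charset` are hypothetical. It assumes that stripping tags with a regular expression is good enough for illustration (a real tool would use a proper HTML parser). The first helper reads the charset from the meta tag; the second resolves character references with Python's `html.unescape` and picks the narrowest of US-ASCII, ISO-8859-1 and UTF-8 that can represent the text a browser would display.

```python
import html
import re
from typing import Optional

# Candidate charsets, narrowest first, mapped to Python codec names.
# This candidate list is an assumption made for illustration only.
CANDIDATES = (("US-ASCII", "ascii"), ("ISO-8859-1", "latin-1"), ("UTF-8", "utf-8"))

def declared_charset(markup: str) -> Optional[str]:
    """Hypothetical helper: charset named in a <meta> tag, if any."""
    m = re.search(r'charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', markup, re.I)
    return m.group(1).upper() if m else None

def displayed_charset(markup: str) -> str:
    """Hypothetical helper: narrowest charset covering the displayed text."""
    # Crude tag removal (assumption), then resolve &eacute;, &#x263A;, etc.
    text = html.unescape(re.sub(r"<[^>]*>", "", markup))
    for label, codec in CANDIDATES:
        try:
            text.encode(codec)
            return label
        except UnicodeEncodeError:
            continue
    return "UTF-8"

# Case 1: file declared ISO-8859-1, but it displays a character outside it.
latin1_doc = ('<html><head><meta http-equiv="Content-Type" '
              'content="text/html; charset=ISO-8859-1" /></head>'
              '<body><p>Caf&eacute; &#x263A;</p></body></html>')
print(declared_charset(latin1_doc), displayed_charset(latin1_doc))
# prints: ISO-8859-1 UTF-8

# Case 2: file declared UTF-8, but it contains nothing but ASCII.
ascii_doc = '<meta charset="utf-8"><p>Hello</p>'
print(declared_charset(ascii_doc), displayed_charset(ascii_doc))
# prints: UTF-8 US-ASCII
```

The two functions disagree in exactly the two situations described above, which is the crux of the question: the header line could report either the file's declared encoding or the range of characters it actually displays, but not both.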