Re: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol 13, Issue 20)

Marcello Perathoner <marcello@perathoner.de> wrote:
Lee Passey wrote:
It appears that the file is latin-1 encoded, despite the fact that the DTD claims that it is utf-8 encoded. This caused Firefox some grief as it tried to utf-8-decode some latin-1 accented vowels.
That is just what Apache thinks it is because it doesn't look inside the file before serving it. Apache can be made to serve the encoding based on the file extension. Lacking a definite extension it will serve the default which is iso-8859-1.
In this case I saved the file to my local file system before doing anything with it. Are you suggesting that Apache (your server) looked at the contents of the file it was serving and replaced the <?xml ...> declaration with "<?xml version="1.0" encoding="utf-8" ?>" before serving it? Or are you suggesting that as it transferred the file it changed utf-8 encoded characters to Latin-1 encoding? (I've never seen that behavior in Apache before, but I could have overlooked something.) If I retrieved the file via FTP would it be different than if I retrieved it using HTTP?
I grabbed an arbitrary "tei.css" style sheet off the net, and added the line:
<?xml-stylesheet href="tei.css" type="text/css"?>
You can also include an XSL stylesheet which gives you far more power.
XSL isn't really a stylesheet, it is a scripting language for a transformational engine. XSL has many good uses, but applying styles to a document isn't one of them. Indeed, I've never figured out how to use XSL to style an XML file without having an existing Cascading Style Sheet that I could use for the actual styles.
But why do you want to look at the TEI file in the browser when there is an HTML file available?
Why ask why? Actually, I'm not interested in looking at the file at all; it's as boring as hell. What I _am_ interested in is exploring the use of TEI as an archive format, _and_ as a content delivery format. I think that enabling a TEI-XML file to be used by a browser directly, if it can be done without compromising its function as an archive format, is a worthwhile goal, and in many cases better than requiring some sort of XSL transformation before it can be viewed.

Lee Passey wrote:
In this case I saved the file to my local file system before doing anything with it.
Then I don't know. The file is correct utf-8. Did you tell your editor that it is a utf-8 file?
-- Marcello Perathoner webmaster@gutenberg.org
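One way to settle this kind of question without relying on a browser or an editor is to read the saved file as raw bytes and test whether they decode under the encoding named in the XML declaration. The following is only a minimal sketch, not part of the original exchange: the helper names are hypothetical, it assumes the declaration sits at the very start of the data (no byte order mark), and it falls back to utf-8 when no encoding is declared.

```python
import re

# Hypothetical helpers for illustration; not part of any PG or TEI tooling.
# Matches the encoding pseudo-attribute of an XML declaration at the start of the data.
DECL_RE = re.compile(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')


def declared_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or the default."""
    match = DECL_RE.match(data)
    return match.group(1).decode("ascii").lower() if match else default


def matches_declaration(data: bytes) -> bool:
    """True if the raw bytes decode cleanly under the declared encoding."""
    try:
        data.decode(declared_encoding(data))
        return True
    except (UnicodeDecodeError, LookupError):
        # LookupError covers an encoding name Python does not know.
        return False


# Typical use on a saved copy (the file name here is made up):
#   with open("book.tei", "rb") as fh:
#       print(matches_declaration(fh.read()))
```

Note the check is one-sided: every byte sequence decodes as iso-8859-1, so it can only catch data that is declared utf-8 but is not valid utf-8, not the reverse.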

Lee Passey wrote:
It appears that the file is latin-1 encoded, despite the fact that the DTD claims that it is utf-8 encoded. This caused Firefox some grief as it tried to utf-8-decode some latin-1 accented vowels.
Ok, I tried to see what grief you are talking about ... all the accented vowels I looked at are appearing correctly. Which ones are you having trouble with? (This is looking at the XML directly in Firefox) I thought everything in Latin-1 encoding would be the same under a UTF-8 encoding, but evidently I'm mistaken there (which wouldn't be surprising, my encoding set knowledge is often shaky at best). Josh
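On the Latin-1 versus UTF-8 point: the two encodings agree only on the ASCII range. The short sketch below is an illustration added for clarity, not something posted in the thread. It shows that plain ASCII text yields identical bytes either way, while an accented vowel is one byte in Latin-1 and two bytes in UTF-8, and that the single Latin-1 byte is not valid UTF-8 on its own.

```python
# Illustrative only: compare how the same text is stored in Latin-1 and UTF-8.
ascii_text = "vowel"
accented = "\u00e9"  # e with acute accent, U+00E9

# Pure ASCII text produces identical bytes in both encodings.
same_for_ascii = ascii_text.encode("latin-1") == ascii_text.encode("utf-8")

# The accented vowel is one byte in Latin-1 but two bytes in UTF-8.
latin1_bytes = accented.encode("latin-1")
utf8_bytes = accented.encode("utf-8")


def decodes_as_utf8(data: bytes) -> bool:
    """True if the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(same_for_ascii)                 # True
print(latin1_bytes, utf8_bytes)       # b'\xe9' b'\xc3\xa9'
print(decodes_as_utf8(latin1_bytes))  # False
```

This is why a Latin-1 file labelled as UTF-8 can look fine wherever the text is plain ASCII and break only at the accented characters.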

Joshua wrote:
Lee Passey wrote:
It appears that the file is latin-1 encoded, despite the fact that the DTD claims that it is utf-8 encoded. This caused Firefox some grief as it tried to utf-8-decode some latin-1 accented vowels.
Ok, I tried to see what grief you are talking about ... all the accented vowels I looked at are appearing correctly. Which ones are you having trouble with? (This is looking at the XML directly in Firefox)
I thought everything in Latin-1 encoding would be the same under a UTF-8 encoding, but evidently I'm mistaken there (which wouldn't be surprising, my encoding set knowledge is often shaky at best).
Hmmm, I notice in the PG-TEI documentation (version 0.3 at URL: http://www.gutenberg.org/tei/marcello/0.3/doc/20000-h/20000-h.html#toc_12 ) that the "template" has the following DOCTYPE:
<?xml version="1.0" encoding="iso-8859-1" ?>
Why isn't it
<?xml version="1.0" encoding="utf-8" ?>
Is this the issue of what Lee observed, or is this a different issue?
Jon

Jon Noring wrote:
Hmmm, I notice in the PG-TEI documentation (version 0.3 at URL: http://www.gutenberg.org/tei/marcello/0.3/doc/20000-h/20000-h.html#toc_12 ) that the "template" has the following DOCTYPE:
<?xml version="1.0" encoding="iso-8859-1" ?>
Why isn't it
<?xml version="1.0" encoding="utf-8" ?>
Because most people will want to author their TEI files in iso-8859-1. If you want to use utf-8, just change the declaration. But you'll need an editor that groks utf-8. -- Marcello Perathoner webmaster@gutenberg.org
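Changing the declaration alone is only safe if the bytes in the file are changed to match; otherwise the file ends up declared utf-8 while still holding Latin-1 bytes. A minimal sketch of doing both steps together follows. It is an illustration only: the function name is hypothetical, and it assumes the input really is iso-8859-1 and declares itself as such.

```python
import re

# Hypothetical helper for illustration; assumes the input truly is iso-8859-1.
# Captures the text around the encoding name so only the name is replaced.
DECL_ENCODING_RE = re.compile(
    r'(<\?xml[^>]*encoding=["\'])iso-8859-1(["\'])', re.IGNORECASE
)


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode an iso-8859-1 XML document as utf-8 and update its declaration."""
    text = data.decode("iso-8859-1")
    # Rewrite only the first declaration; the rest of the document is untouched.
    text = DECL_ENCODING_RE.sub(r"\1utf-8\2", text, count=1)
    return text.encode("utf-8")
```

If the input carries no iso-8859-1 declaration, the substitution does nothing and the function only re-encodes the bytes.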
participants (4)
- Jon Noring
- Joshua Hutchinson
- Lee Passey
- Marcello Perathoner