
On Sat, 8 May 2010, Marcello Perathoner wrote:
Andrew Sly wrote:
If you want some history, basically you can blame Microsoft. They developed their own character sets for use with Windows, which were _close_ to already-established standards, but not quite identical.
No, you cannot blame Microsoft.
This is one of the few cases where they did it right: they registered their character sets with IANA, and this makes them as standard as any other character set, ISO or Unicode or whatever.
Yes. I am aware that it is a registered charset. I have read before that the general recommendation from Microsoft was to simply label your text as Latin-1 because it was close enough that there were no important differences. However, I don't have a source for that, so it is possible that it is merely unfounded Microsoft-bashing.
The blame lies with the whitewasher who mislabeled the file as ISO-8859-1 when it really is WINDOWS-1252.
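The practical difference between the two labels is narrow but real: bytes 0x80–0x9F are unused C1 control codes in ISO-8859-1, while Windows-1252 assigns them printable characters such as curly quotes and the em dash. A small, hypothetical Python sketch of how the same bytes come out under each label:

```python
# Sample bytes (hypothetical): 0x93/0x94 are curly quotes and 0x97 is
# an em dash in Windows-1252, but invisible C1 controls in ISO-8859-1.
data = b"\x93quoted\x94 \x97 caf\xe9"

# Under the Windows-1252 label the punctuation decodes as intended.
as_cp1252 = data.decode("windows-1252")

# Under the ISO-8859-1 label the same bytes become control characters,
# which is why a mislabeled file displays stray or missing punctuation.
as_latin1 = data.decode("iso-8859-1")

print(repr(as_cp1252))
print(repr(as_latin1))
```

Note that 0xE9 (é) decodes identically either way; only the 0x80–0x9F range distinguishes the two.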
Hmm... I could understand if this was some earlier PG text (from the time when the only encoding distinction made was 7-bit ASCII or "8-bit"), but it was more recent, and should have been caught at posting time.
Whatever. I fixed this by overriding the PG header in the database. Somebody should check all books by http://www.ebooksgratuits.com or all books with RTF files and see if they are correctly labelled.
Would it be possible to run some kind of automated check on all files labelled ISO-8859-1, searching for characters in the 0x80 to 0x9F range? --Andrew
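Such a check is straightforward, since a correct ISO-8859-1 text has no reason to contain bytes in that range. A minimal sketch of the scan Andrew proposes (the file path used here is hypothetical):

```python
# Scan a file labelled ISO-8859-1 for bytes in 0x80-0x9F. Any hit is
# a strong sign the file is really Windows-1252 (or corrupt), because
# that range holds only unused C1 control codes in Latin-1.

def suspicious_bytes(data: bytes):
    """Return (offset, byte) pairs for bytes in the 0x80-0x9F range."""
    return [(i, b) for i, b in enumerate(data) if 0x80 <= b <= 0x9F]

def check_file(path):
    with open(path, "rb") as f:
        hits = suspicious_bytes(f.read())
    if hits:
        offset, byte = hits[0]
        print(f"{path}: {len(hits)} suspicious byte(s), "
              f"first at offset {offset} (0x{byte:02X})")
    return hits

# Hypothetical usage:
# check_file("12345-8.txt")
```

Run over every file whose header claims ISO-8859-1, this would flag candidates for relabeling like the case above; files with zero hits are at least consistent with their label.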