Re: Missing apostrophes in the generated HTML / ePub versions of Madame Bovary

andrew said:
If you want some history, basically you can blame microsoft.
what kind of "history" is that?
c'mon, let's go _back_.
let's blame the serpent in the garden of eden who tempted eve to take a bite of the forbidden fruit...
get real. and stop making excuses.
now that marcello knows that the encoding declaration can't be trusted, he can _improve_ his converter script by actually doing some reality-checks on the text itself, to make sure it actually is the encoding it claims to be. and apostrophes would be an _excellent_ way to start...
or better yet, run such a checker-program against the entire library, and ferret out the incorrect declarations, so the converter script _can_ know the encoding for sure.
-bowerbird

On Sat, 8 May 2010 Bowerbird@aol.com wrote:
andrew said:
If you want some history, basically you can blame microsoft.
what kind of "history" is that?
c'mon, let's go _back_.
let's blame the serpent in the garden of eden who tempted eve to take a bite of the forbidden fruit...
get real. and stop making excuses.
Well, everyone is entitled to their own opinions. My own opinion is still that the way Microsoft introduced and has used its character sets has created unnecessary difficulty for many people. I'm sorry if you perceive that as making an excuse.
now that marcello knows that the encoding declaration can't be trusted, he can _improve_ his converter script by actually doing some reality-checks on the text itself, to make sure it actually is the encoding it claims to be. and apostrophes would be an _excellent_ way to start...
or better yet, run such a checker-program against the entire library, and ferret out the incorrect declarations, so the converter script _can_ know the encoding for sure.
I don't think this is any new revelation. I knew this was a problem years ago, and brought up the possibility of running some kind of scan to identify texts that were possibly mislabelled. I got zero interest then.
And you can't always tell with an automated check. For instance, if you have a Hungarian text encoded in ISO-8859-2 and it is incorrectly labelled as ISO-8859-1, an automated check won't be able to tell you that anything is wrong, since every byte ISO-8859-2 uses for letters is also a valid ISO-8859-1 character.
But I agree it would certainly be useful to identify any texts that use byte values that should not be occurring in the stated character encoding.
--Andrew
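
For what it's worth, the kind of byte-value scan discussed above could be sketched roughly like this in Python. This is only an illustration, not anything the converter script is known to do: the function name `suspicious_bytes` is made up, it assumes a single-byte declared encoding (ISO-8859-x, Windows-1252), and it treats C1 control characters as "should not occur", which is a heuristic rather than a rule.

```python
def suspicious_bytes(data: bytes, declared: str) -> dict:
    """Return {byte value: count} for bytes unlikely to belong to `declared`.

    Hypothetical helper; only meaningful for single-byte encodings.
    """
    counts = {}
    for b in data:
        if b < 0x80:
            continue  # ASCII is shared by all the encodings discussed here
        try:
            ch = bytes([b]).decode(declared)
        except UnicodeDecodeError:
            # Byte is unassigned in the declared encoding (e.g. 0x81 in cp1252).
            counts[b] = counts.get(b, 0) + 1
            continue
        # ISO-8859-x maps 0x80..0x9F to C1 control characters, which almost
        # never appear in real text. In Windows-1252 the same bytes are curly
        # quotes and dashes, so a hit here hints at a wrong declaration.
        if 0x80 <= ord(ch) <= 0x9F:
            counts[b] = counts.get(b, 0) + 1
    return counts


# A Windows-1252 apostrophe (0x92) in a file declared as ISO-8859-1 is flagged:
print(suspicious_bytes(b"Emma\x92s", "iso-8859-1"))   # {146: 1}

# With the declaration corrected, nothing is flagged:
print(suspicious_bytes(b"Emma\x92s", "cp1252"))       # {}

# The Hungarian case: ISO-8859-2 bytes mislabelled as ISO-8859-1 pass silently.
hungarian = "\u0151r\u00fclt".encode("iso-8859-2")    # the word "őrült"
print(suspicious_bytes(hungarian, "iso-8859-1"))      # {}
```

The last call shows the limitation: a scan like this can catch Windows-1252 punctuation hiding behind an ISO-8859-1 label, but it cannot tell ISO-8859-2 from ISO-8859-1 without looking at the language of the text.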
participants (2)
- Andrew Sly
- Bowerbird@aol.com