Re: [gutvol-d] UTF-8 TXT (was Producing epub ready HTML)

25 Jan 2012

On Jan 25, 2012, at 9:52 AM, Carlo Traverso wrote:
> ...
> I believe that the problem in handling UTF-8 submissions is
> unitame, the tool that the WWers use to recode UTF-8 to
> ISO Latin-1. It cannot handle simple things like bullets and Greek,
> but it would be very easy to extend it to handle these
> characters and more (I did).
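The kind of extension Carlo describes can be sketched in Python. This is not unitame's code, whose internals and mappings aren't shown in this thread. It assumes a hypothetical fallback table: characters that Latin-1 cannot represent are replaced from that table, and anything still unmapped is collected for review instead of being silently dropped.

```python
# Hypothetical fallback table for characters outside Latin-1.
# These entries are illustrative assumptions, not unitame's mappings.
FALLBACK = {
    "\u2022": "*",        # BULLET
    "\u201c": '"',        # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',        # RIGHT DOUBLE QUOTATION MARK
    "\u03b1": "[alpha]",  # GREEK SMALL LETTER ALPHA
    "\u03b2": "[beta]",   # GREEK SMALL LETTER BETA
}


def to_latin1(text: str) -> tuple[bytes, set[str]]:
    """Recode text to Latin-1, substituting FALLBACK entries.

    Returns the encoded bytes and the set of characters that had
    neither a Latin-1 form nor a fallback (each is written as '?').
    """
    out = []
    missing = set()
    for ch in text:
        if ord(ch) < 256:          # Latin-1 covers U+0000..U+00FF directly
            out.append(ch)
        elif ch in FALLBACK:
            out.append(FALLBACK[ch])
        else:
            missing.add(ch)        # report it so a human can add a mapping
            out.append("?")
    return "".join(out).encode("latin-1"), missing
```

For example, `to_latin1("\u2022 \u03b1 ca\u00f1on \u2603")` keeps the ñ (it is in Latin-1), maps the bullet and alpha from the table, and reports the snowman as missing.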

Unitame is part of it, but PG also wants to include a plain ASCII file.
Look at the first paragraph of "The Black Star" (etext 35833).

It started in UTF-8 as 35833-0.txt:
  They poured through the man-made cañons
then went through unitame into Latin-1 as 35833-8.txt:
  They poured through the man-made cañons
but the ASCII version, 35833.txt, has
  They poured through the man-made canons
I would write that in the ASCII file as
  They poured through the man-made canyons
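The "canons" versus "canyons" difference comes down to order of operations. A generic transliterator strips the tilde from ñ. A word-level exception list consulted first can choose the English spelling instead. Here is a minimal Python sketch; the exception table and function names are hypothetical and not taken from any PG tool.

```python
import re
import unicodedata

# Hypothetical word-level exceptions, checked before generic accent
# stripping. Keys are lowercase; a leading capital is restored.
EXCEPTIONS = {
    "cañon": "canyon",
    "cañons": "canyons",
}


def strip_accents(word: str) -> str:
    """Generic fallback: decompose (NFD) and drop combining marks.

    Letters with no decomposition (e.g. æ, ß, þ) pass through
    unchanged and would need their own EXCEPTIONS-style handling.
    """
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def to_ascii_word(word: str) -> str:
    replacement = EXCEPTIONS.get(word.lower())
    if replacement is None:
        return strip_accents(word)
    if word[0].isupper():
        replacement = replacement[0].upper() + replacement[1:]
    return replacement


def to_ascii(text: str) -> str:
    # Assumes composed (NFC) input, as text decoded from Latin-1 is;
    # \w then matches whole words including letters like ñ.
    return re.sub(r"\w+", lambda m: to_ascii_word(m.group()), text)
```

With the exception table, `to_ascii("They poured through the man-made cañons")` gives the "canyons" line above. Without it, `strip_accents` alone reproduces the "canons" of 35833.txt.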

There are similar special cases, such as "oo" within a word, and others.
Is there a tool the WWers could use to catch these special cases
when they are given a Latin-1 file? This isn't a discussion of
whether a UTF-8 file alone should be sufficient; that's a question the
WWers, Marcello, and Greg should agree upon.
Right now, the ASCII version is in the mix, and unitame
alone isn't enough to produce it. Is anybody up for
writing a latin1tame program?
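One possible shape for a "latin1tame" is a review report rather than a converter. It lists every distinct word that contains a non-ASCII character, with a count and the line where it first appears, so a WWer can decide which words need a special case before the ASCII file is made. This is a minimal Python sketch. The function name, output format, and command-line usage are assumptions.

```python
import re
import sys
from collections import Counter

WORD = re.compile(r"\w+")


def report_non_ascii_words(lines):
    """Return (word, count, first_line) for every word containing a
    non-ASCII character, sorted by descending count, then by word."""
    counts = Counter()
    first_seen = {}
    for lineno, line in enumerate(lines, start=1):
        for word in WORD.findall(line):
            if not word.isascii():
                counts[word] += 1
                first_seen.setdefault(word, lineno)
    return sorted(
        ((w, n, first_seen[w]) for w, n in counts.items()),
        key=lambda item: (-item[1], item[0]),
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python latin1report.py 35833-8.txt
    with open(sys.argv[1], encoding="latin-1") as f:
        for word, n, first in report_non_ascii_words(f):
            print(f"{word}\t{n}\tfirst at line {first}")
```

Run against a Latin-1 file such as 35833-8.txt, a word like "cañons" would show up in the report, and its ASCII spelling could then be decided by hand.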

--Roger

Roger Frank