Re: [gutvol-d] XML version of some books of PG (and other formats)

3 Dec 2004


I'm curious to see if your script can handle tables. That is currently our biggest bugaboo when it comes to transforming to PG TXT format.
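
To make the table problem concrete, here is a minimal sketch of rendering a table as fixed-width plain text. It assumes the rows have already been extracted as lists of cell strings; the name `render_table` and the sample data are hypothetical and not taken from any script discussed in this thread. It ignores the hard cases (cells that need wrapping, spanning cells, overall line-width limits).

```python
def render_table(rows, sep="  "):
    """Render rows of cell strings as left-aligned, fixed-width text columns."""
    if not rows:
        return ""
    ncols = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells.
    padded = [list(row) + [""] * (ncols - len(row)) for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in padded) for i in range(ncols)]
    lines = []
    for row in padded:
        cells = [cell.ljust(width) for cell, width in zip(row, widths)]
        # Strip trailing padding so lines do not end in spaces.
        lines.append(sep.join(cells).rstrip())
    return "\n".join(lines)


# Hypothetical sample data, for illustration only.
print(render_table([["Title", "Year"], ["Candide", "1759"], ["Germinal", "1885"]]))
```

Left-aligning every column keeps the sketch short; numeric columns would usually read better right-aligned.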

Josh


----- Original Message -----
From: "Sebastien Blondeel" <blondeel@clipper.ens.fr>
To: gutvol-d@lists.pglaf.org
Subject: [gutvol-d] XML version of some books of PG (and other formats)
Date: Fri, 3 Dec 2004 05:28:39 +0100
...
Hello,
I hacked together some scripts that do the following:
RTF -> XML
RTF: from Word, using a (very) simple stylesheet: just
   paragraphs, 3 title levels, footnotes, and italics
   Meta-information is in the properties of the document.
   My script can extract images too, if wanted.
XML: using a personal and simple DTD (embedded), probably easy to port
   to any more complete DTD, such as TEI
This is the hard part, and I am never quite sure it will not break if
the Word file is weird.
...
From that, I then wrote other (proof-of-concept) scripts to produce:
XML -> PG TXT
XML -> (LaTeX) -> PDF, DVI, PS (with hyperlinks)
XML -> valid HTML 4.01 (probably useless)
XML -> XHTML 1.0 Strict with some CSS (embedded)
The programming is very defensive, so when all the transforms finish I am
fairly confident the output is right.
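
As an illustration only, the XML -> PG TXT step and the defensive style might look something like the sketch below. The element names (`book`, `title`, `p`, `i`), the underscore convention for italics, and the 70-column wrap are all assumptions made for this example; the actual DTD and scripts are not shown in this message.

```python
import textwrap
import xml.etree.ElementTree as ET


def inline_text(elem):
    """Flatten an element's mixed content, marking <i> spans with underscores."""
    parts = [elem.text or ""]
    for child in elem:
        inner = inline_text(child)
        parts.append(f"_{inner}_" if child.tag == "i" else inner)
        parts.append(child.tail or "")
    return "".join(parts)


def xml_to_txt(xml_source, width=70):
    """Convert a tiny hypothetical book format to wrapped plain text."""
    root = ET.fromstring(xml_source)
    blocks = []
    for elem in root:
        # Normalize all internal whitespace to single spaces.
        text = " ".join(inline_text(elem).split())
        if elem.tag == "title":
            blocks.append(text.upper())
        elif elem.tag == "p":
            blocks.append(textwrap.fill(text, width=width))
        else:
            # Defensive: fail loudly on markup this sketch does not know.
            raise ValueError(f"unexpected element: {elem.tag}")
    return "\n\n".join(blocks) + "\n"


sample = "<book><title>Candide</title><p>There was in <i>Westphalia</i> a castle.</p></book>"
print(xml_to_txt(sample), end="")
```

Raising on any unknown element, rather than silently skipping it, is one way to get the property described above: if the transform finishes, nothing was dropped unnoticed.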
You can find examples of those formats at
http://www.eleves.ens.fr/home/blondeel/ebooksgratuits/
(most of the books there don't have the meta-info properly set up,
  so don't worry too much about that).
My scripts also clean up small typography mistakes (they are specialized
for French rules but can of course be taught anything). They will be
used to help give PG nicer formats from the ebooksgratuits team (until
now their Word macros could only produce PG TXT, which is not very sexy
to read for the end user).
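
A minimal sketch of what such a typography cleanup could look like for French spacing is given below. The specific rules (non-breaking space before `; : ! ?` and inside guillemets, collapsing doubled spaces) are assumptions chosen for illustration, and `fix_french_spacing` is a hypothetical name; the rules the actual scripts apply are not listed in this message.

```python
import re

NBSP = "\u00a0"  # non-breaking space


def fix_french_spacing(text):
    """Apply a few assumed French spacing rules to a string."""
    # Collapse runs of ordinary spaces into one.
    text = re.sub(r" {2,}", " ", text)
    # Ordinary space before high punctuation or a closing guillemet -> NBSP.
    text = re.sub(r" ([;:!?»])", NBSP + r"\1", text)
    # Ordinary space after an opening guillemet -> NBSP.
    text = re.sub(r"« ", "«" + NBSP, text)
    return text


print(repr(fix_french_spacing("Quoi  ? « Non ! »")))
```

The sketch only converts a space that is already present; it never inserts one, so strings such as `http://` pass through unchanged.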
Regards,


Joshua Hutchinson