
I'm curious to see if your script can handle tables. That is our current biggest bugaboo when it comes to transforming to PG TXT format.

Josh

----- Original Message -----
From: "Sebastien Blondeel" <blondeel@clipper.ens.fr>
To: gutvol-d@lists.pglaf.org
Subject: [gutvol-d] XML version of some books of PG (and other formats)
Date: Fri, 3 Dec 2004 05:28:39 +0100

Hello,
I hacked some scripts doing the following:
RTF -> XML
RTF: from Word, using a (very) simple stylesheet: just paragraphs, 3 title levels, footnotes, and italics. Meta-information is stored in the properties of the document. My script can also extract images, if wanted.
XML: using a personal and simple DTD (embedded), probably easy to port to any more complete DTD, such as TEI
This is the hard part, and I am never quite sure it will not break in case the Word file is weird.
From that, I then wrote other (proof-of-concept) scripts to produce:
XML -> PG TXT
XML -> (LaTeX) -> PDF, DVI, PS (with hyperlinks)
XML -> valid HTML 4.01 (probably useless)
XML -> XHTML 1.0 Strict with some CSS (embedded)
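To make the XML -> PG TXT step concrete, here is a minimal sketch in Python. It is not the script described in this message: the element names (`book`, `title`, `p`, `i`), the underscore convention for italics, the upper-cased titles and the 70-column wrap are all assumptions chosen for illustration, since the actual DTD is not shown here. Footnotes are left out.

```python
import textwrap
import xml.etree.ElementTree as ET


def inline_text(elem):
    """Flatten an element's text, wrapping hypothetical <i> children in underscores."""
    parts = [elem.text or ""]
    for child in elem:
        inner = inline_text(child)
        parts.append(f"_{inner}_" if child.tag == "i" else inner)
        parts.append(child.tail or "")
    return "".join(parts)


def xml_to_txt(xml_source, width=70):
    """Convert a tiny, assumed book format to wrapped plain text."""
    root = ET.fromstring(xml_source)
    blocks = []
    for elem in root:
        # Collapse runs of whitespace before wrapping.
        text = " ".join(inline_text(elem).split())
        if elem.tag == "title":
            blocks.append(text.upper())
        elif elem.tag == "p":
            blocks.append(textwrap.fill(text, width=width))
        else:
            # Fail loudly on anything unknown instead of dropping it silently.
            raise ValueError(f"unexpected element: {elem.tag}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample = "<book><title>Chapitre 1</title><p>Un <i>petit</i> exemple.</p></book>"
    print(xml_to_txt(sample), end="")
```

Raising on an unknown element is one simple way to keep such a transform defensive: an input the sketch does not understand stops the run instead of producing incomplete text.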
The programming is very defensive, so when all the transforms finish without error I am fairly confident the output is right.
You can find examples of those formats at http://www.eleves.ens.fr/home/blondeel/ebooksgratuits/ (most of the books there don't have the meta-info properly set up, so don't worry too much about that).
My scripts also clean up small typographic mistakes (they are specialized for French rules but can of course be taught anything). They will be used to help give PG nicer formats from the ebooksgratuits team (until now their Word macros could only produce PG TXT, which is not very sexy to read for the end user).
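As an illustration of what such a typographic cleanup can look like, here is a small Python sketch. The rules it applies are an assumption, not a description of the scripts above: it only covers the common French convention of a no-break space before `;`, `:`, `!`, `?` and inside guillemets, and it only touches places where an ordinary space is already present. The function name `fix_french_spacing` is made up for this example.

```python
import re

NBSP = "\u00a0"  # no-break space


def fix_french_spacing(text):
    """Turn ordinary spaces around French high punctuation into no-break spaces."""
    # Spaces before ; : ! ? or a closing guillemet become one no-break space.
    text = re.sub(r" +([;:!?»])", NBSP + r"\1", text)
    # Spaces after an opening guillemet become one no-break space.
    text = re.sub(r"« +", "«" + NBSP, text)
    return text


if __name__ == "__main__":
    print(repr(fix_french_spacing("Quoi ? « Non ! »")))
```

Because the sketch never inserts a space where none existed, things like the colon in a URL are left alone; a fuller rule set would need to decide how to treat punctuation glued to the preceding word.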
Regards,
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d