XML version of some books of PG (and other formats)

Hello,

I hacked some scripts doing the following:

  RTF -> XML

  RTF: from Word, using a (very) simple stylesheet: just paragraphs,
       3 title levels, footnotes, and italics. Meta-information is in
       the properties of the document. My script can extract images
       too, if wanted.
  XML: using a personal and simple DTD (embedded), probably easy to
       port to any more complete DTD, such as TEI.

This is the hard part, and I am never quite sure it will not break in case the Word file is weird.

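As a rough illustration only (these are not the actual scripts, and the real DTD is not shown in this message), a document with a small embedded DTD and a defensive check on it might look like the Python sketch below. The element names (book, h1, h2, h3, p, i, note) and the check() helper are invented for the example. Note that xml.etree.ElementTree only checks well-formedness and does not validate against the DTD, so the sketch checks element names by hand and fails loudly on anything unexpected.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names: paragraphs, 3 title levels, footnotes, italics.
SAMPLE = """<!DOCTYPE book [
  <!ELEMENT book (h1 | h2 | h3 | p)*>
  <!ELEMENT h1 (#PCDATA | i)*>
  <!ELEMENT h2 (#PCDATA | i)*>
  <!ELEMENT h3 (#PCDATA | i)*>
  <!ELEMENT p (#PCDATA | i | note)*>
  <!ELEMENT i (#PCDATA)>
  <!ELEMENT note (#PCDATA | i)*>
]>
<book>
  <h1>Chapitre premier</h1>
  <p>Un <i>exemple</i> de paragraphe<note>Une note.</note>.</p>
</book>"""

ALLOWED = {"book", "h1", "h2", "h3", "p", "i", "note"}


def check(xml_text):
    """Parse the text and refuse any element outside the expected set."""
    # Raises xml.etree.ElementTree.ParseError if the text is not well-formed.
    root = ET.fromstring(xml_text)
    unknown = sorted({el.tag for el in root.iter()} - ALLOWED)
    if unknown:
        raise ValueError("unexpected elements: " + ", ".join(unknown))
    return root


root = check(SAMPLE)
print([el.tag for el in root])
```

A weird input then stops the run early with an error instead of silently producing a broken book.
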
From that, I then did other (proof-of-concept) scripts to produce:
  XML -> PG TXT
  XML -> (LaTeX) -> PDF, DVI, PS (with hyperlinks)
  XML -> valid HTML 4.01 (probably useless)
  XML -> XHTML 1.0 Strict with some CSS (embedded)

The programming is very defensive, so when all the transforms finish I am fairly confident the output is right.

You can find examples of those formats at http://www.eleves.ens.fr/home/blondeel/ebooksgratuits/ (most of the books there don't have the meta-info properly set up, so don't worry too much about that).

My scripts also clean up small typography mistakes (they are specialized for French rules but can of course be taught anything else). They will be used to help the ebooksgratuits team give PG nicer formats (until now their Word macros could only produce PG TXT, which is not very sexy to read for the end user).

Regards,
Sebastien Blondeel