
Hi,

Some books, like "Don Quijote" (http://www.gutenberg.org/etext/2000), have spurious line breaks all over the text. From what I understand, PG generates all the derived formats from the HTML if there is one, or from the raw text otherwise. In this case there is an HTML version, but it also contains the spurious line breaks. My guess is that the HTML was automatically generated from the text, and the text breaks its lines at about 79-80 characters.

Are there guidelines on how to format the raw text to make it more amenable to automatic conversion to other formats by the PG tools?

Is it OK to reformat this text by removing the spurious line breaks from the raw text? Was the HTML automatically generated, or do I also have to fix the HTML?

How can I check the results in other formats before sending the fix to PG? Also, are the conversion tools open source?

Cheers,
--
Joaquin Cuenca Abela
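
P.S. For concreteness, the kind of reformatting in question could be sketched roughly as below. This is only an illustration, not anything taken from the PG tools: the function name `unwrap_paragraphs` is made up, and it assumes that paragraphs are separated by blank lines and that every single line break inside a paragraph is spurious. Verse, tables, and other deliberately short lines would be flattened too, so they would need separate handling.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines inside each paragraph into a single line.

    Assumption (hypothetical, not a PG rule): one or more blank lines
    separate paragraphs; any other line break is a spurious wrap.
    """
    # Normalise Windows line endings so "\r\n" files behave like "\n" files.
    text = text.replace("\r\n", "\n")
    # Split on runs of blank (empty or whitespace-only) lines.
    paragraphs = re.split(r"\n(?:[ \t]*\n)+", text.strip("\n"))
    unwrapped = []
    for paragraph in paragraphs:
        # Drop the wrap points: trim each line and join with one space.
        lines = [line.strip() for line in paragraph.split("\n")]
        unwrapped.append(" ".join(line for line in lines if line))
    # Keep one blank line between paragraphs and end with a newline.
    return "\n\n".join(unwrapped) + "\n"


if __name__ == "__main__":
    sample = (
        "En un lugar de la Mancha,\n"
        "de cuyo nombre no quiero acordarme,\n"
        "\n"
        "Segundo parrafo\n"
        "de ejemplo.\n"
    )
    print(unwrap_paragraphs(sample), end="")
```

Running the script prints the two sample paragraphs, each on one line, with a blank line between them.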