
Hi Everybody,

I will step in here for a moment. As Bowerbird has mentioned, this discussion is as old as PG itself.

The problems are:

1) Plain Vanilla Texts cannot reproduce books (they are not meant to).
2) PG does NOT have a comprehensive format for reproducing books.
3) PG has not evolved with modern computer technology.
4) Everybody wants their pet format for reading.
5) PG does not have a consolidated following willing to build the resources needed to solve the above.

There are many reasons for these problems. Yes, there ARE and have been efforts to solve them. Yet none of these have borne much fruit or been able to satisfy the needs of all contributors and users.

So what is needed:

1) A single modular and extensible format for encoding the books
   a) the structures in the book (text) need to be represented
   b) it does not presume a particular output format
   c) it does not care about the size of files
   d) it does not need to be easily readable
2) A parser for creating output formats
   a) uses all available information to create the best possible output for a particular format
3) An editor
   a) displays the book
   b) allows for changes in the representation of the book
   c) must be modular and extensible
4) A parser for creating the representation of the book in the format from scans
   a) must be modular and extensible
   b) must be multi-pass
   c) flags possible conflicts with the format
   d) intelligent enough to do most markup by itself
   e) intelligent enough to correct common errors by itself
5) Parsers for converting older formats
   a) all of 4)
   b) do not expect particular information
   c) allow for presets in order to save time and get the desired representation
6) A proofing workflow

So what do we have? We need a format that is not based on an existing book format and is modular and extensible. Either we start from scratch or we use a generic format. SGML or XML come to mind. We can then put in what we want and need, have a well-structured format, can extend it easily, and it is modular.
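To make points 1) and 2) of the list above a little more concrete, here is a minimal sketch in Python, assuming XML is chosen as the generic base. The element names (book, chapter, title, para, emph) and the functions to_plain_text and to_html are hypothetical, made up for illustration only; they are not an existing PG format or tool.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical markup: the element names below are illustrative assumptions,
# not an existing PG format. It records structure, not presentation.
SAMPLE = """\
<book>
  <title>A Sample Book</title>
  <chapter n="1">
    <title>The Beginning</title>
    <para>It was a <emph>very</emph> dark night.</para>
  </chapter>
</book>
"""


def inline_text(elem, emph_open, emph_close, quote=lambda s: s):
    """Flatten a paragraph, wrapping <emph> children in the given markers."""
    parts = [quote(elem.text or "")]
    for child in elem:
        text = quote(child.text or "")
        if child.tag == "emph":
            text = emph_open + text + emph_close
        parts.append(text)
        # Text following the child element belongs to the paragraph.
        parts.append(quote(child.tail or ""))
    return "".join(parts)


def to_plain_text(book):
    """One output parser: plain text with _underscores_ for emphasis."""
    lines = [book.findtext("title").upper(), ""]
    for chapter in book.findall("chapter"):
        heading = "Chapter {}: {}".format(chapter.get("n"), chapter.findtext("title"))
        lines += [heading, ""]
        for para in chapter.findall("para"):
            lines += [inline_text(para, "_", "_"), ""]
    return "\n".join(lines).rstrip() + "\n"


def to_html(book):
    """Another output parser: the same structure rendered as HTML."""
    out = ["<h1>{}</h1>".format(escape(book.findtext("title")))]
    for chapter in book.findall("chapter"):
        out.append("<h2>{}</h2>".format(escape(chapter.findtext("title"))))
        for para in chapter.findall("para"):
            out.append("<p>{}</p>".format(inline_text(para, "<em>", "</em>", escape)))
    return "\n".join(out) + "\n"


book = ET.fromstring(SAMPLE)
plain = to_plain_text(book)
html_out = to_html(book)
print(plain)
print(html_out)
```

Both outputs are produced from the same representation. Supporting another output format would mean adding another function like these two, without touching the book file itself.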
Plus, XML can handle all kinds of information and data. Yes, we have to reinvent the wheel for markup, but we want a representation that contains as much information as possible. The question would be how much is needed. At a minimum, the markup will be a layout format.

It should only take about a month to create such a format. The other parts will take a little longer.

The important thing is that everything has to be centered around the representation format and not the output. The output is handled by parsers. Whether a particular output format can handle or represent a particular feature can be a concern of the PG internal representation. The developers of an output format can convert it to whatever they see fit.

Regards,
Keith