
The only justification that _I_ can see for the PG 'books' to contain some form of pagination would be in a markup version with a hidden reference to the source page (the scan image, if one exists). This would facilitate referring to the original book pages for simple corrections in the released versions. If it is _your_ intention to preserve the book, then the retention of pagination and other formatting would make perfectly good sense as well.

I have followed this project (and this list) for around a decade. I have always understood Project Gutenberg to be about making the contents of public domain books available to as many people as possible. It has never seemed to me that a goal of the project was to create a re-printable facsimile of an existing edition, but rather to create the Project Gutenberg edition. I’ve always thought of us as the ancient scribes who spent their lives copying the contents of aging editions of books onto new materials so as to preserve the content for another few generations. The wonderful thing about the digital editions at Project Gutenberg is that they will last ‘forever.’

I have wondered for quite some time why the primary format for archiving (not necessarily submission or release) at PG does not include markup. It would be amazingly easy to generate the PG plain text version (for release) to exacting standards from a rather simply marked-up original, but it is rather complex and potentially error-prone to produce any format other than plain text from the plain text document. I have done the final plain-text formatting stage before, and manually formatting to PG standards is a slow process that invites human error (many of the standards I refer to are actually from DP rather than PG).
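To illustrate the first direction (marked-up original to plain text), here is a minimal sketch in Python. Everything specific in it is an assumption made for illustration: the element names (`chapter`, `paragraph`, `emphasis`, `title`) echo the quick sample later in this message rather than any real PG or DP schema, and the output conventions (underscores for italics, a 72-column wrap, an uppercased chapter title) are stand-ins for whatever the actual formatting rules would be.

```python
import textwrap
import xml.etree.ElementTree as ET

# Hypothetical marked-up source; the tag and attribute names are illustrative only.
SAMPLE = """<chapter title="Chapter I" subtitle="The quest for solitude">
<paragraph>Helen was <emphasis display="italics">not</emphasis> in the mood for company and she resented the knock at the door.</paragraph>
<paragraph>She went back to reading her copy of <title type="magazine" display="italics">The Mystery Guild Weekly</title>.</paragraph>
</chapter>"""


def inline_text(elem):
    """Flatten an element to a string, wrapping italic children in underscores."""
    parts = [elem.text or ""]
    for child in elem:
        inner = inline_text(child)
        # Assumed convention: any child whose display attribute mentions italics
        # is rendered as _text_ in the plain-text output.
        if "italics" in child.get("display", ""):
            inner = "_" + inner + "_"
        parts.append(inner)
        parts.append(child.tail or "")
    return "".join(parts)


def to_plain_text(xml_source, width=72):
    """Render one <chapter> as wrapped plain text with blank lines between blocks."""
    chapter = ET.fromstring(xml_source)
    blocks = [chapter.get("title", "").upper()]
    if chapter.get("subtitle"):
        blocks.append(chapter.get("subtitle"))
    for para in chapter.iter("paragraph"):
        # Collapse whitespace runs, then wrap to the chosen column width.
        text = " ".join(inline_text(para).split())
        blocks.append(textwrap.fill(text, width=width))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(to_plain_text(SAMPLE))
```

The point of the sketch is only that the formatting rules live in one small function that can be rerun over every archived text, instead of being applied by hand to each book. Going the other way, from plain text back to structure, has no comparably simple counterpart.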
I believe volunteers could be found to mark up submitted texts (current and past) that lack such enhancements, and to work towards getting the backend of PG more standardized so as to facilitate greater accessibility and distribution of the content. I realize that DP is already producing HTML versions, but that is not precisely what I am referring to, and it does not address the multitude of files without HTML versions or those that follow individual markup standards. And I would never suggest that submission of a book be blocked, or its release delayed, because a contributor lacks the ability or desire to mark up their document. I am only suggesting that PG use a more re-usable formatting method for archiving and on-the-fly content generation than the current standard of somewhat-standard plain text (and somewhat PG/DP-standard HTML).

There has been a lot of talk on here over the years about markup and the preservation of information from the original book. Markup would enable the retention of as much (or as little) information beyond the content as the volunteers wished to provide, without interfering in the creation of a Plain Vanilla text file. It would also enable the creation of one document with multiple editions nested within it (or at least the differences between editions). The latter is what I believe Michael just indicated was his ‘dream.’ Also, markup, when done with that purpose in mind, is human-readable.

<chapter title="Chapter I" subtitle="The quest for solitude">
<paragraph>Helen was <emphasis display="italics" edition="1845, Dover">not</emphasis> in the mood for company and she resented the <errata edition="1845, Dover" type="replace;typographical error" details="nkock">knock</errata> at the door….</paragraph>
<paragraph><quote>Just go away!</quote> she yelled and then went back to reading her copy of <title type="magazine" display="italics;underline" edition="1845, Dover;1865, NY Press">The Mystery Guild Weekly</title>.</paragraph>
….
<page number="1" edition="1845, Dover" />
<paragraph>She really had no idea why there were so many people coming <page number="1" edition="1865, NY Press" />to her house on such a day as <errata edition="1845, Dover" type="replace" details="to-day">today</errata>. One would <errata edition="1865, NY Press" type="addition" details="almost" /> think it was her birthday….</paragraph>
….
</chapter>

This is a rather poorly thought out and quickly done markup schema, but I believe it makes the point: it is human-readable. It would not be an enjoyable read, but it would be readable. And, long after all of us are gone, if this were the only copy of this document left on earth, someone could make enough sense of it to convert it to plain text (or another more readable format). They could also create a reproduction of the 1845 book or the 1865 book, or create an edition of one year or the other with footnotes or sidenotes/marginalia showing the differences in the other edition(s). In the meantime, while we are still around, PG could publish the Plain Vanilla version (and any other versions they choose to make available).

I proposed something similar to this many years ago over at DP, but there was little interest in it at the time. I’m not sure there is much interest in it now….

In any case, thank you to all the volunteers for all of the hard work over the years and all the books you have provided for my pleasure and for the pleasure of so many others! I’ve never felt that you get thanked often enough by all of us who use the end results of your dedication. Thank you!

Carel