
Don> What I find remarkable is that after 2 decades anyone would expect the
Project Gutenberg old guard to do anything other than the same thing they've been doing.
Greg> ...Which is to leave such decisions to the eBook's submitter(s). Again, I have written software that lets one back-align PG works to "the original text," even when the two are not "identical" texts, and that can reintroduce page numbers and "original" line breaks. It's in a crude state right now because no one has actually expressed an interest. My intent was that DP could use it to reprocess old crufty PG files back through their system if they wanted to [so that no one at DP really has ANY excuse to complain about independently produced books], or that it could be used by someone wanting to back-submit to archive.org, or to pursue "more scholarly" versions. The software "works" by taking one "polished" PG text and one "unpolished" text, say raw OCR, Levenshtein-matching them on word tokens, and then cloning the formatting whitespace from the one to the other. It can also clone over the page numbers. In general, obviously, if you want to produce, say, a "scholarly" edition from a PG text, you're going to have to re-proof your book after performing such back-matching. My software can help with that too.
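
For anyone wondering what that kind of back-matching might look like, here is a minimal sketch in Python. It is not the actual software described above; the function names (align_words, clone_line_breaks) are made up for illustration, and it assumes the line breaks are being copied from the raw OCR onto the polished text. It does a plain word-level Levenshtein alignment and leaves page numbers out entirely.

```python
import re


def align_words(a, b):
    """Word-level Levenshtein alignment of two word lists.

    Returns a list of (i, j) index pairs; None on either side marks a word
    with no counterpart in the other list.
    """
    n, m = len(a), len(b)
    # dist[i][j] = edit distance between a[:i] and b[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # match / substitute
                             dist[i - 1][j] + 1,         # word only in a
                             dist[i][j - 1] + 1)         # word only in b
    # Walk back from the corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if a[i - 1] == b[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                pairs.append((i - 1, j - 1))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def clone_line_breaks(polished, raw):
    """Re-flow the polished text so its line breaks follow the raw text.

    Assumption: the raw OCR still carries the "original" line breaks and the
    polished text carries the correct words.
    """
    polished_words = polished.split()
    raw_tokens = re.findall(r"(\S+)(\s*)", raw)  # (word, whitespace after it)
    raw_words = [word for word, _ in raw_tokens]
    out = []
    for pi, ri in align_words(polished_words, raw_words):
        if pi is None:
            # Raw-only word (OCR noise, say): drop the word, keep its break.
            if out and "\n" in raw_tokens[ri][1]:
                out[-1] = "\n"
            continue
        out.append(polished_words[pi])
        if ri is not None and "\n" in raw_tokens[ri][1]:
            out.append("\n")
        else:
            out.append(" ")
    return "".join(out).rstrip()


polished = "It was the best of times, it was the worst of times."
raw = "It vvas the best\nof tirnes, it was\nthe worst of times."
print(clone_line_breaks(polished, raw))
```

The quadratic table is fine for a page or a chapter at a time; a whole book would probably need to be aligned in chunks. The result still needs re-proofing, since a line break next to badly mangled OCR can land a word early or late.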