>too bad all the p.g. e-texts are rewrapped...
that's gonna make re-proofing very difficult.
I don’t understand why it would be very difficult. I told this forum some years ago that I have written software that will take rewrapped e-texts and unwrap them back to match the original or reconstructed OCR, even if that OCR is very messy. You can find an example of the original line breaks reconstructed from an extremely scanno-ridden txt file at:
http://freekindlebooks.org/Dev/Huck.txt
The text part is PG 76, and the line breaks are taken from the extremely scanno-ridden file at IA, adventureshuckle00twaiiala.
The result looks like a mess only because adventureshuckle00twaiiala is a real mess of an OCR (like a lot at IA), but the line breaks are pretty much at the correct positions if you check it out. It looks messy mainly because the IA OCR includes a tremendous amount of vacuous vertical whitespace. Also, the line lengths jump around, because the line lengths DO jump around in the original text, as the original wraps text around “floating” images.
If I were going to do this “for real” I would probably make the effort to re-OCR the IA posting, since it’s usually easy to get a much cleaner OCR than what IA posts.
This linebreak reconstruction took me literally about 10 minutes, complicated only by the fact that the start of the 76.txt is *extremely* unfaithful to the original text. (And in general 76 is pretty unfaithful to the original text.)
Granted, life would be simpler if PG were to request that submitters retain the original line break locations in the txt and html submissions, rather than asking people to rewrap the text files at 70 chars and to run the html through tidy. But linebreak reconstruction isn’t *that* hard.
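For anyone who wants to see the general idea, here is a minimal sketch in Python. It is NOT the software mentioned above, only an illustration of one way to do it: align the words of the clean text against the words of the messy OCR with the standard library's difflib, then break the clean text wherever an OCR line ends. The function name rewrap_like_ocr and the sample strings are made up for the example, and the sketch ignores paragraph breaks and blank lines entirely.

```python
import difflib


def rewrap_like_ocr(clean_text, ocr_text):
    """Re-break clean_text so its lines follow the line breaks of ocr_text.

    Hypothetical helper for illustration only; paragraph breaks are dropped.
    """
    clean_words = clean_text.split()
    # Blank OCR lines (the "vacuous vertical whitespace") are skipped.
    ocr_lines = [line.split() for line in ocr_text.splitlines() if line.strip()]
    ocr_words = [w for line in ocr_lines for w in line]

    # Index (within ocr_words) of the last word on each OCR line.
    line_ends = []
    count = 0
    for line in ocr_lines:
        count += len(line)
        line_ends.append(count - 1)

    # Align the two word sequences; scannos show up as "replace" regions.
    matcher = difflib.SequenceMatcher(None, ocr_words, clean_words, autojunk=False)
    ocr_to_clean = {}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        for k in range(i2 - i1):
            if tag == "delete":
                # OCR junk with no counterpart: break after the previous clean word.
                ocr_to_clean[i1 + k] = j1 - 1
            else:
                # "equal" or "replace": map position by position, clamped to the region.
                ocr_to_clean[i1 + k] = min(j1 + k, j2 - 1)

    # Clean-word indices after which a line break should be placed.
    breaks = {ocr_to_clean[end] for end in line_ends if ocr_to_clean[end] >= 0}

    lines, current = [], []
    for idx, word in enumerate(clean_words):
        current.append(word)
        if idx in breaks:
            lines.append(" ".join(current))
            current = []
    if current:
        lines.append(" ".join(current))
    return "\n".join(lines)


if __name__ == "__main__":
    clean = ("You don't know about me without you have read a book "
             "by the name of The Adventures of Tom Sawyer")
    ocr = ("Yov don't knovv about me\n\nwithout you haue read a\n"
           "b00k by the name of The\nAdventures of Tom Savvyer")
    print(rewrap_like_ocr(clean, ocr))
```

The alignment is done on whole words, so a scanno only costs you the one word it garbles; the correctly recognized words on either side still pin the line break in place. A real tool would also have to deal with hyphenated line ends, page headers, and text that differs substantially between the two editions.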
Not sure why you think you want to do this in the first place, though. I had imagined linebreak reconstruction for the case where DP wants to take an old crusty PG book and run it all the way back through their system again, perhaps skipping a round or two. But why would any of you want to do this?