>you rewrapped lines that weren't supposed to be rewrapped.
You make my point for me. When one relies on automagical
tools to try to recreate semantic information discarded by the PG TXT
representation, one ends up, more or less often, with something that looks like
sh*t – your word, not mine.
When the results break one is told “Oh you did it wrong,
you should have done something else instead.”
Yes of course, but once one relies on human intervention to “fix”
the problem when a particular algorithm breaks, then one does not have an
automatic algorithm. Ultimately what one should do if one wants to “get
it right” is to abandon attempts at automagical tools which work sometimes
and end up looking like sh*t other times and instead take the PG TXT file, take
the original page scans, look at the page scans to figure out where the PG TXT
files gratuitously entered line breaks where the author didn’t intend
line breaks, and take them back out. After the gratuitous line breaks are
taken back out (the work of a few days – trust me on this!) then one can
either, if one has a machine incapable of reflow, such as a teletype, run the
now gratuitous-line-break-free TXT back through a simple unambiguous algorithm
to insert a line break at the appropriate point for your machine – at a
whitespace prior to char 72 if you own a teletypewriter, at a whitespace prior
to char 20 perhaps if you own a cellphone. Or better, if you have a more
modern machine, which really, I think most of us DO have, a machine capable of
calculating reflow itself aka “word wrap” then you just feed the
machine the TXT that doesn’t have the gratuitous line breaks and
everything works automagically, assuming one is willing to live with
ragged right or to tolerate slightly ugly word spacing on machines that
force right justification (sigh). Better yet, we should ask our technologist
friends to include not only reflow but also automatic hyphenation routines in our
machines.
Is it too much, for example, to ask PG to provide, for
the rare user who actually WANTS line breaks at char 72, or for that matter
actually wants line breaks at char 20, a
filter to insert such “gratuitous” line breaks? Consider: PG
*already* provides literally 40 different such filter programs to help
people with various strange obscure legacy machines.
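
For what it is worth, a filter of this kind need not be much code. The
following is only a rough sketch in Python, not anything PG actually ships:
the function names insert_line_breaks and filter_text are made up for
illustration, and it assumes the input TXT already has one paragraph per line,
i.e. the gratuitous line breaks have been taken out by hand.

```python
def insert_line_breaks(paragraph: str, width: int = 72) -> str:
    """Break a single paragraph at the last whitespace that keeps each
    line within `width` characters (greedy, no hyphenation)."""
    lines = []
    current = ""
    for word in paragraph.split():
        if not current:
            # First word on the line; a word longer than `width`
            # simply gets a line to itself.
            current = word
        elif len(current) + 1 + len(word) <= width:
            current += " " + word
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return "\n".join(lines)


def filter_text(text: str, width: int = 72) -> str:
    """Apply insert_line_breaks to every paragraph.

    Assumption: each input line is one whole paragraph, and empty
    lines are paragraph separators that should be preserved.
    """
    return "\n".join(insert_line_breaks(p, width) for p in text.split("\n"))


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog"
    print(insert_line_breaks(sample, 20))  # cellphone-ish width
```

Pass width=72 for the teletype case and width=20 for the cellphone case; the
algorithm is the same either way, which is rather the point.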