>it would've been best to retain the _end-line_hyphenates_ too
>(or else justification is unworkable), as well as _pagebreaks_
>(because one objective is to compare the text with the scans.)
I agree that you will have trouble with PDF unless you maintain the
original source hyphens, but my understanding was that you were
trying to work from the PG txt files, which do not retain the original
hyphenations. Recovering the original hyphenations should in theory be
possible too, but it is not work that I have looked at yet. The linebreak
recovery algorithm I worked on was intended to let people at DP, for example,
resubmit some of the early PG works and run them through DP again if they
want to. Without automatic linebreak recovery, one faces several days of
extremely tedious work reintroducing the original linebreaks by hand.
Your other alternative is to leave healthy right margins
and leave your PDFs “ragged right” [*very* ragged right!].