jim said:
>   my understanding was that you were trying to work from
>   the PG txt files – which do not retain the original hyphenations. 

right.  so those end-of-line hyphenates need to be restored.


>   Recovering original hyphenations should be in theory possible
>   too, but not work that I have looked at yet.

"in theory"?  you just take them from the o.c.r. at archive.org,
which is what you used to restore the linebreaks anyway, right?
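
to make that concrete, here is a minimal sketch of pulling the end-of-line hyphenates out of o.c.r. text. it is an illustration only, not anybody's actual routine: the function name and the returned tuple shape are made up for this example, and it assumes the o.c.r. is a plain list of text lines where a hyphenate ends its line with a single "-".

```python
def collect_eol_hyphenates(ocr_lines):
    """return (line_number, first_part, second_part) for each end-of-line hyphenate.

    hypothetical helper: assumes one string per printed line, and that
    a line-ending "--" is a dash, not a hyphenate.
    """
    found = []
    for n in range(len(ocr_lines) - 1):
        words = ocr_lines[n].split()
        next_words = ocr_lines[n + 1].split()
        if not words or not next_words:
            continue  # blank line on either side: nothing to pair up
        last = words[-1]
        if last.endswith("-") and not last.endswith("--") and len(last) > 1:
            # drop the hyphen from the first part; the second part is
            # whatever token opens the following line
            found.append((n, last[:-1], next_words[0]))
    return found
```

the second part keeps any punctuation stuck to it (a trailing comma, say), so a real routine would probably want to strip that before comparing against the p.g. text.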


>   The linebreak recovery algorithm I worked on was intended
>   to allow people at DP, for example, if they want to, to resubmit
>   some of the early PG works and run them through DP again.

ok, fair enough.  but in that case, your routines should have
done what d.p. requires of its proofers, which is to move the
second part of the end-of-line hyphenate to the previous line.
i _believe_ that in some cases, you moved the first part down...
(but i could be wrong on that, so do please let me know if i am.)
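
for reference, the "move the second part up" rule can be sketched like this. it is a rough illustration under stated assumptions, not d.p.'s tooling and not jim's code: the function name and the keep_hyphen switch are invented here, and the sketch does not try to decide whether the hyphen belongs in the joined word.

```python
def join_eol_hyphenates(lines, keep_hyphen=True):
    """move the second part of each end-of-line hyphenate up to the previous line.

    hypothetical helper. keep_hyphen=True leaves the hyphen in the joined
    word (e.g. "jum-ped") so a human can decide later; False removes it.
    """
    work = [ln.rstrip() for ln in lines]
    for i in range(len(work) - 1):
        cur, nxt = work[i], work[i + 1]
        # a "--" at the end of a line is a dash, not a hyphenate
        if not cur.endswith("-") or cur.endswith("--") or not nxt.strip():
            continue
        # first token of the next line is the second part of the word
        head, _, rest = nxt.lstrip().partition(" ")
        stem = cur if keep_hyphen else cur[:-1]
        work[i] = stem + head
        work[i + 1] = rest.lstrip()
    return work
```

note the direction: the upper line grows and the lower line shrinks. moving the first part down would be the opposite edit.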


>   Without automatic recovery of linebreaks one has several days
>   of extremely tedious work reintroducing the original linebreaks. 

as i said above, i use the o.c.r. text from archive.org to restore them.
it's pretty straightforward, and automatic.  it didn't take long to code,
and it runs very fast.  but you're right; doing it manually is painful...
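
a minimal sketch of that kind of routine follows. to be clear, this is not the actual code, just one plausible way to do it: it aligns the words of the corrected text against the words of the o.c.r. using python's difflib, then breaks the corrected text wherever the o.c.r. breaks. all the names are hypothetical. it assumes the o.c.r. is a list of lines for the same passage, and it puts each end-of-line hyphenate whole on the upper line.

```python
import difflib
import re


def _norm(tok):
    # compare words by letters and digits only, ignoring case and punctuation
    return re.sub(r"[^a-z0-9]", "", tok.lower())


def ocr_tokens(ocr_lines):
    """flatten o.c.r. lines into (word, line_number) pairs.

    an end-of-line hyphenate is rejoined and tagged with the upper line.
    """
    toks = []
    pending = None  # (first_part, line_number) of an open hyphenate
    for n, line in enumerate(ocr_lines):
        words = line.split()
        if pending is not None and words:
            stem, stem_line = pending
            toks.append((stem + words.pop(0), stem_line))
            pending = None
        if words:
            last = words[-1]
            if last.endswith("-") and not last.endswith("--") and len(last) > 1:
                words.pop()
                pending = (last[:-1], n)
        toks.extend((w, n) for w in words)
    if pending is not None:  # hyphen on the very last line: keep it as-is
        toks.append((pending[0] + "-", pending[1]))
    return toks


def restore_linebreaks(clean_text, ocr_lines):
    """re-break clean_text so its lines follow the o.c.r. lines."""
    clean = clean_text.split()
    ocr = ocr_tokens(ocr_lines)
    matcher = difflib.SequenceMatcher(
        a=[_norm(t) for t, _ in ocr],
        b=[_norm(t) for t in clean],
        autojunk=False,
    )
    line_of = [None] * len(clean)
    for a, b, size in matcher.get_matching_blocks():
        for k in range(size):
            line_of[b + k] = ocr[a + k][1]
    grouped = {}
    current = 0
    for tok, ln in zip(clean, line_of):
        if ln is not None:
            current = ln  # unmatched words stay on the previous word's line
        grouped.setdefault(current, []).append(tok)
    return [" ".join(grouped[n]) for n in sorted(grouped)]
```

o.c.r. errors do not matter much here, since a misrecognized word simply fails to match and rides along on its neighbor's line. o.c.r. lines with no matched words (blank lines, for instance) drop out of the result, so paragraph breaks would need separate handling.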


>   The other alternative for you is to leave healthy right margins
>   and leave your PDF’s “ragged right” [*very* ragged right!]

well, one object is to clone the pages of the printed book itself,
so ragged-right isn't really an option.

-bowerbird