On Apr 21, 2010, James Adcock <jimad@msn.com> wrote:

>It is simpler to fix the HTML than to fix the Epub, so why should the Epub
>retain the line breaks?

Sorry; if you are saying that tidy is only being used to generate epubs, and
not to modify the posted HTML, then fine. But on one of my previous HTML
submissions a WW said he had run tidy on it. The point of retaining the
linebreaks, obviously, is to allow future DP'ers or PG'ers who have figured
out a better scheme (TEI Lite or whatever, hypothetically) to make another
DP pass, or to solo the effort, by extracting the already "corrected" txt
and matching it against the original OCR, rather than having to start again
"from scratch." And again, pgdiff can recover the linebreak info given a txt
that has lost its linebreaks and an OCR that retains them, but it's still
cleaner and easier not to lose them in the first place.
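To make the linebreak-recovery point concrete: I don't know exactly how
pgdiff does it, so what follows is only a minimal sketch, assuming a plain
word-level alignment with Python's standard difflib is a fair stand-in. It
copies the OCR's line breaks onto corrected text that has lost them. The
function name restore_line_breaks is hypothetical, not part of pgdiff.

```python
import difflib


def restore_line_breaks(corrected: str, ocr: str) -> str:
    """Re-insert the OCR's line breaks into corrected text that lost them.

    Hypothetical sketch using a word-level difflib alignment; this is NOT
    pgdiff's actual algorithm.
    """
    fixed_words = corrected.split()

    # Flatten the OCR into words, remembering which words end a line.
    ocr_words, ends_line = [], []
    for line in ocr.splitlines():
        words = line.split()
        for i, word in enumerate(words):
            ocr_words.append(word)
            ends_line.append(i == len(words) - 1)

    matcher = difflib.SequenceMatcher(a=ocr_words, b=fixed_words, autojunk=False)
    newline_after = [False] * len(fixed_words)

    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Identical words: copy the OCR's line-end flags one-for-one.
            for k in range(i2 - i1):
                newline_after[j1 + k] = ends_line[i1 + k]
        elif i2 > i1 and any(ends_line[i1:i2]):
            # Mismatched span (e.g. an OCR typo or a rejoined hyphenated
            # word): approximate by moving the break to the end of the span.
            if j2 > j1:
                newline_after[j2 - 1] = True
            elif j1 > 0:
                newline_after[j1 - 1] = True

    # Rebuild the text line by line.
    lines, current = [], []
    for word, newline in zip(fixed_words, newline_after):
        current.append(word)
        if newline:
            lines.append(" ".join(current))
            current = []
    if current:
        lines.append(" ".join(current))
    return "\n".join(lines)


print(restore_line_breaks(
    "The quick brown fox jumped over the lazy dog.",
    "Tbe quick brown\nfox jumped ovcr\nthe lazy dog.",
))
```

The breaks survive the corrections ("Tbe" to "The", "ovcr" to "over"), and a
word rejoined across a hyphenated OCR line break ends up on the first line.
It works, but any mismatch near a break forces a guess, which is why keeping
the linebreaks in the first place is the cleaner option.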

_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/mailman/listinfo/gutvol-d