
"Bowerbird" == Bowerbird <Bowerbird@aol.com> writes:
Bowerbird> still waiting for carlo to demonstrate his output...
Bowerbird> or document his procedure. or _anything_, really.

I assume that you know the wdiff format if you want to understand the details. If you don't, either read the manual or skip the details.

Basically, the result of wdiff is the common text, in which the whitespace of the common parts is taken from the second argument, and the variable parts are enclosed in [-...-] for the first file and in {+...+} for the second file.

The procedure is the following. Assume that we have two files, file-pg.txt and file-tia.txt, the second having page separators. To simplify, assume that neither file contains the strings [- -] {+ +} that are used in the wdiff format (if they do, wdiff has options to use different separators).

Execute the command

    wdiff file-pg.txt file-tia.txt > file-mix.txt

Then take file-mix.txt and do the following replacements:

1 - replace SEPARATOR with +}SEPARATOR{+
2 - remove any string composed of (whitespace){+(anything except +})+}
3 - remove [- and -]

The result is not perfect if file-pg.txt and file-tia.txt differ around the separator (e.g. if a word is split at a page boundary). In that case SEPARATOR is introduced at the beginning of the difference zone, and a few words may fall on the wrong page. Since I never needed to do this systematically, I never cared to formalize this step. But a procedure might be easy to describe, using diff at the character level to split one wdiff difference into two differences. For this, dwdiff might be more useful than wdiff.

Similarly, if a difference region contains newlines, part of the PG newlines may survive. This is easy to recognize (a [-...-] or a {+...+} region contains newlines) and could be handled in the same way.

Reintroducing end-of-line hyphenation might be possible too. Of course this requires handling newlines as outlined above.
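
As an illustration only, here is a minimal Python sketch of the three replacements applied to the contents of file-mix.txt. The function names and the "<PAGE>" separator in the usage note are hypothetical, not part of wdiff. The sketch assumes the separator only ever occurs inside a {+...+} region (it exists only in the second file) and that neither input contains the wdiff markers. The second function merely lists the difference regions that contain newlines; it does not try to fix them.

```python
import re


def transfer_separators(mix: str, separator: str) -> str:
    """Keep the first file's text, carrying over separators from the second."""
    # Step 1: close the insertion before the separator and reopen it after,
    # so the separator ends up outside any {+...+} region.
    text = mix.replace(separator, "+}" + separator + "{+")
    # Step 2: drop every {+...+} region together with the whitespace
    # preceding it. The non-greedy match stops at the first "+}".
    text = re.sub(r"\s*\{\+.*?\+\}", "", text, flags=re.DOTALL)
    # Step 3: strip the deletion markers, keeping the first file's words.
    return text.replace("[-", "").replace("-]", "")


def regions_with_newlines(mix: str) -> list[str]:
    """Return the [-...-] and {+...+} regions that span a line break."""
    pattern = re.compile(r"\[-.*?-\]|\{\+.*?\+\}", re.DOTALL)
    return [m.group(0) for m in pattern.finditer(mix) if "\n" in m.group(0)]
```

For example, transfer_separators("The quick [-brown-] {+browne+} fox {+<PAGE>+} jumps", "<PAGE>") gives "The quick brown fox<PAGE> jumps". The separator ends up glued to the preceding word because step 2 also removes the whitespace in front of each {+...+} region.
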
Overall, my advice would be to use wdiff as a pre-processing step for a more sophisticated tool operating on small regions. wdiff handles large difference regions, such as the PG licence, without problems.

In the specific case of PNP, the TIA version has a lot of small differences from the PG one. Comparing with the 1813 first edition, the PG version is apparently more faithful to that edition than to the 1833 edition. TIA only has the second volume of the 1813 edition.

Carlo