Re: [gutvol-d] Double blind OCRing?

28 Sep 2012

...
Automated diff programs are good, but they are not infallible. When diffing
two files which are significantly different, the diff point on the two
files needs to be regularly sync'ed. For example, suppose you are using the
standard gnu diff program, and one file contains a section of text that the
other does not. The program cannot just go on diffing line for line, because
when the additional text is encountered every line thereafter will be
reported as different.
Depends on the diff program. The one I created specifically for the
purpose of cross-diffing will correctly deal not just with line
mismatches but with whole missing or entirely changed sections of text.
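
To illustrate the general point (this is not the cross-diffing program
mentioned above, only a minimal sketch using Python's standard difflib
module), a matcher that looks for common blocks across the whole of both
files reports a missing section once and then carries on matching. The
helper name changed_lines and the sample data are made up for the example.

```python
import difflib


def changed_lines(a, b):
    """Return (tag, lines_from_a, lines_from_b) for each non-matching region."""
    # autojunk=False turns off difflib's popularity heuristic so that
    # frequently repeated lines are still considered when matching.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical sample data: the second copy is missing lines 4 through 6.
first = ["line %d" % n for n in range(1, 11)]
second = first[:3] + first[6:]

result = changed_lines(first, second)
print(result)
```

Only the three dropped lines are reported; the lines after the gap are
matched up again rather than being flagged as different.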
...
Because of this problem, all diff programs have the capability to
"look-ahead" and try to establish a point where the lines again become
identical. There will always be files that exceed the "look-ahead"
capability, and thus cannot be diffed.
This statement is certainly not true, at least not the first part. Granted,
files that contain sections of millions of consecutive mismatched words
will prove to be problematic.
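
For comparison, here is a toy sketch of the fixed "look-ahead" scheme the
quoted text describes. It is an assumption about what such a scheme looks
like, not the algorithm of gnu diff or of any other real program; the
function name lookahead_diff and the window parameter are invented for the
example. It shows why a gap longer than the window never gets resynced
under this particular scheme.

```python
def lookahead_diff(a, b, window):
    """Compare line for line; on a mismatch, search up to `window` lines
    ahead in either file for a point where the two agree again."""
    i = j = 0
    report = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
            continue
        resync = None
        for k in range(1, window + 1):
            # Does the current line of b reappear a little further on in a?
            if i + k < len(a) and a[i + k] == b[j]:
                resync = (i + k, j)
                break
            # Does the current line of a reappear a little further on in b?
            if j + k < len(b) and a[i] == b[j + k]:
                resync = (i, j + k)
                break
        if resync is None:
            # Nothing found within the window: flag the pair and step on.
            report.append(("changed", a[i], b[j]))
            i += 1
            j += 1
        else:
            new_i, new_j = resync
            report.extend(("only in a", line) for line in a[i:new_i])
            report.extend(("only in b", line) for line in b[j:new_j])
            i, j = new_i, new_j
    report.extend(("only in a", line) for line in a[i:])
    report.extend(("only in b", line) for line in b[j:])
    return report


# Hypothetical sample data: b is missing five consecutive lines of a.
a = ["line %d" % n for n in range(1, 21)]
b = a[:3] + a[8:]

wide = lookahead_diff(a, b, window=10)   # the gap fits inside the window
narrow = lookahead_diff(a, b, window=3)  # the gap is longer than the window
print(len(wide), len(narrow))
```

With the wider window only the five missing lines are reported. With the
narrow window the two files never line up again, so every later line is
flagged. That is a limit of this fixed-window scheme, not of diffing in
general.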

James Adcock