
On 02/17/2012 06:20 PM, Jim Adcock wrote:
And it can't do proper math:
$ ./a.out -w 9999 ../publish/76/76.rst ~/Documents/76-test.rst
Setting a -w parameter larger than 10,000 results in excessive run times
$
Not sure what "it can't do proper math" means.
It should not terminate with that warning when you set -w 9999, because in my math book 9999 < 10000.
This algorithm, like all Levenshtein string matching, is n^2 in the maximum distance of string mismatches you want to handle.
What is a "Levenshtein string match"? Even Google doesn't know.
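For reference, "Levenshtein" normally refers to edit distance: the minimum number of insertions, deletions, and substitutions needed to turn one sequence into another. The following is a minimal sketch of the textbook dynamic-programming version applied to word lists. It is not the code of the tool discussed here; the function name `levenshtein` and the sample sentences are made up for illustration. The running time is proportional to len(a) * len(b), which is presumably the quadratic cost mentioned above.

```python
def levenshtein(a, b):
    """Edit distance between two sequences; each insert, delete,
    or substitute costs 1. Works on strings or lists of words."""
    # Before processing row i, prev[j] is the distance between
    # a[:i-1] and b[:j]. Row 0 is the cost of inserting b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty sequence
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


# Hypothetical sample data: one word replaced, one word appended.
old = "the quick brown fox".split()
new = "the quick red fox jumps".split()
print(levenshtein(old, new))
```

Restricting the inner loop to a band around the diagonal is the usual way a maximum mismatch distance reduces the work, at the price of failing when the true alignment leaves the band.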
If you have fewer than 10,000 consecutive words that differ between your two documents, then set a smaller maximum mismatch distance; 1,000 is very generous for most document sets. If the routines you are used to run faster than this, it is because they crap out when the input files aren't closely similar.
There are routines that run faster and can adapt dynamically. No need to trial-and-error.
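As one illustration of a comparison routine that needs no window parameter, here is a minimal sketch using `difflib.SequenceMatcher` from the Python standard library. It is only an example of the idea, not one of the routines referred to above, and no speed claim is made for it: its documented worst case is also quadratic. The helper name `word_diff` and the sample word lists are assumptions for illustration.

```python
import difflib


def word_diff(old_words, new_words):
    """Return (tag, old_slice, new_slice) for every region that differs.

    No maximum mismatch distance is passed in; the matcher finds the
    matching blocks on its own.
    """
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old_words[i1:i2], new_words[j1:j2]))
    return changes


# Hypothetical sample data.
for change in word_diff("a b c d".split(), "a x c d e".split()):
    print(change)
```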
4. What do I do with this mess? There are not even line numbers in this mess.
I'm not sure what you want to do with this software, so I can't guess how to help you.
Line numbers in the output, so that if I run this animal inside emacs or vi I can go from one mismatch to the next.

--
Marcello Perathoner
webmaster@gutenberg.org
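The line-number request above could be met by printing each mismatch in the `file:line: message` form used by compilers and `grep -n`, which Emacs compilation mode and Vim's quickfix list can usually parse. A minimal sketch, assuming a line-based comparison with `difflib`; the function name `mismatches_with_lines`, the file name `old.rst`, and the message layout are all hypothetical and not taken from the tool under discussion.

```python
import difflib


def mismatches_with_lines(old_path, old_lines, new_lines):
    """Yield one 'path:line: message' string per differing region."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # i1 is a 0-based index into old_lines; editors count from 1.
        # For an insertion at end of file this points one past the last line.
        yield f"{old_path}:{i1 + 1}: {tag}: {old_lines[i1:i2]!r} -> {new_lines[j1:j2]!r}"


# Hypothetical sample data.
old_lines = ["one", "two", "three"]
new_lines = ["one", "2", "three"]
for line in mismatches_with_lines("old.rst", old_lines, new_lines):
    print(line)
```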