
It should not terminate with that warning when you set -w 9999, because in my math book 9999 < 10000.
Don't know why it is terminating on your machine. "9999 vs. 10000" is just a coincidence -- the 10000 number is hardwired into the usage prompt. Suggest you try a parameter like -w 1000 unless you know you have two *very* different input texts. Again, any large unmatching prefixes and suffixes such as PG legalese, mismatched TOCs, "scholarly introductions", etc. should be removed first. Right now the code has a known bug: if the first words and the last words of the two texts don't match, it may not synchronize (which I fix just by inserting dummy tokens such as "START" and "END"). It's been a while since I've worked on this, but I think it expects a word dictionary "GutDicEN.txt" in more-or-less sorted order, and it is slow if the dictionary isn't.
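The dummy-token workaround mentioned above might look something like this (a minimal sketch; the `add_sentinels` helper and the sample texts are hypothetical, only the "START"/"END" tokens come from the message):

```python
def add_sentinels(tokens):
    # Wrap the token stream so both texts are guaranteed to agree on
    # their first and last word, which lets the aligner synchronize at
    # both ends even when the real first/last words differ.
    return ["START"] + tokens + ["END"]

text_a = add_sentinels("some words here".split())
text_b = add_sentinels("some other words".split())
```

After diffing, the sentinel positions can simply be ignored in the output.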
What is a "Levenshtein string match"??? Even Google doesn't know.
There are routines that run faster and can adapt dynamically. No need to trial-and-error.
Strange. Your copy of Google works differently than my copy of Google, which gives: http://en.wikipedia.org/wiki/Levenshtein_distance In the case of word diff routines, the string token is basically a word, not a char. Not sure what you mean by "trial and error", but the other routines I have tried just crapped out when I tried them on "real world" tasks.
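To make the "token is a word, not a char" point concrete, here is a minimal sketch of the standard Levenshtein dynamic program applied to word lists instead of character strings (the function name is hypothetical; this is the textbook algorithm, not the tool's actual code):

```python
def word_levenshtein(a_words, b_words):
    # Classic two-row DP: minimum number of word insertions,
    # deletions, and substitutions turning a_words into b_words.
    m, n = len(a_words), len(b_words)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a_words[i - 1] == b_words[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # delete a word
                         cur[j - 1] + 1,     # insert a word
                         prev[j - 1] + cost) # substitute (or match)
        prev = cur
    return prev[n]
```

For example, `"the quick brown fox"` vs. `"the quack brown fox"` gives distance 1 (one word substitution), regardless of how many characters differ inside the mismatched word.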
Line numbers in the output, so that if I run this animal inside emacs or vi I can go from one mismatch to the next.
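For emacs/vi navigation, the conventional choice is the grep/compiler style `file:line: message`, which emacs compilation-mode (`next-error`) and vim's quickfix (`:cnext`) both recognize out of the box. A minimal sketch of such an output line (the function name and sample values are hypothetical):

```python
def format_mismatch(filename, lineno, a_word, b_word):
    # Emit "file:line: message" so editors can jump straight
    # to each mismatch with next-error / :cnext.
    return f"{filename}:{lineno}: '{a_word}' vs. '{b_word}'"

print(format_mismatch("textA.txt", 42, "quick", "quack"))
# -> textA.txt:42: 'quick' vs. 'quack'
```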
Give me a ref to your choice of diff output format and I will see if I can help you, if you are serious about *actually* wanting to use this.