
It should not terminate with that warning when you set -w 9999, because in my math book 9999 < 10000.
Don't know why it is terminating on your machine. "9999 vs. 10000" is just a coincidence -- the 10000 number is hardwired into the usage prompt. Suggest you try a parameter like -w 1000 unless you know you have two *very* different input texts. Again, any large unmatching prefixes and suffixes such as PG legalese, mismatched TOCs, "scholarly introductions", etc. should be removed first. Right now the code has a known bug: if the first words and the last words of the two texts don't match, it may not synchronize (which I fix just by inserting dummy tokens such as "START" and "END"). It's been a while since I've worked on this, but I think it expects a word dictionary "GutDicEN.txt" in more-or-less sorted order, and it is slow if the dictionary isn't.
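The dummy-token workaround mentioned above might look something like this (a minimal sketch; the `add_sentinels` helper and the sample texts are hypothetical, only the "START"/"END" tokens come from the message):

```python
def add_sentinels(tokens):
    # Wrap the token stream so both texts are guaranteed to agree on
    # their first and last word, which lets the aligner synchronize at
    # both ends even when the real first/last words differ.
    return ["START"] + tokens + ["END"]

text_a = add_sentinels("some words here".split())
text_b = add_sentinels("some other words".split())
```

After diffing, the sentinel positions can simply be ignored in the output.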
What is a "Levenshtein string match"??? Even Google doesn't know.
There are routines that run faster and can adapt dynamically. No need to trial-and-error.
Strange. Your copy of Google works differently than my copy of Google, which gives: http://en.wikipedia.org/wiki/Levenshtein_distance In the case of word diff routines, the string token is basically a word, not a char. Not sure what you mean by "trial and error", but the other routines I have tried just crapped out when I tried them on "real world" tasks.
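To make the "token is a word, not a char" point concrete, here is a minimal sketch of the standard Levenshtein dynamic program applied to word lists instead of character strings (the function name is hypothetical; this is the textbook algorithm, not the tool's actual code):

```python
def word_levenshtein(a_words, b_words):
    # Classic two-row DP: minimum number of word insertions,
    # deletions, and substitutions turning a_words into b_words.
    m, n = len(a_words), len(b_words)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a_words[i - 1] == b_words[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # delete a word
                         cur[j - 1] + 1,     # insert a word
                         prev[j - 1] + cost) # substitute (or match)
        prev = cur
    return prev[n]
```

For example, `"the quick brown fox"` vs. `"the quack brown fox"` gives distance 1 (one word substitution), regardless of how many characters differ inside the mismatched word.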
Line numbers in the output, so that if I run this animal inside emacs or vi I can go from one mismatch to the next.
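For emacs/vi navigation, the conventional choice is the grep/compiler style `file:line: message`, which emacs compilation-mode (`next-error`) and vim's quickfix (`:cnext`) both recognize out of the box. A minimal sketch of such an output line (the function name and sample values are hypothetical):

```python
def format_mismatch(filename, lineno, a_word, b_word):
    # Emit "file:line: message" so editors can jump straight
    # to each mismatch with next-error / :cnext.
    return f"{filename}:{lineno}: '{a_word}' vs. '{b_word}'"

print(format_mismatch("textA.txt", 42, "quick", "quack"))
# -> textA.txt:42: 'quick' vs. 'quack'
```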
Give me a ref to your choice of diff output format and I will see if I can help you, if you are serious about *actually* wanting to use this.