
Hi James,

I do understand the Levenshtein measure and actually do not think we need to discuss its caveats as far as precision and success rate go. Using the English and American versions is an interesting approach. Yet that makes pgdiff specific to one set of languages. On the other hand, if you leave out the problem of the forewords, TOCs, indices, et al., you could simply try adding a component that rewrites one text with the other's spelling conventions. That, I know, is no trivial task.

As for my considering not using diff and instead using a simple comparison method which is linear, the problem of alignment does remain. I admit I have not done the math and do not have an exact algorithm, but it does seem to me that it would be polynomial and still far better than n^2.

regards
Keith.

On 19.03.2010 at 19:29, James Adcock wrote:
Proofing is per se linear: there are relatively few differences, it is aided by humans, and the goal is to create a new version, not a merge. The process is simple: compare text A and B as long as they are equal, then gather the information as long as they differ, present the difference, offer possible changes, and continue. Without much analysis one can see that this process is linear.
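The compare-gather-present loop described above can be sketched roughly as follows. This is only an illustration, not pgdiff's code and not a worked-out algorithm: the name `proof_compare`, the word-level tokens, and the bounded look-ahead `window` used to find the next matching point are all assumptions made for the example. With a fixed window and few differences the cost stays close to linear, but the re-alignment step is exactly where it can go wrong, for instance by resynchronizing on a common word too early.

```python
def proof_compare(a, b, window=8):
    """Yield (pos_a, pos_b, run_a, run_b) for each run where a and b differ.

    a, b   -- lists of tokens (words here; an assumption for this sketch)
    window -- hypothetical bound on how far ahead to look for a resync point
    """
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Texts agree: just advance in lockstep.
            i += 1
            j += 1
            continue
        # Texts differ: search for the nearest pair of offsets (da, db)
        # at which the two texts agree again, smallest total skip first.
        resync = None
        for total in range(1, 2 * window + 1):
            for da in range(max(0, total - window), min(total, window) + 1):
                db = total - da
                if (i + da < len(a) and j + db < len(b)
                        and a[i + da] == b[j + db]):
                    resync = (da, db)
                    break
            if resync:
                break
        if resync is None:
            # No match within the window: report the remainder as one run.
            yield (i, j, a[i:], b[j:])
            return
        da, db = resync
        yield (i, j, a[i:i + da], b[j:j + db])
        i += da
        j += db
    if i < len(a) or j < len(b):
        # One text ended before the other.
        yield (i, j, a[i:], b[j:])


if __name__ == "__main__":
    text_a = "the quick brown fox".split()
    text_b = "the quikc brown fox".split()
    for diff in proof_compare(text_a, text_b):
        print(diff)
```

Each yielded run is what would be presented to the human proofer along with possible changes. A single-token greedy resync like this is the weakest part of the sketch; requiring several consecutive matching tokens would be one way to make it less fragile.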
Agreed -- although again you run into problems when your assumptions break down. Pgdiff wasn't intended for these simple "change a couple letters within a line of text" problems. It was intended for problems of the nature of "I have two different editions of the text from two different continents, one using English spellings and one using American spellings, and having different linebreaks and different pagebreaks and different intros and censorship and different indexes, and I want to use one to help find scannos in the other." Yes, it can be used for simpler tasks, but if you have a simpler task you might be better off figuring out exactly what that task is and writing a tool to match that task. Human edits within a line tend to be char-by-char, and you might be better off using a Levenshtein measure with the "token" set to be a char and the "string" set to be a line of text -- to give an obvious example -- since it's not obvious to me how someone uses a mouse and a keyboard to make changes other than "insert a char", "delete a char", or "substitute a char" -- unless one uses cut and paste, in which case all assumptions are off again....
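For reference, a Levenshtein measure with the "token" set to a char and the "string" set to a line of text might look like the sketch below. It is the standard dynamic-programming formulation with unit costs for insert, delete, and substitute, kept to two rows of the table; the function name `levenshtein` and the sample lines are just for illustration and are not taken from pgdiff.

```python
def levenshtein(s, t):
    """Minimum number of single-char inserts, deletes, and substitutions
    needed to turn line s into line t."""
    # prev[j] = distance between the first i-1 chars of s and first j of t.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]  # turning i chars of s into the empty string: i deletes
        for j, ct in enumerate(t, 1):
            cur.append(min(
                prev[j] + 1,               # delete cs
                cur[j - 1] + 1,            # insert ct
                prev[j - 1] + (cs != ct),  # substitute, or free if equal
            ))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    # A typical "rn" read as "m" scanno costs two char edits.
    print(levenshtein("a modern reader", "a modem reader"))
```

The cost is proportional to the product of the two line lengths, which is harmless for single lines. It does not model cut and paste, so a moved phrase shows up as a long series of deletes and inserts.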
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/mailman/listinfo/gutvol-d