jim said:
> I’ve put up a new copy of the tool pgdiff
> that contains an option “-smarted” which
> outputs the text in a form similar to what
> I think you want BB for your “smart editor” tool.
i'm sure some of your users will enjoy that option, jim...
as you might expect, i'll likely stick with my own tools...
but yes, this new option might allow me to perfect the
tool that i've built in support of your pgdiff tool, i hope.
> Your suggestions work in simple cases but I think
> you will find that they fail relatively spectacularly
> on difficult cases, such as when performing versioning
> across different editions.
well, if i'm gonna fail, please let me fail "spectacularly".
i compare different editions using a different technique;
essentially i do a _paragraph-level_ comparison for that.
it's easy enough to unwrap texts to the paragraph level.
indeed, i do paragraph-level analyses in my comparisons
all the time. that's how i catch the paragraphing glitches.
(it's also necessary to work at the paragraph level when
you're fixing spacey-quotes, as i have mentioned before.)
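for anyone who wants to see the shape of the idea, here is a minimal
sketch in python. it is an illustration only, not the actual tool
discussed here. it assumes paragraphs are separated by blank lines,
and the names unwrap and compare_paragraphs are invented for this
example. it unwraps each text to one line per paragraph, then uses
the standard difflib module to report the paragraphs that differ.

```python
import difflib
import re


def unwrap(text):
    """Return a list of paragraphs, each joined onto a single line.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    blocks = re.split(r"\n\s*\n", text.strip())
    # Collapse the line breaks and runs of whitespace inside each paragraph.
    return [" ".join(block.split()) for block in blocks if block.strip()]


def compare_paragraphs(text_a, text_b):
    """Return the non-matching paragraph runs between two texts.

    Each item is (tag, paragraphs_from_a, paragraphs_from_b), where tag is
    one of difflib's opcodes: 'replace', 'delete', or 'insert'.
    """
    paras_a = unwrap(text_a)
    paras_b = unwrap(text_b)
    matcher = difflib.SequenceMatcher(None, paras_a, paras_b, autojunk=False)
    return [
        (tag, paras_a[i1:i2], paras_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    version_a = "the quick\nbrown fox.\n\nsecond para\nhere."
    version_b = "the quick brown\nfox.\n\nsecond para,\nhere."
    for tag, old, new in compare_paragraphs(version_a, version_b):
        print(tag, old, new)
```

because the line breaks are removed before comparing, a rewrap of
the same words does not show up as a difference, while a changed
comma or a split paragraph does.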
> I also updated the example output file “BBoutput.txt”
> to show the new output.
great. i'll go get it this afternoon...
> Again, the problem is basically the domain
> you are interested in working on and the domain
> I am interested in working on is very different.
actually, they're not. but that's another question for
another day. today's issue here is finding and fixing
errors by comparing two versions which are similar...
> You want a tool that catches small changes
> within a line of text, and I want a tool that catches
> large changes within a file.
two rejoinders.
first, my tools are capable of finding "large differences"
if those are what exist. but, like i just said, that arena is
not of much particular interest here on the p.g. listserve.
second, i have -- without knowing it at first -- worked
on doing comparisons between what turned out to be
different editions of a book. and most of the changes
were not "large" ones, but rather "small" ones, notably
punctuation variations reflecting different house "styles".
i discussed this particular comparison at _great_ length
over on the d.p. forums, under a thread with a title like
"a revolutionary method of proofing", if you're interested.
> It is easy to hypothesize what the “answer” is
> if you are not the one doing the work.
i agree. that's why i suggested we work on actual data.
i find it best if i don't bias my research by selecting the
data that i work on, so i work on other people's stuff,
which is why i chose that book from rfrank. however,
if you want to share some data on a book of your own,
one you're working on, i would be happy to look at it...
> But if you are the one doing the work you rapidly find
> “oops that idea doesn’t work after all!”
you know, i hear a lot of people saying "that doesn't work".
but usually, they're being bamboozled by some _small_
issue that can be overcome quite easily if they just try...
a good example of that was yesterday, when juliet said
"your renaming solution won't work because pagenumbers
are often misrecognized." well, yeah, that happens, but
that particular "obstacle" can be hurdled with little effort.
so i invite you to bring any "doesn't work" problems to me...
i like the challenge of seeing if i can make it work regardless.
-bowerbird