
jim said:
I decline to attach any verbiage at all. I tell you I wrote it and you can use it any way you like -- at your own risk and amusement, obviously.
except that some of that "verbiage" was people asking just how exactly your program differs from one that they've been using all along. don't you wanna tell 'em?

and as for me, perhaps you noticed i congratulated you for programming a tool. was that just "verbiage" to you?

in addition, i will analyze any new tool, to check how well it performs the job for which it is intended. it's fine if you don't want to discuss it, but such a review is not "verbiage". it's necessary to take an objective look at our tools to see if they do the job, how they can do it better, and so on...

you specifically said your tool helps in 3 areas:
1. line-break recovery
2. error-flagging
3. versioning

you even said:
my new tool has several tricks that haven’t been seen before
(if anything has been "verbiage" in this thread, it's that!) so, at the end of this post, i'll begin to look at those 3 areas.
If you need to get more serious than that contact me by email and we can talk about it.
imagine the d.p. people had told you to make your complaints "via e-mail". i'd venture a guess that you would laugh at that... ***
On Vim I type: :/[{|}]/ Which highlights the edits and takes me to the next set of edits
but that selects both the options, and the surrounding characters. that's not really what you want -- what _most_people_ would want. and it involves typing. either typing or a lot of delicate deleting. both of which increase the probability that errors are introduced.
When you are versioning it is frequently not as simple as “choose A” or “choose B” but often a mix of both that you have to edit.
i'm sure i know the reality much better than you do, jim, because i've actually _done_ this resolution job, for lots and lots of books.

but maybe rather than schooling me personally, you've said this for the benefit of the lurkers who might not have thought about it very much, if at all. (and that's an entirely appropriate thing to do.) but if we're going to enlighten them, let's do it properly, ok?

your word "frequently" is simply (but completely) out of place.

in the vast majority of cases (96%) where there is a difference between the two versions, _one_ of the versions is _correct_... there _are_ some cases where both are incorrect, meaning that you need to do some editing, but such cases are relatively rare.

in the last book for which i did a comparison, gardner's text, there were 159 differences. there were only _3_ cases where _both_ versions were incorrect. so yes, it happens, but rarely.
And I like seeing each next to each other in context to help figure out what the “correct” editing moves are.
oh yeah, the context is _crucial_. but i'm not sure that your _display_ is the optimal one... it takes a lot of visual parsing to figure out a diff like this:
no way. There { warn't | wam't } a window to it big enough
personally, i find this display _much_ easier to understand:
no way. There warn't a window to it big enough
no way. There wam't a window to it big enough
================^^============================
(i hope the monospaced font came through. if so, you'll see the "^^" markers line up with the diff.) and i believe most users would agree that this display is better. but, you know, if some users like _your_ display better, _fine!_ :+) oh, and one more note on "context". sometimes it can fool you. the choice that looks right might not be what was in the book... that's why it's vitally important that your tool show you the scan. otherwise, you're doing your edits blind...
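A rough sketch of how the two-line display with "^^" markers could be generated. This is only an illustration in Python using the standard difflib module, not the program described in this thread; the function name marker_line is made up for the example. When the two versions differ in length (as "rn" vs. "m" do here), the marker row is sized to the longer side of each difference, so it lines up with whichever version is longer at that spot.

```python
from difflib import SequenceMatcher


def marker_line(a: str, b: str, same: str = "=", diff: str = "^") -> str:
    """Build a marker row: `same` under matching text, `diff` under differences.

    Hypothetical helper for illustration; not taken from any tool in the thread.
    """
    out = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Each span is as wide as the longer side of the two versions.
        width = max(i2 - i1, j2 - j1)
        out.append((same if tag == "equal" else diff) * width)
    return "".join(out)


if __name__ == "__main__":
    first = "no way. There warn't a window to it big enough"
    second = "no way. There wam't a window to it big enough"
    print(first)
    print(second)
    print(marker_line(first, second))
```

Printed in a monospaced font, the third row marks the columns where the two versions disagree.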
PS: You criticize me for doing that which the creator of wdiff said he would do if only he had the gumption.
you'll need to provide a little more information to be understood.
How do you use wdiff to recover lost linebreaks?
i don't use wdiff for that. i wrote my own program. i asked carlo to explain how _he_ does it, but he never answered. i found it humorous he was willing to come out to challenge you, but isn't willing to come out when he is challenged... *** anyway, in order to "kick the tires" on your pgdiff program, jim, i'll set up some files that we can compare. (real books, real files, and not of my own choosing, either, but from rfrank's test-site.) i'll run the files when i next find myself around a p.c. machine... or, if you feel like it, jim, you can run them and post your output. once i have some real output to look at, i'll be able to do a much more thorough review of this new tool. while you're waiting for that, though, here's a screenshot of a tool that i wrote that makes it easier to work with jim's output.
basically, when it finds a line with a diff in it, it presents the options to the user, who can then click a button to choose one, or enter a number -- 1 or 2 -- to activate the appropriate button. in the case where editing is needed, either option can be edited before the button is clicked to select it. the "stop loop" button will stop the loop that presents the next diff display; otherwise, the app loops through the entire file, jumping to the next diff. so, you see jim, i'm really trying to _help_ you in your quest here. -bowerbird
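The choose-or-edit step described above might look roughly like this in Python. It is a sketch under assumptions, not the actual tool from the screenshot: the "{ A | B }" markup is inferred from the single pgdiff example quoted in this thread, and the names DIFF, options, and resolve are invented for the illustration. A choice of "1" or "2" picks that version; any other string stands in for a hand-edited replacement.

```python
import re

# Assumed markup, inferred from the one example quoted in the thread:
#   "There { warn't | wam't } a window"
DIFF = re.compile(r"\{ (.*?) \| (.*?) \}")


def options(line):
    """Return the (first, second) alternatives for each diff group in a line."""
    return DIFF.findall(line)


def resolve(line, choices):
    """Replace each diff group, in order, using the matching entry in `choices`.

    "1" keeps the first version, "2" keeps the second, and any other
    string is used verbatim as hand-edited text.
    """
    picks = iter(choices)

    def pick(match):
        choice = next(picks)
        if choice == "1":
            return match.group(1)
        if choice == "2":
            return match.group(2)
        return choice

    return DIFF.sub(pick, line)


if __name__ == "__main__":
    line = "no way. There { warn't | wam't } a window to it big enough"
    print(options(line))
    print(resolve(line, ["1"]))
```

A graphical front end would call something like options() to label its two buttons and resolve() once the user clicks one or types a correction.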

On 14.03.2010 at 23:58, Bowerbird@aol.com wrote:
jim said:
I decline to attach any verbiage at all. I tell you I wrote it and you can use it any way you like -- at your own risk and amusement, obviously.
[snip, snip]
your word "frequently" is simply (but completely) out of place.
in the vast majority of cases (96%) where there is a difference between the two versions, _one_ of the versions is _correct_... there _are_ some cases where both are incorrect, meaning that you need to do some editing, but such cases are relatively rare.
in the last book for which i did a comparison, gardner's text, there were 159 differences. there were only _3_ cases where _both_ versions were incorrect. so yes, it happens, but rarely.
True enough. Yet the argument stands, at least in my opinion. The trivial cases are easy to handle, yet it is always the RARE cases where tools can shine and set themselves apart from the rest.
And I like seeing each next to each other in context to help figure out what the “correct” editing moves are.
oh yeah, the context is _crucial_.
but i'm not sure that your _display_ is the optimal one... it takes a lot of visual parsing to figure out a diff like this:
no way. There { warn't | wam't } a window to it big enough
personally, i find this display _much_ easier to understand:
no way. There warn't a window to it big enough
no way. There wam't a window to it big enough
================^^============================
(i hope the monospaced font came through. if so, you'll see the "^^" markers line up with the diff.)
and i believe most users would agree that this display is better.
but, you know, if some users like _your_ display better, _fine!_ :+)
Actually, both methods are kind of primitive from a Human Interface standpoint. A better way would be to have two windows, each showing two or more lines above and below the diff, with the diff marked in each. If you ever work with critical editions you will understand the caveat of this method. The changes can then be made in a third window. All of this can be enhanced with colors and other neat features.
oh, and one more note on "context". sometimes it can fool you. the choice that looks right might not be what was in the book... that's why it's vitally important that your tool show you the scan. otherwise, you're doing your edits blind...
Very true.

regards
Keith.

P.S. There will always be more than one way to skin a cat!

except that some of that "verbiage" was people asking just how exactly your program differs from one that they've been using all along. don't you wanna tell 'em?
You are attacking my reply re attaching licensing terms by attaching it to unrelated discussions.
On Vim I type: :/[{|}]/ Which highlights the edits and takes me to the next set of edits
but that selects both the options, and the surrounding characters.
that's not really what you want -- what _most_people_ would want.
and it involves typing. either typing or a lot of delicate deleting. both of which increase the probability that errors are introduced.
Again, you are assuming the problem presupposes a solution which is one of "Choose A" or "Choose B". If you use the tool on other than trivial problems you will find out that life is not that simple, and that frequently both A and B have some degree of errors that need to be corrected and/or merged to get you where you want to go.

If one wanted to make a graphical tool to do this, one would need not only the "Choose A" and "Choose B" options but also "Edit in Context while displaying a copy of the original scanned page". And if one wants to make that kind of tool, one would be better off putting the time and effort into a tool that displays a scanned page one bit-mapped line at a time, compared against the OCR text, as opposed to the current DP approach of displaying a bit-mapped page at a time compared to an OCR page at a time.

And then one would also have to tackle the problem of how to deal with the portability issues of the differing graphics systems on different people's computers. And one would have to build in an editing capability on par with the non-integrated editors that people currently choose to use, and/or offer emulation of those editors in the tool itself.

These WOULD be good issues to tackle; I just don't feel like I am the right person to tackle them. In practice, I personally find using pgdiff with Vim to be MUCH easier, less painful, and more productive than the DP approach, which is why I offer it for people to choose from. You still need to compare to the page scans.
in the vast majority of cases (96%) where there is a difference between the two versions, _one_ of the versions is _correct_...
This is not my experience, but in any case it should be obvious that the results are HIGHLY dependent on what kind of texts and OCRs you are working on.
but, you know, if some users like _your_ display better, _fine!_ :+)
More importantly, since I post my code, and it is reasonably portable without a lot of rigmarole and without stack hacks like wdiff, people can edit it and put it into their choice of display or other code.
you'll need to provide a little more information to be understood.
Read the wdiff documentation and you will see the author admits he would have written a stand-alone tool that doesn't depend on diff if he could figure out the algorithm.
...
so, you see jim, i'm really trying to _help_ you in your quest here.
Thank you. Post a portable version or one compiled for windows and I will tell you how it works for me in practice. PS: doesn't really help me with *MY* quest since I have the tools *I* need to do my job the way I want to do it, but granted perhaps other people would be happier with the GUI approach you are suggesting. Since I post the source code they can apply my work however they want to.
participants (3): Bowerbird@aol.com, Jim Adcock, Keith J. Schultz