
jim said:
To help clarify what I am talking about, I enclose below an excerpt of the output of this tool.
personally, i find that output obtuse and hard to read. and editing it would be very problematic and error-prone. giving people a tool that presents the choices to them, and lets them click a button for the correct one, would make this far easier to work with.
***
jim said:
I have created a new command-line tool, “pgdiff”.
good for you, jim! i assume you wrapped the "wdiff" routine in an .exe? that'll make it easier to use for normal windows users.
In this regard it is similar to “worddiff”, as opposed to “diff”, the approach BB has been talking about, which compares on a per-line basis.
well, i use "diff" as a generic term. whether you use "diff" or "wdiff" depends largely on whether the lines are broken the same way in both files or one of them has been rewrapped. i usually find it's worthwhile to fix the linebreaks so they are identical in both files and match the p-book. that's because to resolve many of these differences you have to look at the actual page, and that job is infinitely easier if your linebreaks match the page...
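for what it's worth, here's a minimal python sketch of why the choice depends on the wrapping. it uses the standard-library difflib, not wdiff itself, and the name word_diffs is just something made up for illustration. because it splits on any whitespace, two copies of a passage that are wrapped differently produce no differences at all, whereas a line-based diff of the same two snippets would flag both lines even though not one word changed.

```python
import difflib


def word_diffs(a: str, b: str) -> list:
    """Compare two texts word by word, ignoring how their lines are wrapped.

    Returns a list of (words_in_a, words_in_b) pairs, one per differing span.
    """
    # str.split() treats newlines like any other whitespace,
    # so the linebreaks themselves never show up as differences.
    wa, wb = a.split(), b.split()
    sm = difflib.SequenceMatcher(None, wa, wb, autojunk=False)
    return [
        (" ".join(wa[i1:i2]), " ".join(wb[j1:j2]))
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    ]


print(word_diffs("the cat sat\non the mat", "the cat\nsat on the mat"))
print(word_diffs("it'll rain today", "it'11 rain today"))
```

the flip side is that once the linebreaks are ignored, you also lose the easy way to find the spot on the page-scan, which is why i'd rather fix the linebreaks first.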
But my new tool has several tricks that haven’t been seen before:
um, ok...
It can be used with two different versions or editions of the text, as long as the differences between them are not too long.
ok, but that's something that's been "seen before"...
This means it can also be used for “versioning”: for example, using a PG text of one version or edition to help create and fix a text of a different version or edition.
i'm not sure i understand what you're talking about here. if there are differences, how do you know whether they are edition differences or o.c.r. differences? you'd have to refer to the page-scans for one version or the other, right?
It can also be used to recover linebreak information where it has been lost: for example, to take an older PG text and restore its linebreaks so that the text can be resubmitted to DP for a clean-up pass.
again, not something that hasn't been seen before... but i'd love to see this in action. carlo has _posted_ that people could use wdiff to do this chore automatically, but when asked to explain the procedure, he failed to follow up.
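i don't know how carlo or jim actually do it, but here's a minimal python sketch of one way the chore could be automated, under some loud assumptions: the reference text (say, fresh o.c.r. from the scans) carries the page linebreaks, the older PG text has the better wording but has lost its linebreaks, and the two agree word-for-word most of the time. recover_linebreaks is a hypothetical name, and the alignment is done with difflib's SequenceMatcher, not wdiff. where the texts disagree, the break position is only a guess, so those spots still need a look at the page.

```python
import difflib


def recover_linebreaks(text: str, reference: str) -> str:
    """Re-wrap `text` so its lines break where the lines of `reference` break.

    Assumes `reference` carries the page linebreaks (e.g. fresh o.c.r.) and
    `text` has the better wording but lost its linebreaks. Blank lines
    (paragraph breaks) are not reproduced by this sketch.
    """
    ref_lines = [line.split() for line in reference.splitlines()]
    ref_words = [w for line in ref_lines for w in line]

    # Indices (into ref_words) of the last word on each non-empty reference line.
    breaks, count = set(), 0
    for line in ref_lines:
        count += len(line)
        if line:
            breaks.add(count - 1)

    words = text.split()
    sm = difflib.SequenceMatcher(None, ref_words, words, autojunk=False)
    out_breaks = set()
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            # Words match one-to-one, so breaks carry over directly.
            for k in range(i2 - i1):
                if i1 + k in breaks:
                    out_breaks.add(j1 + k)
        elif j2 > 0 and any(i in breaks for i in range(i1, i2)):
            # Inside a mismatch the exact spot is unknown; guess the end of the
            # differing span in `text` (or just before it, for a deletion).
            out_breaks.add(j2 - 1)

    # Rebuild the lines, ending one after every word marked as a break.
    lines, current = [], []
    for j, word in enumerate(words):
        current.append(word)
        if j in out_breaks:
            lines.append(" ".join(current))
            current = []
    if current:
        lines.append(" ".join(current))
    return "\n".join(lines)


ref = "It was the best of\ntimes, it was the wurst\nof times."
old_pg = "It was the best of times, it was the worst of times."
print(recover_linebreaks(old_pg, ref))
```

note that the o.c.r. error ("wurst") doesn't leak into the result: only the break positions are borrowed from the reference, and the words all come from the old PG text.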
In normal mode, when it finds a mismatch, it outputs the mismatch within the body of the text, like this: { it’ll | it’11 }, so that with a regex-compatible editor it is very quick to search for and fix the errors found.
i'd really like to learn the reg-ex that makes this "very quick". i assume you'd search for the first half of the pair, and erase it if it's incorrect. then you'd do the same for the second half. then you'd go back and globally remove the excess characters. but i'd sure like to see that in action. and i don't think it would be very fast, or feel very easy, especially when, for an error like '11, a global change within each of the files would be more efficient. it's also the case that, as i mentioned above, you _need_ to have the scan available for viewing to resolve some diffs, so the ability of the tool to present those scans is _crucial_.
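since i asked for the reg-ex, here's my own guess at it, as a python sketch rather than an editor search. it assumes the markers look exactly like the sample above: a "{", the two readings separated by a "|", and a "}", with no braces or bars inside the readings themselves. i haven't seen pgdiff's source, so treat the pattern as an assumption, and list_pairs, resolve and fix_ll are made-up names. the point is that one rule can settle a whole class of diffs at once (the global-change approach), while anything the rule can't decide stays marked for a human with the scan.

```python
import re

# One mismatch marker of the assumed form "{ left | right }".
PAIR = re.compile(r"\{\s*([^{}|]*?)\s*\|\s*([^{}|]*?)\s*\}")


def list_pairs(text: str) -> list:
    """Return every (left, right) pair of readings in the marked-up text."""
    return PAIR.findall(text)


def resolve(text: str, choose) -> str:
    """Replace each marker with choose(left, right).

    If choose returns None, the marker is left in place so a human
    can check it against the page-scan.
    """
    def pick(m):
        chosen = choose(m.group(1), m.group(2))
        return m.group(0) if chosen is None else chosen

    return PAIR.sub(pick, text)


def fix_ll(left: str, right: str):
    """Example rule: settle the classic o.c.r. "ll" -> "11" confusion."""
    if left.replace("ll", "11") == right:
        return left
    if right.replace("ll", "11") == left:
        return right
    return None  # not an "ll"/"11" diff; leave it for a human


marked = "I think { it’ll | it’11 } rain, { said | sad } jim."
print(list_pairs(marked))
print(resolve(marked, fix_ll))
```

with fix_ll, the '11 diff disappears everywhere in one pass, and only "{ said | sad }" is left for someone to check against the scan.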
I find that comparing on a word basis rather than a line basis makes it quicker and easier to fix the errors.
if you've looked at the diffs i've presented, you'll have seen that the _indicator_line_ narrows your focus down to a single word (if that's the diff), or even a single _character_ (like a comma, if that's the diff). the tool shows you the entire line only so that you have the _context_, and so that you can _find_that_line_ more easily on the page-scan.
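an indicator line like that is easy to produce, by the way. here's a minimal python sketch of the idea, my own illustration using difflib's character-level matching (indicator_line is a hypothetical name, not the code behind the diffs i posted): it puts a "^" under every character of the first line that differs from the second, and under the gap wherever the second line has something extra.

```python
import difflib


def indicator_line(a: str, b: str) -> str:
    """Return a line with "^" under each character of `a` that differs from `b`.

    Where `b` has extra text, the "^" goes under the position in `a`
    where that text would be inserted.
    """
    # One extra slot so an insertion at the very end of `a` can be marked.
    marks = [" "] * (len(a) + 1)
    sm = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, _j1, _j2 in sm.get_opcodes():
        if tag == "equal":
            continue
        if i2 > i1:  # replaced or deleted characters of `a`
            for i in range(i1, i2):
                marks[i] = "^"
        else:  # insertion: nothing in `a` to underline, so mark the gap
            marks[i1] = "^"
    return "".join(marks).rstrip()


line_a = "he said, no"
line_b = "he said no"
print(line_a)
print(indicator_line(line_a, line_b))
```

printed under the full line, the caret lands right on the comma, so you get the pinpoint _and_ the context in one glance.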
Source and a compiled windows version at
i'll take a look as soon as i happen to be around a windows box. in the meantime, congratulations on programming a tool! :+) -bowerbird