
jim said:
And the correct answer is neither A nor B but rather C == _must_
well, one could easily program the tool to offer an italicized version of an all-upper choice if you know you'll be processing p.g. e-texts. indeed, a button that will italicize both choices is easy enough to code.

likewise with _any_ particular editing function that might be required... for instance, i've programmed a routine that checks for a spacey-quote; if it finds a spacey-quote in one of the choices, and the two choices are otherwise identical, it auto-selects the option without the spacey-quote.

***

in reviewing the pgdiff output from the sitka files, i wanted to see if you would do much preprocessing on the files. it appears to me you did not.

in general, i'd _highly_ recommend preprocessing before a comparison. the number of diffs can be significantly lowered by good preprocessing, and preprocessing is typically a far more efficient way to make changes.

it also helps to know about the nature of the files that you're comparing. for instance, one of the sitka files was a post-proofing file, meaning that it was littered with artifacts of the d.p. workflow... these include "notes" the proofers leave for the post-processor. it's far better to handle these "notes" in an editor during preprocessing before you start the comparison.

another artifact of the d.p. workflow is asterisks on end-line (and end-page) hyphenates. i typically just delete these asterisks, as i have no use for them. some of these were present in the o.c.r. file too, so i removed them as well...

after having deleted all the asterisks associated with "notes" and hyphenates, the only asterisks left in the file were those that indicated _footnotes_ in the o.c.r. file, so i did a monitored global change of them to footnote indicators. that way, these footnote indicators wouldn't present a "spurious" difference... (i could've done a global change to the characters that indicated the second and third footnotes on a page, but i didn't bother, as there weren't too many.)
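
as a rough illustration (not the actual tool described above), the spacey-quote check and the hyphenate-asterisk cleanup could be sketched in python like this. all the function names are hypothetical, and the sketch assumes three things the post doesn't spell out: that a "spacey-quote" is a double-quote with whitespace on both sides, that end-line hyphenates are marked "-*", and that proofer notes look like [**text]. notes are only listed, not deleted, since they're better handled by hand in an editor.

```python
import re

# assumption: a "spacey-quote" is a double-quote with whitespace
# (or a line edge) on both sides, as in: he said, " hello"
SPACEY_QUOTE = re.compile(r'(?<!\S)"(?!\S)')

# assumption: end-line hyphenates are marked as "-*" at the end of a line
HYPHEN_STAR = re.compile(r'-\*(?=[ \t]*$)', re.MULTILINE)

# assumption: proofer notes look like [**some text]
NOTE = re.compile(r'\[\*\*[^\]]*\]')


def squeeze_quotes(s):
    """remove whitespace around every double-quote, for comparison only."""
    return re.sub(r'\s*"\s*', '"', s)


def auto_select(a, b):
    """return the choice without the spacey-quote when exactly one choice
    has one and the two are otherwise identical; otherwise return None."""
    a_spacey = bool(SPACEY_QUOTE.search(a))
    b_spacey = bool(SPACEY_QUOTE.search(b))
    if a_spacey != b_spacey and squeeze_quotes(a) == squeeze_quotes(b):
        return b if a_spacey else a
    return None  # leave the decision to the person doing the comparison


def strip_hyphen_asterisks(text):
    """drop the asterisk from end-line hyphenates, keeping the hyphen."""
    return HYPHEN_STAR.sub('-', text)


def list_notes(text):
    """list (line number, note) pairs so notes can be reviewed by hand."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in NOTE.finditer(line):
            found.append((number, match.group(0)))
    return found
```

running strip_hyphen_asterisks on both files before the comparison, and working through the list_notes output in an editor, takes those differences out before the diff tool ever sees them.
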
it also helps to know that rfrank marks "questionable" situations with an "@", so you can search for those and deal with those before doing a comparison.

oh, and one other _big_ thing. the o.c.r. file had the _pagenumbers_ in it. they were enclosed in brackets, at the bottom of most pages, which is why rfrank's preprocessing program probably didn't find them to delete them...

now, those pagenumbers were deleted by the proofers -- except in the 2 cases where the proofers failed to make the deletion -- so they were _not_ present in the second file. so, to avoid the spurious diffs, you could have eliminated them from the o.c.r. file easily, with a series of reg-ex changes.

on the other hand, since i _like_ pagenumbers, and want to _keep_ them, i had my tool _inject_ them from the o.c.r. file back into the proofed file...

either way, it's best to eliminate as many of these "spurious" diffs as you can. and i note here, jim, that you did eliminate one case of such "spurious" diffs when you reformatted the page-scan references so they would be identical. so i encourage you to take that general idea and run with it...

i _will_ talk further about the diffs that were generated anyway. but i wanted to stress the importance of doing preprocessing...

***

jim, i looked at hkdiff.txt briefly. i don't know what kind of sense to make of this diff at the end:
{|or|don't|you?-that's|the|idea.|Don't|you|reckon|12}
i removed the whitespace so it'd fit on one line. -bowerbird