[SPAM] re: New Tool "pgdiff"

18 Mar 2010

      jim said:
...
And the correct answer is neither A nor B but rather C == _must_

well, one could easily program the tool to offer an italicized version
of an all-upper choice if you know you'll be processing p.g. e-texts.

indeed, a button that will italicize both choices is easy enough to code.
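
a minimal sketch of that idea in python (not the code in my tool; the function name here is made up for illustration, and it assumes the p.g. convention of marking italics with underscores):

```python
def italic_option(choice):
    """offer an underscore-italic version of an all-upper choice.

    returns None when the choice is not all-upper, so the caller
    knows there is no extra option to put on the button.
    """
    if choice.isupper():
        return "_" + choice.lower() + "_"
    return None


print(italic_option("MUST"))
```

given the choice "MUST", this offers "_must_" as a third option alongside the two the comparison produced.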

likewise with _any_ particular editing function that might be required...

for instance, i've programmed a routine that checks for a spacey-quote;
if it finds a spacey-quote in one of the choices, and the two choices are
otherwise identical, it auto-selects the option without the spacey-quote.
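
roughly, the logic looks like the sketch below. this is python written for illustration, not the routine itself; it assumes a "spacey-quote" means a double-quote with whitespace (or a line edge) on both sides, and the function names are hypothetical.

```python
import re

# assumption: a "spacey-quote" is a double-quote that touches no word,
# i.e. whitespace or a line edge on both sides of it.
SPACEY_QUOTE = re.compile(r'(?<!\S)"(?!\S)')


def has_spacey_quote(text):
    return SPACEY_QUOTE.search(text) is not None


def normalize(text):
    # drop any whitespace hugging a double-quote, so two choices that
    # differ only in quote spacing compare as equal
    return re.sub(r'\s*"\s*', '"', text)


def auto_select(choice_a, choice_b):
    """return the choice to keep, or None if a human should decide."""
    if normalize(choice_a) != normalize(choice_b):
        return None  # they differ in more than quote spacing
    a_bad = has_spacey_quote(choice_a)
    b_bad = has_spacey_quote(choice_b)
    if a_bad and not b_bad:
        return choice_b
    if b_bad and not a_bad:
        return choice_a
    return None


print(auto_select('he said " hello', 'he said "hello'))
```

anything the routine can't decide comes back as None and stays in the list for manual review.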

***

in reviewing the pgdiff output from the sitka files, i wanted to see whether
you had done much preprocessing on the files.   it appears to me you did not.

in general, i'd _highly_ recommend preprocessing before a comparison.

the number of diffs can be significantly lowered by good preprocessing,
and preprocessing is typically a far more efficient way to make changes.

it also helps to know about the nature of the files that you're comparing.

for instance, one of the sitka files was a post-proofing file, meaning that
it was littered with artifacts of the d.p. workflow...   these include "notes"
the proofers leave for the post-processor.   it's far better to handle these
"notes" in an editor during preprocessing before you start the comparison.

another artifact of the d.p. workflow is asterisks on end-line (and end-page)
hyphenates.   i typically just delete these asterisks, as i have no use for them.
some of these were present in the o.c.r. file too, so i removed them as well...
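
as a rough python sketch of that deletion (assuming the marker takes the form "-*" at the very end of a line; other forms would need their own pattern, and the function name is just for illustration):

```python
import re

# assumption: the d.p. marker is an asterisk right after a hyphen,
# sitting at the end of a line (trailing spaces or tabs allowed).
HYPHEN_ASTERISK = re.compile(r"-\*(?=[ \t]*$)", re.MULTILINE)


def strip_hyphen_asterisks(text):
    # keep the hyphen, drop only the asterisk
    return HYPHEN_ASTERISK.sub("-", text)


sample = "a some-*\nthing and 2*3\nend-*"
print(strip_hyphen_asterisks(sample))
```

asterisks elsewhere in a line are left alone, so footnote markers survive this pass.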

after having deleted all the asterisks associated with "notes" and hyphenates,
the only asterisks left in the file were those that indicated _footnotes_ in the
o.c.r. file, so i did a monitored global change of them to footnote indicators.

that way, these footnote indicators wouldn't present a "spurious" difference...
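
a "monitored" global change just means each hit gets looked at before it's changed. here is one way that could be sketched in python; the function name, the `confirm` callback, and the "[1]" indicator are all assumptions for the example, not what the files actually use.

```python
import re


def monitored_change(text, pattern, replacement, confirm):
    """replace each match of pattern only when confirm(context) is true.

    confirm is a hypothetical callback: it receives a snippet of text
    around the match and returns True to accept the change.
    """
    pieces = []
    last = 0
    for m in re.finditer(pattern, text):
        context = text[max(0, m.start() - 30):m.end() + 30]
        pieces.append(text[last:m.start()])
        pieces.append(replacement if confirm(context) else m.group(0))
        last = m.end()
    pieces.append(text[last:])
    return "".join(pieces)


# accept the first hit, reject the second
answers = iter([True, False])
print(monitored_change("see here* and 2 * 3", r"\*", "[1]",
                       lambda context: next(answers)))
```

in an interactive tool the callback would show the context and wait for a keypress; here it's scripted so the example runs on its own.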

(i could've done a global change to the characters that indicated the second
and third footnotes on a page, but i didn't bother, as there weren't too many.)

it also helps to know that rfrank marks "questionable" situations with an "@",
so you can search for those and deal with those before doing a comparison.
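
any editor's search will do for this, but as a small python sketch (the function name is hypothetical), listing the flagged lines up front looks like:

```python
def find_flags(text, flag="@"):
    """return (line number, line) for every line containing the flag."""
    return [(number, line)
            for number, line in enumerate(text.splitlines(), start=1)
            if flag in line]


for number, line in find_flags("ok line\nwhat is th@t\nfine"):
    print(number, line)
```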

oh, and one other _big_ thing.   the o.c.r. file had the _pagenumbers_ in it.
they were enclosed in brackets, at the bottom of most pages, which is why
rfrank's preprocessing program probably didn't find them to delete them...

now, those pagenumbers were deleted by the proofers -- except in the 2
cases where the proofers failed to make the deletion -- so they were _not_
present in the second file.   so, to avoid the spurious diffs, you could have
eliminated them from the o.c.r. file easily, with a series of reg-ex changes.
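
one such reg-ex change, sketched in python. it assumes each pagenumber sits alone on its own line as digits in square brackets, like "[12]"; roman numerals or other layouts would need further patterns, which is why it takes a series of changes.

```python
import re

# assumption: a pagenumber is a line holding nothing but [digits]
PAGENUMBER_LINE = re.compile(r"^[ \t]*\[\d+\][ \t]*\n?", re.MULTILINE)


def strip_pagenumbers(text):
    return PAGENUMBER_LINE.sub("", text)


sample = "text one\n[12]\ntext two [3] inline\n[13]"
print(strip_pagenumbers(sample))
```

a bracketed number in the middle of a line is left untouched, since it's more likely a footnote reference than a pagenumber.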

on the other hand, since i _like_ pagenumbers, and want to _keep_ them,
i had my tool _inject_ them from the o.c.r. file back into the proofed file...
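
my tool does this its own way, but the general idea can be sketched with python's difflib: line up the two files, and wherever the o.c.r. side has a pagenumber line that the proofed side lacks, carry it across. the function name and the "[12]" pagenumber format are assumptions for the example, and the placement is only approximate when the surrounding lines differ a lot.

```python
import re
from difflib import SequenceMatcher

# assumption: a pagenumber is a line holding nothing but [digits]
PAGENUMBER = re.compile(r"^\s*\[\d+\]\s*$")


def inject_pagenumbers(ocr_lines, proofed_lines):
    """return the proofed lines with o.c.r. pagenumber lines re-inserted."""
    matcher = SequenceMatcher(None, ocr_lines, proofed_lines, autojunk=False)
    merged = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # always keep whatever the proofed file has for this stretch
        merged.extend(proofed_lines[j1:j2])
        # o.c.r. lines with no proofed counterpart: keep only pagenumbers
        if tag in ("delete", "replace"):
            merged.extend(line for line in ocr_lines[i1:i2]
                          if PAGENUMBER.match(line))
    return merged


ocr = ["the end of page", "[12]", "start of next"]
proofed = ["the end of page", "start of next"]
print(inject_pagenumbers(ocr, proofed))
```

once both files carry the same pagenumbers, they stop showing up as differences in the comparison.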

either way, it's best to eliminate as many of these "spurious" diffs as you can.

and i note here, jim, that you did eliminate one case of such "spurious" diffs
when you reformatted the page-scan references so they would be identical.
so i encourage you to take that general idea and run with it...

i _will_ talk further about the diffs that were generated anyway.

but i wanted to stress the importance of doing preprocessing...

***

jim, i looked at hkdiff.txt briefly.

i don't know what kind of sense to make of this diff at the end:
...
{|or|don't|you?-that's|the|idea.|Don't|you|reckon|12}
i removed the whitespace so it'd fit on one line.

-bowerbird

Bowerbird＠aol.com
