here's more info on my collaborative proofreading site...

***

to see what we're talking about, you can visit this u.r.l.:
>   http://z-m-l.com/go/sitka/editr.pl

***

we've talked about 4 main topics:
>   navigating the pages...
>   certifying a page as clean...
>   searching the book for a string...
>   feel the power with the "command" field...

under the 4th topic -- the command field --
we've discussed these commands you can issue:
>   showmap...
>   concat...
>   showcustom...
>   blubberbaby...
>   pairsearch...
>   end-page-hyphenates...

today we'll discuss a few more commands...

>   copyfootnotes...
>   movefootnotes...
>   show-end-line-hyphenates...

***


copyfootnotes...

some e-book formats want the footnotes collected
together into their own section (a la "endnotes")...

to accomplish this, enter "copyfootnotes" into the
search-field, and then click the "find" button...

all of the footnotes will be presented on a screen,
so you can copy them en masse.  this command
leaves the footnotes unmolested on their pages...

***


movefootnotes...

movefootnotes is another command that does the
same as "copyfootnotes", except "movefootnotes"
also deletes each footnote from its original page...

i'll note that neither of these commands should be
used until proofing has been completely finished.
until that time, you want to leave the footnote in
the one place where it can be most easily proofed,
which is right there on that page, next to the scan.

for the moment, i have disabled this command...
once i've programmed the "mass revert" ability,
to reverse any sabotage effort, i will reinstate it.
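
as a rough sketch of the difference between the two commands, here is some python. it is not the site's actual code. the "[fn" footnote marker, the blank-line paragraph breaks, and the pages-as-a-dictionary layout are all assumptions made purely for illustration...

```python
# hypothetical marker: the site's real footnote syntax may differ
FOOTNOTE_MARK = "[fn"

def split_page(text):
    """Split one page into (body_blocks, footnote_blocks)."""
    # assumption: blocks on a page are separated by one blank line
    blocks = [b for b in text.split("\n\n") if b.strip()]
    notes = [b for b in blocks if b.lstrip().startswith(FOOTNOTE_MARK)]
    body = [b for b in blocks if not b.lstrip().startswith(FOOTNOTE_MARK)]
    return body, notes

def copyfootnotes(pages):
    """Collect every footnote in page order; the pages are left untouched."""
    collected = []
    for name in sorted(pages):
        collected.extend(split_page(pages[name])[1])
    return collected

def movefootnotes(pages):
    """Collect every footnote, and also return pages with the footnotes removed."""
    collected, stripped = [], {}
    for name in sorted(pages):
        body, notes = split_page(pages[name])
        collected.extend(notes)
        stripped[name] = "\n\n".join(body)
    return collected, stripped
```

note that even this sketch of "movefootnotes" returns new page text rather than overwriting the old, which is one simple way to keep a revert possible.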

***


show-end-line-hyphenates...

you'll probably recall that i encourage people to
_retain_ original linebreaks from the paper-book,
expressly including all the end-line-hyphenates...

this makes it much easier to do proofing, as even
distributed proofreaders and project gutenberg
acknowledge when it comes to their _own_ proofing.

(so it's a bit disingenuous of them to rewrap the text
before giving it to other people; but i digress.)

at any rate, one slight problem with this approach
is that the hyphenated fragments often do _not_
pass spellcheck, and thus are unnecessarily flagged.

for instance, you might have the first part of a frag-
ment on the top line, and the second on the bottom,
and neither "frag-" nor "ment" will pass spellcheck.

this command helps you solve that little problem.

"show-end-line-hyphenates" will list all of them,
as you might expect, but it does a little bit more.

first, it tests if the rejoined form passes spellcheck.

if so, then it gives you both fragments, so that you
can include them in the book's custom dictionary...

this command also surveys the full book to see
how many times the rejoined form appears in it
-- with hyphen, without it, and as two words --
and informs you of the counts, which is good info.
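
the steps just described can be sketched in python, roughly as follows. this is a minimal illustration, not the real routine: the function names are made up, the "dictionary" is just a set of lowercase words standing in for a real spellchecker, and only plain a-z fragments are handled...

```python
import re

def end_line_hyphenates(lines):
    """Yield (first_fragment, second_fragment) for each line ending in a dash."""
    for top, bottom in zip(lines, lines[1:]):
        m = re.search(r"([A-Za-z]+)-$", top.rstrip())
        n = re.match(r"\s*([A-Za-z]+)", bottom)
        if m and n:
            yield m.group(1), n.group(1)

def survey(lines, dictionary):
    """Report each end-line-hyphenate with book-wide counts of its three forms."""
    # join the book into one lowercase string with single spaces
    text = " ".join(" ".join(lines).lower().split())
    report = []
    for first, second in end_line_hyphenates(lines):
        a, b = re.escape(first.lower()), re.escape(second.lower())
        report.append({
            "fragments": (first + "-", second),
            # assumption: "passes spellcheck" means "is in this word set"
            "rejoined_ok": (first + second).lower() in dictionary,
            "closed": len(re.findall(rf"\b{a}{b}\b", text)),
            "hyphenated": len(re.findall(rf"\b{a}-{b}\b", text)),
            "two_words": len(re.findall(rf"\b{a} {b}\b", text)),
        })
    return report
```

the end-line-hyphenate itself is not counted in any of the three tallies, since once the lines are joined it reads as "frag- ment", which matches none of the three forms.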

i restored all of the end-line-hyphenates on many
pages within the "sitka" book, and you can observe
the output from "show-end-line-hyphenates" here:
>   http://z-m-l.com/go/sitka/hyphenates-output.html

***

while we're on the subject of end-line-hyphenates,
i should briefly address one of the thorny matters...

i've always maintained that users should be able to
unwrap the text themselves, any time they wanted.
indeed, i've said we should give them tools to do it.

even more than _that,_ i've _provided_ such a tool:
>   http://z-m-l.com/go/unwrap.pl

in most cases, an end-line-hyphenate is _easy_ to
resolve.  you eliminate the dash and then bring up
the first string from the next line and concatenate it.

simple enough.

the glitch happens when it was a _compound_word_
-- i.e., a word that includes a dash in it _normally._

in word-processing parlance, this is known as the
difference between a "hard" and a "soft" hyphen...

so, in order to indicate to the unwrap routine that any
particular dash at the end of a line is a "hard" hyphen,
to be retained, we need to give it some kind of marker.

i've decided -- tentatively, pending testing for problems --
this marker will be the "~" character, right after the dash.

you can see cases in the sitka book where this happens:
>   http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap007
>   http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap019
>   http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap093
>   http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap094
>   http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap094
>   http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap107

the lines from those 6 cases are listed here, respectively:

>   sions in America. The sails of ships from far-~
>   off Kronstadt on the Baltic brought Russian

>   during the winter the hunters took 40 sea-~
>   lions, and in the spring many seals were

>   of ancient Venice. The picturesque, dark-~
>   skinned Thlingit women sit at the doors of

>   Russian fur warehouse. Next is the three-~
>   story building used for courthouse and jail,

>   and later of the U.S. Marines from the Man-~
>   of-War which was stationed here. East of

>   sea. Eastward crest after crest of glacier-~
>   capped peaks rise for a hundred miles,

so when these are unwrapped, the words "far-off"
and "sea-lions" and "dark-skinned" and "three-story"
and "Man-of-War" and "glacier-capped" will now be
rendered as they should be -- as compound words...
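
for the curious, the unwrap rule with the "~" marker can be sketched in a few lines of python. this is an illustration only, not the code behind unwrap.pl. the function name is made up, it handles one paragraph at a time, and leaving a double-dash at a line-end alone is my assumption...

```python
def unwrap(lines):
    """Join the lines of one paragraph, resolving end-line-hyphenates."""
    out = ""
    for line in lines:
        line = line.strip()
        if out.endswith("-~"):
            # hard hyphen: drop the "~" marker, keep the dash
            out = out[:-1] + line
        elif out.endswith("-") and not out.endswith("--"):
            # soft hyphen: drop the dash and rejoin the word
            out = out[:-1] + line
        elif out:
            # ordinary line-end (or a double-dash): join with a space
            out = out + " " + line
        else:
            out = line
    return out
```

fed the first pair of lines above, it gives back "far-off" intact, while an unmarked "frag-" and "ment" come out as "fragment".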

***

based on my long observation, i'd say dehyphenation is
one of the most _inelegant_ aspects of the d.p. system...

first of all, it causes unnecessary work for the proofers,
because it's more difficult to proof when the linebreaks
have been disturbed in any way.  even though the effect
is relatively small when it's just on end-line-hyphenates,
it still cumulates.  (and the dictum against "unclothed"
em-dashes at line-ends adds to this cumulative effect.)

this shifting of original linebreaks also makes the
line-lengths uneven, which causes its own problems.
some routines that _could_ be written to help process
the text depend on line-lengths, and thus are sabotaged
when we change the line-lengths arbitrarily.

second, dehyphenation itself is work, because proofers
(who do not have access to any book-wide information)
have to make a judgment about whether the hyphen is
to be retained or not, which is fraught with ambiguity...
this leads to diffs, which chew even more proofer time.
indeed, in the "perpetual" projects, we saw cases where
one proofer would take out a hyphen, and another one
would put it back with an asterisk (meaning "check it").
and then the third proofer would take out the asterisk!
and of course, if a proofer makes a bad decision, that
pollutes the text, which can lead to more bad decisions.

decisions on all end-line-hyphenates should be made
during preprocessing.  then if the proofers challenge
any of those decisions, the postprocessor can make
the final call.  that's the only sensible workflow.

and this "show-end-line-hyphenates" command shows
that it is indeed possible to handle end-line-hyphenates
in a manner that is simple, yet adequately sophisticated.

***

so those are our 3 new commands for the weekend...

more later...

-bowerbird