
here's more info on my collaborative proofreading site... *** to see what we're talking about, you can visit this u.r.l.:
*** we've talked about 4 main topics:
navigating the pages... certifying a page as clean... searching the book for a string... feel the power with the "command" field...
under the 4th topic -- the command field -- we've discussed these commands you can issue:
showmap... concat... showcustom... blubberbaby... pairsearch... end-page-hyphenates...
today we'll discuss a few more commands...
copyfootnotes... movefootnotes... show-end-line-hyphenates...
*** copyfootnotes... some e-book formats want the footnotes collected together into their own section (a la "endnotes")... to accomplish this, enter "copyfootnotes" into the search-field, and then click the "find" button... all of the footnotes will be presented on one screen, so you can copy them en masse. this command leaves the footnotes unmolested on their pages...

*** movefootnotes... "movefootnotes" does the same thing as "copyfootnotes", except that it also deletes each footnote from its original page... i'll note that neither of these commands should be used until proofing is completely finished. until then, you want to leave each footnote in the one place where it can be most easily proofed, which is right there on its page, next to the scan. for the moment, i have disabled this command... once i've programmed a "mass revert" ability, to reverse any sabotage, i will reinstate it.

*** show-end-line-hyphenates... you'll probably recall that i encourage people to _retain_ the original linebreaks from the paper-book, expressly including all of the end-line-hyphenates... this makes proofing much easier, as even distributed proofreaders and project gutenberg acknowledge when it comes to their _own_ proofing. (so it's a bit disingenuous that they rewrap their text before giving it to other people; but i digress.) at any rate, one slight problem with this approach is that the hyphenated fragments often do _not_ pass spellcheck, and thus are unnecessarily flagged. for instance, you might have the first part of a frag- ment at the end of one line, and the second part at the start of the next, and neither "frag-" nor "ment" will pass spellcheck. this command helps you solve that little problem. "show-end-line-hyphenates" will list all of them, as you might expect, but it does a little bit more. first, it tests whether the rejoined form passes spellcheck. if it does, it gives you both fragments, so that you can add them to the book's custom dictionary...
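here's a minimal sketch, in python, of what a command like this might do under the hood... it is _not_ the site's actual code, and every function name in it is hypothetical. it assumes the book arrives as a list of lines with the original linebreaks kept, uses a plain python set as a stand-in for a real spellchecker's dictionary, treats a line ending in a single "-" as an end-line-hyphenate (a line ending in "--" is assumed to be an em-dash and skipped), and also includes the book-wide count of hyphenated, joined, and two-word forms that the command reports.

```python
import re

def clean(token):
    """Strip leading/trailing punctuation, including a trailing dash."""
    return re.sub(r"^\W+|\W+$", "", token)

def end_line_hyphenates(lines):
    """Yield (first, second) fragments for each line ending in a single dash."""
    for line, next_line in zip(lines, lines[1:]):
        line = line.rstrip()
        next_words = next_line.split()
        # assumption: a line ending in "--" is an em-dash, not a hyphen
        if line.endswith("-") and not line.endswith("--") and next_words:
            first, second = clean(line.split()[-1]), clean(next_words[0])
            if first and second:
                yield first, second

def survey(book_text, first, second):
    """Count the hyphenated, joined, and two-word forms across the whole book."""
    a, b = re.escape(first), re.escape(second)
    return {
        "hyphenated": len(re.findall(rf"\b{a}-{b}\b", book_text, re.IGNORECASE)),
        "joined": len(re.findall(rf"\b{a}{b}\b", book_text, re.IGNORECASE)),
        "two words": len(re.findall(rf"\b{a}\s+{b}\b", book_text, re.IGNORECASE)),
    }

def show_end_line_hyphenates(lines, dictionary):
    """List each end-line hyphenate, check its rejoined form, and survey the book."""
    book_text = "\n".join(lines)
    report = []
    for first, second in end_line_hyphenates(lines):
        rejoined = first + second
        passes = rejoined.lower() in dictionary
        report.append({
            "rejoined": rejoined,
            "passes_spellcheck": passes,
            # fragments worth adding to the book's custom dictionary
            "custom_dictionary": (first, second) if passes else None,
            "counts": survey(book_text, first, second),
        })
    return report

# hypothetical stand-in for a real spellchecker's word list
DICTIONARY = {"fragment", "line", "top"}
book = [
    "you might have the first part of a frag-",
    "ment on the top line, and a fragment",
    "here, plus a frag-ment there.",
]
for entry in show_end_line_hyphenates(book, DICTIONARY):
    print(entry)
```

note that the end-line instance itself ("frag-" then a linebreak then "ment") isn't counted in any of the three buckets, since none of the patterns allow a dash followed by a linebreak.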
second, the command surveys the full book to see how many times the rejoined form appears in it -- with the hyphen, without it, and as two words -- and reports those counts, which is useful information. i restored all of the end-line-hyphenates on many pages within the "sitka" book, and you can observe the output from "show-end-line-hyphenates" here:
*** while we're on the subject of end-line-hyphenates, i should briefly address one of the thornier matters... i've always maintained that users should be able to unwrap the text themselves, any time they want. indeed, i've said we should give them tools to do it. even more than _that,_ i've _provided_ such a tool:
in most cases, an end-line-hyphenate is _easy_ to resolve. you eliminate the dash, bring up the first word from the next line, and concatenate the two. simple enough. the glitch comes when the word is a _compound_word_ -- i.e., a word that _normally_ includes a dash. in word-processing parlance, this is the difference between a "soft" hyphen, which exists only because of the linebreak, and a "hard" hyphen, which belongs to the word itself... so, in order to tell the unwrap routine that a particular dash at the end of a line is a "hard" hyphen, to be retained, we need to give it some kind of marker. i've decided -- subject to testing for problems -- that this marker will be the "~" character, placed right after the dash. you can see cases in the sitka book where this happens:
http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap007
http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap019
http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap093
http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap094
http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap094
http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap107
the lines from those 6 cases are listed here, respectively:
sions in America. The sails of ships from far-~
off Kronstadt on the Baltic brought Russian

during the winter the hunters took 40 sea-~
lions, and in the spring many seals were

of ancient Venice. The picturesque, dark-~
skinned Thlingit women sit at the doors of

Russian fur warehouse. Next is the three-~
story building used for courthouse and jail,

and later of the U.S. Marines from the Man-~
of-War which was stationed here. East of

sea. Eastward crest after crest of glacier-~
capped peaks rise for a hundred miles,
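to make the "~" convention concrete, here's a minimal sketch, in python, of an unwrap routine that honors it... again, this is not the code behind the actual tool, and the name `unwrap` is hypothetical. it assumes each paragraph arrives as a list of lines. a line ending in "-~" is a hard hyphen (keep the dash, drop the "~"), and a line ending in a bare "-" is a soft hyphen (drop the dash). as an extra assumption not stated above, a line ending in "--" is treated as an em-dash and joined without a space.

```python
def unwrap(lines):
    """Join a paragraph's lines into one string, resolving end-line dashes.

    Assumed conventions:
      "-~" at line end: hard hyphen, keep the dash, drop the "~", no space
      "--" at line end: em-dash (an assumption), keep it, no space
      "-"  at line end: soft hyphen, drop the dash, no space
      anything else:    join with a single space
    """
    text = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines within the paragraph
        if not text:
            text = line
        elif text.endswith("-~"):
            text = text[:-1] + line
        elif text.endswith("--"):
            text = text + line
        elif text.endswith("-"):
            text = text[:-1] + line
        else:
            text = text + " " + line
    return text

examples = [
    ["sions in America. The sails of ships from far-~",
     "off Kronstadt on the Baltic brought Russian"],
    ["and later of the U.S. Marines from the Man-~",
     "of-War which was stationed here. East of"],
    ["you might have the first part of a frag-",
     "ment on one line"],
]
for lines in examples:
    print(unwrap(lines))
```

the order of the checks matters: "-~" and "--" must be tested before the bare "-", because both of them also end in a dash character once the marker is considered.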
so when these are unwrapped, the words "far-off" and "sea-lions" and "dark-skinned" and "three-story" and "man-of-war" and "glacier-capped" will be rendered as they should be -- as compound words...

*** based on my long observation, i'd say dehyphenation is one of the most _inelegant_ aspects of the d.p. system...

first of all, it causes unnecessary work for the proofers, because it's more difficult to proof when the linebreaks have been disturbed in any way. even though the effect is relatively small when it's just the end-line-hyphenates, it still adds up. (and the dictum against "unclothed" em-dashes at line-ends adds to this cumulative effect.) this shifting of the original linebreaks also makes line-lengths uneven, which causes other problems: some routines that _could_ be written to help process the text depend on line-lengths, so they are sabotaged when we change the line-lengths arbitrarily.

second, dehyphenation is itself work, because proofers (who do not have access to any book-wide information) have to judge whether each hyphen should be retained or not, a judgment fraught with ambiguity... this leads to diffs, which chew up even more proofer time. indeed, in the "perpetual" projects, we saw cases where one proofer would take out a hyphen, and another would put it back with an asterisk (meaning "check it"), and then a third proofer would take out the asterisk! and of course, if a proofer makes a bad decision, that pollutes the text, which can lead to more bad decisions.

decisions on all end-line-hyphenates should be made during preprocessing. then, if the proofers challenge any of those decisions, the postprocessor can make the final call. that's the only sensible workflow. and this "show-end-line-hyphenates" command shows that it is indeed possible to handle end-line-hyphenates in a manner that is simple, yet adequately sophisticated.

*** so those are our 3 new commands for the weekend... more later... -bowerbird