polishing polishing polishing

still doing some last-minute stuff on huck finn... once again, as usual, i am struck by the amount of attention-to-detail required to polish text for an e-book. even for an obsessive type like myself, it can overwhelm... (although, it should be said, most books are _not_ as hard as this one, which presents a variety of atypical challenges.) sometimes even something you think is going to be _simple_ ends up taking more time and energy than you want it to take.

just as an example, i did very little work on italics as i was cleaning up this text for "adventures of huckleberry finn", since i planned on splicing in the italics from jim's #32325. except now, in my confirmation work, i found at least _ten_ instances where jim might well have missed an italics word. (can't say for sure, because i don't know exactly what edition he used, but at least according to my edition, he missed 'em.) that's not a criticism of jim, or the quality of his work, since -- for all i know -- my file might be missing a dozen more... the whole reason i decided to import jim's italics is because i hate italics-checking: it is hard, and time-consuming, and i miss lots of 'em, so the entire exercise is frustrating...

of course, _part_ of the problem here is that i am trying to move this book as close to perfection as i can, when really, i should just say "ok, let's bring in the smoothreaders now." that "smoothreader" stage is an important one, because that's where the cost-benefit ratio takes over... smoothreaders will _not_ necessarily catch _every_ error. but, in a nutshell, if smoothreaders don't catch an error, then it's probably the case that the error is unimportant. the object, of course, is to move every e-book to perfection. but that can _cost_ more than the _benefit_ would be worth. if an error is so slight that it's not even _noticed_, then -- the odds are -- it is too unimportant to matter much, so the act of detecting and fixing it simply _isn't_worth_it._

there are lots of cases, for instance, where it's hard to tell if something is a semicolon or a colon. but does it matter? probably not. especially since the typesetter didn't seem to be following any kind of consistent rule for deciding between 'em, even in the cases where there is no uncertainty about which one it is. sometimes it's hard to know, then, when to let something go... that's why smoothreaders can be so useful...

-bowerbird
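(for what it's worth, a comparison like this can be scripted. here is a minimal sketch -- the filenames "my_edition.txt" and "jims_32325.txt" are hypothetical -- that assumes both transcriptions mark italics with _underscores_, and reports any word italicized in one file but not the other:)

    # italics_diff.py -- a sketch; both filenames are hypothetical.
    # counts _underscore_-marked italic words in two transcriptions
    # of the same book and reports any word whose counts differ.
    import re
    from collections import Counter

    def italic_words(path):
        text = open(path, encoding="utf-8").read()
        words = Counter()
        for span in re.findall(r"_([^_]+)_", text):
            for w in re.findall(r"[a-z']+", span.lower()):
                words[w] += 1
        return words

    mine = italic_words("my_edition.txt")
    jims = italic_words("jims_32325.txt")
    for word in sorted(set(mine) | set(jims)):
        if mine[word] != jims[word]:
            print(word, mine[word], jims[word])

(a count mismatch doesn't say who is right, of course -- it just gives you a short list of words to check against the page images, instead of re-reading the whole book.)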

except now, in my confirmation work, i found at least _ten_ instances where jim might well have missed an italics word. (can't say for sure, because i don't know exactly what edition he used, but at least according to my edition, he missed 'em.)
This is quite possibly a true statement (but BB ought to actually compare to the editions that I used, which are different from his edition). I also hate italics work, and I do not know a way to help automate the checking - other than making sure that one records any italic determination made by the original OCR software, which also tends to be very bad at catching italics. Multiple historical printed editions also tend to be bad at italics, with later editions tending to lose more and more of the italics found in the original.
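For example, if the OCR engine can be made to emit hOCR with italic spans wrapped in <em> tags (some engines and versions will, some will not - that has to be checked, not assumed), a small script can dump those determinations for later comparison. A sketch, under that assumption:

    # hocr_italics.py -- a sketch; assumes hOCR output in which the
    # engine wrapped words it judged italic in <em> tags, which is a
    # configuration-dependent assumption, not a given.
    import re
    import sys

    hocr = open(sys.argv[1], encoding="utf-8").read()
    for n, span in enumerate(re.findall(r"<em>(.*?)</em>", hocr, re.S), 1):
        word = re.sub(r"<[^>]+>", " ", span).strip()
        print(n, word)

Even a noisy list like this is useful as a record: it is a set of candidate italics to confirm, rather than a blank slate.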

In my own proofing the only way I can have any confidence at all that I'm catching all the italics is to scan through the entire text looking for nothing but. As soon as I start noticing (and marking) anything else, I start missing them.

"don" == don kretz <dakretz@gmail.com> writes:
don> --===============2035048134== Content-Type: don> multipart/alternative; boundary=14dae93404f5dfab1a04ccf0024a don> --14dae93404f5dfab1a04ccf0024a Content-Type: text/plain; don> charset=windows-1252 Content-Transfer-Encoding: don> quoted-printable don> In my own proofing the only way I can have any confidence at don> all that I'm catching all the italics is to scan through the don> entire text looking for nothing but. As soon as I start don> noticing (and marking) anything else, I start missing them. don> On Thu, Oct 25, 2012 at 7:52 PM, James Adcock <jimad@msn.com> don> wrote: >> >except now, in my confirmation work, i found at least _ten_ >> instances where jim might well have missed an italics word. >> (can't say for sure, because i don't know exactly what edition >> he used, but at least according to my edition, he missed 'em.) >> >> This is quite possibly a true statement (but BB ought to >> actually compare to the editions that I used which is different >> than his edition). I also hate italics work, and I do not know >> a way to help automate the checking = don> =96 >> other than making sure that one records any italic >> determination made by the original OCR software =96 which also >> tend to be very bad at catching italics. Multiple historical >> printed editions also tend to be very bad at italics, with >> later editions tending to lose more and more of the italics >> found in the original.**** >> >> _______________________________________________ gutvol-d >> mailing list gutvol-d@lists.pglaf.org >> http://lists.pglaf.org/mailman/listinfo/gutvol-d >> >> don> --14dae93404f5dfab1a04ccf0024a Content-Type: text/html; don> charset=windows-1252 Content-Transfer-Encoding: don> quoted-printable don> In my own proofing the only way I can have any confidence at don> all that<br>I&= #39;m catching all the italics is to scan don> through the entire text looking f= or<br>nothing but. As soon don> as I start noticing (and marking) anything else,= <br> don> I start missing them.<br><br><br><div don> class=3D"gmail_quote">On Thu, Oct 25,= 2012 at 7:52 PM, James don> Adcock <span dir=3D"ltr"><<a href=3D"mailto:jima= don> d@msn.com" target=3D"_blank">jimad@msn.com</a>></span> don> wrote:<br><blockq= uote class=3D"gmail_quote" don> style=3D"margin:0 0 0 .8ex;border-left:1px #ccc = don> solid;padding-left:1ex"> don> <div link=3D"blue" vlink=3D"purple" lang=3D"EN-US"><div><p don> class=3D"MsoNorm= al"><span don> style=3D"font-size:13.5pt;font-family:"Lucida don> Grande",&= don> quot;serif";color:#1f497d">></span><span don> style=3D"font-size:13.5pt;= font-family:"Lucida don> Grande","serif"">except now, in my = don> confirmation work, i found at least _ten_<br> don> instances where jim might well have missed an italics don> word.<br>(can't s= ay for sure, because i don't know don> exactly what edition<br>he used, but = at least according to don> my edition, he missed 'em.)<br><br></span><span s= don> tyle=3D"font-size:13.5pt;font-family:"Lucida don> Grande","serif&= quot;;color:#1f497d">This is quite don> possibly a true statement (but BB ought = to actually compare don> to the editions that I used which is different than his= don> edition).=A0 I also hate italics work, and I do not know a don> way to help aut= omate the checking =96 other than making don> sure that one records any italic d= etermination made by the don> original OCR software =96 which also tend to be ve= ry bad at don> catching italics. 
Multiple historical printed editions also don> tend = to be very bad at italics, with later editions tending don> to lose more and mor= e of the italics found in the don> original.</span><span style=3D"font-family:&q= don> uot;Arial","sans-serif""><u></u><u></u></span></p> don> </div></div><br>_______________________________________________<br> don> gutvol-d mailing list<br> <a don> href=3D"mailto:gutvol-d@lists.pglaf.org">gutvol-d@lists.pglaf.org</a><br= >> don> <a href=3D"http://lists.pglaf.org/mailman/listinfo/gutvol-d" don> target=3D"_bla= don> nk">http://lists.pglaf.org/mailman/listinfo/gutvol-d</a><br> don> <br></blockquote></div><br> don> --14dae93404f5dfab1a04ccf0024a-- don> --===============2035048134== Content-Type: text/plain; don> charset="us-ascii" MIME-Version: 1.0 don> Content-Transfer-Encoding: 7bit Content-Disposition: inline don> _______________________________________________ gutvol-d don> mailing list gutvol-d@lists.pglaf.org don> http://lists.pglaf.org/mailman/listinfo/gutvol-d don> --===============2035048134==--
"don" == don kretz <dakretz@gmail.com> writes:
don> --===============2033116300== Content-Type: don> multipart/alternative; boundary=14dae9341117e2408b04ccf01362 don> --14dae9341117e2408b04ccf01362 Content-Type: text/plain; don> charset=ISO-8859-1 don> It also destroys my accuracy when the text already has don> italics markup in it and I must check for false positives. don> --14dae9341117e2408b04ccf01362 Content-Type: text/html; don> charset=ISO-8859-1 don> It also destroys my accuracy when the text already has don> italics markup in it and I must check for false don> positives.<br> don> --14dae9341117e2408b04ccf01362-- don> --===============2033116300== Content-Type: text/plain; don> charset="us-ascii" MIME-Version: 1.0 don> Content-Transfer-Encoding: 7bit Content-Disposition: inline don> _______________________________________________ gutvol-d don> mailing list gutvol-d@lists.pglaf.org don> http://lists.pglaf.org/mailman/listinfo/gutvol-d don> --===============2033116300==--

"don" == don kretz <dakretz@gmail.com> writes:
don> In my own proofing the only way I can have any confidence at don> all that I'm catching all the italics is to scan through the don> entire text looking for nothing but. As soon as I start don> noticing (and marking) anything else, I start missing them. don> It also destroys my accuracy when the text already has don> italics markup in it and I must check for false positives. I agree. Especially with heavily marked-up texts, check one feature at a time. When I have already marked-up text to check the way that I prefer is to scan the images and stop whenever I find an italics, and check in the marked-up text the next occurrence of <i>. Using `grep "<i>"` instead of the text is even better. Carlo (sorry of the previous message, I was composing this one and something strange happened)

In my own proofing the only way I can have any confidence at all that I'm catching all the italics is to scan through the entire text looking for nothing but. As soon as I start noticing (and marking) anything else, I start missing them.
Yes, but even so, short italicized words are still easy to miss.

It also destroys my accuracy when the text already has italics markup in it and I must check for false positives.
participants (4)
- Bowerbird@aol.com
- don kretz
- James Adcock
- traverso@posso.dm.unipi.it