
Interesting--gutcheck_u reported several mismatched double quotes, but all were explainable, e.g. poem fragments inside a quoted paragraph, or two quoted paragraphs with an illustration between.

On the other hand, there were many mismatched left/right single quotes--evidently DP's conversion of ASCII single quotes to Unicode left/right single quotes needs some work.

I'll do some more testing of gutcheck_u while WWing submissions.

It would be very handy if someone could upgrade Gutspell, which doesn't work properly on texts containing Unicode quotes (single and double), em-dashes, etc.

Al

-----Original Message-----
From: gutvol-d [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of James Adcock
Sent: Thursday, February 26, 2015 10:24 AM
To: 'Project Gutenberg Volunteer Discussion'
Subject: Re: [gutvol-d] Unicode UTF-8 Compatible Version of Gutcheck

Off the top of my head, try 48325, which demonstrates the handedness issue.

From: gutvol-d [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Al Haines
Sent: Wednesday, February 25, 2015 9:37 PM
To: 'Project Gutenberg Volunteer Discussion'; 'James Adcock'
Subject: Re: [gutvol-d] Unicode UTF-8 Compatible Version of Gutcheck

I'd like to give it a try. Can you upload it somewhere? Or you can send me a copy as a zipped attachment.

And if you can mention the etext numbers of several of the files you tested it against, I can cross-check them with Gutcheck and Bookloupe (http://www.juiblex.co.uk/pgdp/bookloupe/index.html).

Al

-----Original Message-----
From: gutvol-d [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of James Adcock
Sent: Wednesday, February 25, 2015 7:49 PM
To: gutvol-d@lists.pglaf.org
Subject: [gutvol-d] Unicode UTF-8 Compatible Version of Gutcheck

I've created a Unicode UTF-8 Compatible Version of Gutcheck, calling it gutcheck_u -- if anyone wants to try it.
This was primarily an exercise in finding and changing 8-bit char coding dependencies to 16-bit widechar coding dependencies, but it did require some additional coding. Currently it is a somewhat Windows-dependent implementation, so running it on another OS would take a little bit of work.

I find it more pleasant to use if one's development file format of choice is UTF-8, and/or if one is working with such things as left-handed / right-handed quotes. Testing it on PG released files, I am in fact finding a fair number of left / right handedness errors that are not currently being discovered.

Let me know if you want to try it; same distribution terms as the original.

Jim Adcock
jimad@msn.com
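
For anyone wondering what a left / right handedness check on double quotes might look like, here is a minimal sketch in Python. It is not gutcheck_u's actual logic, which is not shown in this thread; the function name check_double_quotes and the one-paragraph-at-a-time rule are assumptions made purely for illustration.

```python
# Hypothetical sketch, not taken from gutcheck_u: flag paragraphs whose
# curly double quotes do not pair up as left-then-right.
LEFT_DOUBLE = "\u201c"   # left double quotation mark
RIGHT_DOUBLE = "\u201d"  # right double quotation mark


def check_double_quotes(text):
    """Return (paragraph_number, message) for the first problem in each paragraph.

    Paragraphs are assumed to be separated by blank lines.
    """
    problems = []
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        depth = 0  # how many left quotes are currently open
        for ch in paragraph:
            if ch == LEFT_DOUBLE:
                depth += 1
            elif ch == RIGHT_DOUBLE:
                if depth == 0:
                    problems.append((number, "right quote with no matching left quote"))
                    break
                depth -= 1
        else:
            # Reached the end of the paragraph without an early break.
            if depth > 0:
                problems.append((number, "left quote never closed"))
    return problems


if __name__ == "__main__":
    sample = "\u201cHello,\u201d she said.\n\nHe replied, \u201dNo.\u201d\n\n\u201cUnfinished"
    for number, message in check_double_quotes(sample):
        print(f"paragraph {number}: {message}")
```

A check like this will report the explainable cases mentioned above, such as a quotation that continues across paragraphs, so its output still needs a human eye. Single quotes are harder because the right single quote also serves as an apostrophe, and this sketch does not attempt them.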