Interesting--gutcheck_u reported several mismatched double quotes, but all were explainable, e.g. poem fragments inside a quoted paragraph, or two quoted paragraphs with an illustration between. 
 
On the other hand, there were many mismatched left/right single quotes--evidently DP's conversion of ASCII single quotes to Unicode left/right single quotes needs some work.
 
I'll do some more testing of gutcheck_u while WWing submissions.
 
 
It would be very handy if someone could upgrade Gutspell, which doesn't work properly on texts containing Unicode quotes (single and double), em-dashes, etc.
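
Until that happens, one possible workaround is to spell-check an ASCII-punctuation copy of the text. The following is only a minimal sketch, assuming the trouble comes from the Unicode punctuation itself; the names ASCII_EQUIVALENTS and to_ascii_punctuation are hypothetical and are not part of Gutspell.

```python
# Hypothetical helper, not part of Gutspell: map common Unicode punctuation
# to ASCII stand-ins so an ASCII-only checker can read a copy of the text.
ASCII_EQUIVALENTS = str.maketrans({
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2014": "--",   # em-dash, written the usual PG plain-text way
    "\u2013": "-",    # en-dash
    "\u2026": "...",  # ellipsis
})


def to_ascii_punctuation(text):
    """Return a copy of text with the punctuation above replaced by ASCII."""
    return text.translate(ASCII_EQUIVALENTS)
```

The converted copy is meant for spell-checking only; the UTF-8 original stays the master file, since the conversion throws away the left/right handedness information.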
 
 
Al
 
 
-----Original Message-----
From: gutvol-d [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of James Adcock
Sent: Thursday, February 26, 2015 10:24 AM
To: 'Project Gutenberg Volunteer Discussion'
Subject: Re: [gutvol-d] Unicode UTF-8 Compatible Version of Gutcheck

Off the top of my head, try etext 48325, which demonstrates the handedness issue.

 

From: gutvol-d [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Al Haines
Sent: Wednesday, February 25, 2015 9:37 PM
To: 'Project Gutenberg Volunteer Discussion'; 'James Adcock'
Subject: Re: [gutvol-d] Unicode UTF-8 Compatible Version of Gutcheck

 

I'd like to give it a try.  Can you upload it somewhere?  Or you can send me a copy as a zipped attachment. 

 

And if you can mention the etext numbers of several of the files you tested it against, I can cross-check them with Gutcheck and Bookloupe (http://www.juiblex.co.uk/pgdp/bookloupe/index.html).

 

 

Al

 

 

 

-----Original Message-----
From: gutvol-d [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of James Adcock
Sent: Wednesday, February 25, 2015 7:49 PM
To: gutvol-d@lists.pglaf.org
Subject: [gutvol-d] Unicode UTF-8 Compatible Version of Gutcheck

I’ve created a Unicode UTF-8 Compatible Version of Gutcheck, calling it gutcheck_u -- if anyone wants to try it.

 

This was primarily an exercise in finding and changing 8-bit char coding dependencies to 16-bit widechar coding dependencies, but it did require some additional coding.

 

Currently it is a somewhat Windows-dependent implementation, so running it on another OS would take a little bit of work.

 

I find it more pleasant to use if one’s development file format of choice is UTF-8, and/or if one is doing such things as left-handed / right-handed quotes.

 

Testing it on PG released files, I am in fact finding a fair number of left / right handedness errors that are not currently being discovered.
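
To illustrate what a handedness error looks like, here is a minimal sketch of a per-paragraph check on curly double quotes. It is not gutcheck_u's actual code; the function name and messages are hypothetical, and it assumes paragraphs are separated by blank lines. Single quotes are left out because U+2019 also serves as the apostrophe, which makes them much harder to check.

```python
import re

LEFT, RIGHT = "\u201c", "\u201d"  # left and right curly double quotes


def check_double_quote_handedness(text):
    """Return (paragraph_number, message) tuples for suspicious double quotes."""
    problems = []
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        is_open = False
        for ch in paragraph:
            if ch == LEFT:
                if is_open:
                    problems.append((number, "left quote while a quote is already open"))
                is_open = True
            elif ch == RIGHT:
                if not is_open:
                    problems.append((number, "right quote with no open quote"))
                is_open = False
        if is_open:
            problems.append((number, "quote still open at end of paragraph"))
    return problems
```

A report of "quote still open at end of paragraph" is not necessarily an error: a quotation that continues into the next paragraph legitimately leaves its opening quote unclosed, so each report still needs a human look.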

 

Let me know if you want to try it; the distribution terms are the same as for the original.

 

Jim Adcock

jimad@msn.com