
On Mon, 16 Jan 2006 17:13:12 +0000, Dave Fawthrop <hyphen@hyphenologist.co.uk> wrote:
> I am told by the whitewashers that it is *essential* that all text for PG pass guiguts. Because this assumes that the language scanned is American it gives 90% plus false positive errors, on my books, which is totally unsatisfactory for any piece of test software.
>
> Is there a language free version of Guiguts?

I'm not quite sure which question you're asking, or about which checking tool, but I think there is some confusion somewhere, of emphasis if not of fact. I'm continually surprised by people who don't know the origins of really quite recent procedures that I remember vividly, and I've had several threads recently about this general subject of checking, so please bear with me while I regurgitate history. I hope you'll find a satisfactory answer in here somewhere.

Anybody can use any programs they like to make texts, and different people do use different tools, according to their own needs or the needs of the individual texts. Considering that we get French and German and Esperanto and Chinese texts, not to mention older English, there is no one-size-fits-all solution for language.

Once, there were no checking tools at all, except for the spellcheckers built into WordPerfect and Word, which is what most people used, and I could tell you some stories about having to convert those! David Price and Martin Ward and I made checkers that we used for ourselves. There may have been others, but those are the ones I'm aware of. Everything else was Mark One Eyeball.

I had done a lot of cleaning-up work on a lot of texts for various people, and I would then send those on to Michael for posting. They would commonly take hours of work each. In self-defense, I wrote a checker I (later) renamed to gutcheck.

When the WWs were formed in 2001, I brought gutcheck with me, and we all used it to find errors quickly in incoming texts. It was still standard, at that time, for gutcheck to find anything up to 50 or 100 errors in a typical incoming text. Checking and fixing could still take hours, and often involved long threads with the submitter.

Up till then, there was really no difference between DP and Other texts, though because the people who mostly submitted from DP were experienced, and because DP favored simple texts (! yes, it's true), they were easier than the usual.

When DP hit Slashdot, in late 2002, I was still posting the majority of texts, and both the quantity and quality of texts coming from DP went nuts. And so did I. To put it mildly, I got mediaeval on people's asses about the quality of incoming texts. I still wince when I remember some of the things I said then. But the point is that the few WWs couldn't possibly handle the amount of work now being spewed at us.

What happened next was a kind of arms race between submitters and WWs. Submitters didn't want to have their texts bounced, or go through a long re-checking thread, so they adopted the checking tools we used, to ensure that we wouldn't easily find errors. (Which, in a way, was kind of a bad thing. It used to be that I knew that gutcheck would find about _half_ of the errors in an incoming text, but if the submitter had used gutcheck, I would find none, and would have no idea how many more I had to look for. I used to have lots of fun when I found a new check to add but hadn't released the new version yet. Heh. Anyway...)

The most significant feature of DP, I often think, is that because of the need for multiple people to work on the same text, new information and methods propagate and are assimilated much faster there than elsewhere.

In March 2003, Charlz set up the PPV system to meet the new pressures. New producers/PPs would have their files checked by more experienced people, who have come to do, at least for DP, most of the work that the WWs did pre-Slashdot. I burned out, and had to go away on an extended business trip anyhow. David Widger started actively WWing other people's submissions, and between the new PPV system and David, things became stable again, but at a higher volume than before.

A couple of months later, Steve Schulze (thundergnat) responded to the need of people who couldn't easily work with command-line tools but still wanted to use gutcheck, and wrote GuiGuts, which uses gutcheck to create a list of things to check, and does a whole lot of other things as well, in a GUI. It has become the standard "Swiss Army Knife" for preparing texts in DP. I will be forever grateful to him for saving me from having to write a cross-platform GUI for gutcheck! :-) And GuiGuts and gutcheck have accreted features ever since.

If you have GuiGuts, then you have gutcheck, since Steve bundles it with GuiGuts -- and you also have a large number of other tools that may or may not be useful for the particular text you're working on. There are many other checkers available as well, and I'd love to ramble on about them, but this is too long already, and it doesn't bear on your question.

This is how it comes -- by evolution, not by fiat -- that incoming texts are checked with _several_ tools, according to what seems appropriate for the text, but most commonly with gutcheck and/or GuiGuts. Of course, we don't catch all the errors, but we mostly don't have to spend hours on each one anymore either. With texts from DP, we know that usually two people have gone through more-or-less the same list of checks that we do, so mostly we don't find much that needs querying. But still we give each one a once-over.

Now, _which_ tools are going to get used by a WW will depend on the person and the text. "Text-checking" (scannos, letter-combinations, etc.) in gutcheck is pretty useless outside "normal" modern English prose, because of the false positives. You can switch it off by using the -t switch from the command line. Or, running through GuiGuts, in Fixup/Gutcheck options, just tick the -t option to disable it. But there are also other checks like scannos and regexes in GuiGuts that may give a lot of false positives when run against a text heavy in dialect.

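
As a rough illustration of the -t switch described above, here is a minimal Python sketch of running gutcheck with text-checking turned off. The function names are made up for this example. It also assumes a gutcheck executable on the PATH that takes the file name as its last argument. Only the -t switch itself comes from the description above.

```python
import shutil
import subprocess


def build_gutcheck_command(text_path, text_checks=True, executable="gutcheck"):
    """Build the argument list for one gutcheck run (hypothetical helper).

    Assumption: the executable takes its switches first and the file name last.
    """
    cmd = [executable]
    if not text_checks:
        # -t switches off the "text-checking" queries (scannos,
        # letter-combinations, etc.), as described above.
        cmd.append("-t")
    cmd.append(text_path)
    return cmd


def run_gutcheck(text_path, text_checks=True):
    """Run gutcheck and return its report, or None if it is not installed."""
    cmd = build_gutcheck_command(text_path, text_checks)
    if shutil.which(cmd[0]) is None:
        return None
    # errors="replace" keeps odd bytes in the report from raising an exception.
    result = subprocess.run(cmd, capture_output=True, text=True, errors="replace")
    return result.stdout
```

For a dialect-heavy book saved under the placeholder name mybook.txt, run_gutcheck("mybook.txt", text_checks=False) amounts to typing "gutcheck -t mybook.txt" at the command line. What comes back is still a list of queries to look at, not a pass/fail verdict.
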
So when you say "pass GuiGuts", I don't know exactly what you mean. The things that GuiGuts and gutcheck (and the various other checkers) note are _queries_, not pass/fail items. If the author wrote "beear", then that's what he wrote. Some functions (but I couldn't offhand give you a list of which) in GuiGuts may query it, and so might gutcheck, or GutAxe, or gutspell, or check-punct, or whatever.

In fact, I'm surprised you got a comment about it at all, unless there were real errors in the text that could have been caught by the commonest of checks used today. Getting into discussion threads with submitters is a HUGE burner of time that, for the most part, the WWs don't have, so we don't start one except when we must.

It's still a bit of an arms race between the producers and the checkers, whether those are WWs or PPVs. It doesn't matter whether you use one tool or another, so long as the result is at least good enough that whoever checks your file won't find any problems.

I had a thread with a submitter recently in which I bounced a text, saying that I had spent 18 minutes to find the first error, and the submitter asked what I do and I said something like "Well, I run the standard checks, and I look at those and call up any extra checks I think might apply and I actually _read_ paragraphs from the text for about half an hour, and if I can't find any problems in that time, I consider it goes clean," and he said "OK, then next time, I just have to hold you off for 12 more minutes! :-)"

The thing about this particular arms race is that it is beneficial. Because the producers are always trying to get it past the checkers clean, and the checkers are always trying to catch something wrong in the incoming texts, the overall quality level goes relentlessly up.

If every checker could spend hours and hours on every text, it would go up more, but as many people on this list know, checking is hard and tiresome work, and people who are willing and experienced and good at it are always in demand, and there are always more texts coming in -- which is a GOOD thing! -- so we have to accept that there is only so much we can do in any given case.

jim

(Now tell me that all you wanted was the -t switch. :-)