
On Wed, 18 Jan 2006 20:19:33 -0500, Jim Tinsley <jtinsley@pobox.com> wrote:

|On Wed, 18 Jan 2006 11:44:46 +0000, Dave Fawthrop <hyphen@hyphenologist.co.uk> wrote:
|
|>On Mon, 16 Jan 2006 15:54:37 -0500, Jim Tinsley <jtinsley@pobox.com>
|>wrote:
|>
|>|On Mon, 16 Jan 2006 17:13:12 +0000, Dave Fawthrop <hyphen@hyphenologist.co.uk> wrote:
|>|
|>|
|>|>I am told by the whitewashers that it is *essential* that all text for PG
|>|>pass guiguts. Because this assumes that the language scanned is American
|>|>it gives 90% plus false positive errors, on my books, which is totally
|>|>unsatisfactory for any piece of test software.
|>|>
|>|>Is there a language free version of Guiguts?
|>|
|>|I'm not quite sure which question you're asking, and about which
|>|checking tool, but I think there is some confusion somewhere, of
|>|emphasis if not of fact, and I'm continually surprised by people who
|>|don't know the origins of really quite recent procedures I remember
|>|vividly, and I've had several threads recently about this general
|>|subject of checking, so please bear with me while I regurgitate
|>|history. I hope you'll find a satisfactory answer in here somewhere.
|>
|>I could only find one tool which shows on my Win XP computer that is
|>guiguts. This as far as I can ascertain has various subroutines which
|>are very badly tied together, and in no way at all follow the Windoze
|>interface.
|>
|>|Anybody can use any programs they like to make texts, and different
|>|people do use different tools, according to their own needs or the
|>|needs of the individual texts. Considering that we get French and
|>|German and Esperanto and Chinese texts, not to mention older English,
|>|there is no one-size-fits-all solution for language.
|>
|>To get things past whitewashers one apparently must use this, or things get
|>rejected. Your assertion is therefore clearly theoretically correct, but
|>in reality absolutely wrong
|
|Now, this I can flatly deny.
There is no such thing as a standard without a test to show that the test
has been passed. Schools teach to the exam, which some think wrong, but
it happened to me and judging by the media Brouhaha still happens in the UK.
I was an Engineer, and there a draughtsman who failed to put a test
(tolerance) on anything in a drawing, was exposed to public ridicule.
If such a drawing got onto the shop floor the production departments would
deliberately fail to follow any reasonable tolerance.

|I can think of half-a-dozen people
|offhand, regular producers, who don't use gutcheck in any form.
|They don't need to. Their quality standards are high enough that
|it won't find any real errors. I do run it on their texts, as a
|matter of form, but I know in advance what the result will be.
|For all I know, there are others, equally good, but I don't know
|that they don't use gutcheck because the subject never comes up.
|Most of us, of course, are not that good.

You have just admitted that gutcheck is the standard on PG.

|
|Bill Flis, who wrote the GutWrench package, uses his own checkers
|exclusively, and I know equally well that I won't find any errors
|that can sanely be caught by automation in his texts either. You can
|find them, if you're interested, at http://www.pgdp.net/tools/GW.zip
|
|>|Once, there were no checking tools at all, except for spellcheckers
|>|built into Word Perfect and Word, which is what most people used, and
|>|I could tell you some stories about having to convert those!
|>|
|>|David Price and Martin Ward and I made checkers that we used for
|>|ourselves. There may have been others, but those are the ones I'm
|>|aware of. Everything else was Mark One Eyeball.
|>|
|>|I had done a lot of cleaning-up work on a lot of texts for various
|>|people, and I would then send those on to Michael for posting. They
|>|would commonly take hours of work each. In self-defense, I wrote a
|>|checker I (later) renamed to gutcheck.
|>|When the WWs were formed in 2001, I brought gutcheck with me, and we
|>|all used it to find errors quickly in incoming texts.
|>
|>But gutcheck gives 90% plus false positive errors, many hundreds on my
|>texts in Yorkshire Dialect, mostly poems. It enforces the American
|>language, and American punctuation conventions. It objects to most
|>Yorkshire abbreviated words such as t' which occur dozens of times
|>in the poems I work on. It also objects to non standard punctuation
|>which occur in my texts as an example "? whereas American convention
|>apparently is ?" .
|>
|>Writing as one who has designed, written and sold language software for
|>some 20 years (see my web site). The *first* stage in the design of any
|>software involving language is how other languages will be treated. This
|>is usually done by putting all the features of one language, in a specific
|>data structure(s) and/or subroutine(s) which can be used or not as
|>required.
|>
|>All I asked for was a copy of gutcheck with the features specific to
|>American removed which should be a very short editing and recompiling job.
|
|I'm not sure how you define "American", but ALL gutcheck features are
|language-specific, one way or another. You really appreciate this when
|checking Hebrew or Tagalog! Even the relatively familiar French,
|German and Spanish have various punctuation features quite
|incompatible with gutcheck's assumptions. I'm talking with various
|LOTE producers about language-specific versions, but have not yet
|decided to take any action.

Then gutcheck should be modified to have versions for many languages.
If you read the Subject of this thread, you will find:
"Language free version of guiguts?"

|
|>Worse the only way to view output is on a screen. Copy does not work so it
|>is impossible to copy the output to a text file and edit the repeated false
|>positives out of the list.
|>It is totally unacceptable to distribute a GUI
|>program where the standard Copy and Paste functions do not work
|>
|>Worse still and absolutely ***unforgivable*** in any GUI program the
|>settings places the settings file on ***THE DESKTOP***. Deleting it loses
|>all settings.
|>
|
|I can't comment on GuiGuts. As a command-line guy, I don't use it all
|that much, except sometimes, when I find some specific feature
|invaluable. If you want to comment, the appropriate place is in the
|GuiGuts thread of the Tool Development forum at DP, which Steve reads
|and answers questions and requests in.
|http://www.pgdp.net/phpBB2/viewforum.php?f=13

I have asked the question here. I do not do forums.

|>|Up till then, there was really no difference between DP and Other
|>|texts, though because the people who mostly submitted from DP were
|>|experienced, and because DP favored simple texts
|>
|>DP is by its nature not suitable for my texts, because the language is as
|>different from American as say French. A non Tyke (yorkshireman) as has
|>been shown in the past, has extreme difficulty understanding the text.
|
|Well, considering that they regularly do several languages, I doubt if
|Yorkshire dialect would stand out much. Right now, in round 1, I find:
|English, German (math, with LaTeX), Finnish, French with Scots, Middle
|English, Middle French, Portuguese, English with Ancient Greek,
|Spanish, Italian, Dutch, German, English with Breton, French, Tagalog,
|Latin, and I just know there's some Esperanto around somewhere. I know
|they've also done Irish (sean-litriú), because I had a hell of a time
|finding all the correct characters for the UTF-8 version (and I'm
|still not convinced about Tironian-et). Of course, if you want real
|variety, you need to hit the European DP.
|
|>|And GuiGuts and gutcheck have accreted features ever since.
|>|If you have GuiGuts, then you have gutcheck, since Steve bundles it
|>|with GuiGuts -- and you also have a large number of other tools that may
|>|or may not be useful for the particular text you're working on.
|>|
|>|There are many other checkers available as well, and I'd love to
|>|ramble on about them, but this is too long already, and it doesn't
|>|bear on your question.
|>
|>|This is how it comes -- by evolution, not by fiat --
|>
|>Untrue!
|>I am *forced* to use guiguts/gutcheck by the Whitewashers.
|
|I say again: not everyone does. Just eradicate all mistakes and nobody
|will ever know what you used.
|
|>Gutcheck does not work on Windoze.
|
|It runs in a Win32 command prompt, but it doesn't have a GUI on any
|platform.

"You have to be joking MAN"

|
|>| that incoming
|>|texts are checked with _several_ tools, according to what seems
|>|appropriate for the text, but most commonly with gutcheck and/or
|>|GuiGuts.
|>
|>
|>Finally guiguts is as it stands unusable on my texts. No doubt I will find
|>other equally drastic problems
|>
|>As all my work goes on my own web site, and gets copied from there onto
|>many other sites, PG is just a nice add on and could be ditched if it were
|>to take too much effort.
|>
|>The text which WW objected to so strongly has been on my site for a couple
|>of years, and absolutely *nobody* has noticed the ?errors? People read it
|>for the dialect, not the punctuation. I have however had several
|>appreciative emails.
|
|Well, I'm very familiar with that condition, but that's a whole
|'nother argument. A text does not have to be perfect to be valuable.
|We have many older texts, especially, that have many errors. That
|doesn't make them useless. I handle most of the errata reports for PG,
|and nearly all of them express appreciation for the availability of
|the text, along with their handful of reported errors. I may find
|another hundred or so problems when I check the text out, but these
|readers never noticed them.
|Two million downloads a month, with (I
|estimate) about one million errors among 17,000 books, and we get
|about one errata report per day.
|
|And there are many people who do want to make etexts but don't want
|to live within the constraints of PG -- some don't want the
|quality-checking, some complain that we don't quality-check enough,
|some don't want to work in plain text, some don't want to go through
|the clearance procedures, and so on.
|
|We have 40 to 60 submitted texts in the average week, and three WWs
|active to take them at the moment. If everything in an incoming text
|is perfect, one of us will spend about an hour on it. Plus a load of
|time on other activities. We can't accommodate everyone on everything,
|and there is no doubt that the quality gets higher as time goes on,
|because of the processing that we do. This is what we have to do,
|to keep the operation moving and the quality high. Not everyone is
|going to be happy with the process. Some will choose not to send their
|texts to PG. I'm sorry about that.
|
|>|(Now tell me that all you wanted was the -t switch. :-)
|>
|>I am not going back to the bad old Unix days, when each program had to be
|>learned individually. Come back Bill Gates. All is forgiven.
|
|Well, I say again, if you don't want to use it, you don't have to;
|not everyone does, and especially not everyone does for all texts.
|It's essentially a collection of regexes, selected to give, on
|average, the best results for the most common type of PG files.
|Many DPers who work on other types of texts just put together their own
|set of regexes, and run them through GuiGuts or GutWrench or from a *nix
|command line, whichever they prefer.

I do not do windoze programming. You are essentially saying that a non
programmer can work for PG. :-( Did you really mean that?

You have agreed with me above that gutcheck is the standard which must
be passed to get.
I am just trying to find a version of that standard which will run
on my machine, with the text

As I understand it the answer to a perfectly reasonable request (see Subject)
from PG was:
******************
***GET STUFFED.***
******************

I will look for a workaround.

--
Dave Fawthrop <dave hyphenologist co uk>
17,000 free e-books at Project Gutenberg! http://www.gutenberg.net
For Yorkshire Dialect go to www.hyphenologist.co.uk/songs/