
First, some general musing here...

It's interesting to see how, as the number of people involved with PG in one way or another keeps growing, so does a general misunderstanding about the nature of the project. Somehow, some people have the impression that it all runs like clockwork, and that all little ambiguities are swiftly and efficiently dealt with. I can understand that when someone sees the sheer amount of what has been accomplished so far, it can be easy to assume that "of course, _this_ has been done--it wouldn't make sense otherwise."

Realistically, the processes that are in place grew up over time, with volunteers doing their best to deal with the demands of the moment and string something together that would work. And it is not static either; it keeps changing. I'm not kidding when I say that the few people who do the majority of the back-end stuff that keeps PG growing have a backlog of years of PG-related tasks to tackle.

So, on the specific topic at hand...

On Thu, 19 Jan 2006, Dave Fawthrop wrote:
You have just admitted that gutcheck is the standard on PG.
Yes, it is used a lot. Often it's a very useful tool (sometimes even for non-English texts). However, I would not call it a standard in the sense of being a "test" that a given text has to "pass" (such as a test for valid markup on an HTML file). Rather, it is a tool which just about every text being added to the collection is run through, as a way of 1) assessing the overall level of the text, and 2) guarding against last-minute gremlins that do unexpected things to a text (and yes, interesting things do happen sometimes).

I have submitted some German and French texts to PG which I have reformatted from other sources, and, as expected, a run through gutcheck resulted in many places being questioned that were just fine in the given languages. So, if I thought it was needed, I just added a note when submitting the texts that "gutcheck flags a lot of false positives on this one."

It looks like the source for gutcheck is available at http://gutcheck.sourceforge.net/ if you are interested in modifying it for your own uses. (If you are just dealing with one or two texts, it might not be worth the bother, but if you foresee working through lots of Yorkshire text, it could be more worthwhile.)

......

So, will the conditions I discussed above change? Well, PG is certainly more organized in some ways than it used to be, and I could see it going further in that direction. However, I don't realistically see it ceasing to be run by volunteers, which does set some of the tone.

I'm not pretending that I think PG is perfect here. Like anyone else who is involved, I have my own issues (one of my pet peeves is when the stated character encoding in the header does not match what is actually in the text), but I know they will not likely be dealt with unless I go ahead and try to work on them. I've found a good approach is building consensus with others.

From the cataloging point of view, I've regularly had help from native speakers of various languages (Finnish and Tagalog spring to mind), which has helped me to make bibliographic data more precise than I ever could have managed on my own.
As well, I've occasionally sent queries to the reference desks of libraries in many corners of the world. If I can get it organized, I'm hoping to make a sub-project where I can target a few Wikipedia users who have indicated they have fluency in both English and Chinese, and give them a way to help improve the consistency of the author data for our Chinese texts.

I'd better stop now, before I meander off-topic too much... But I hope this has helped somewhat. And thanks for caring about Project Gutenberg. :)

Andrew