
just because i had aroused my own curiosity, i actually went and looked at a couple books, even though i desperately want to avoid this entire conversation of unnecessary duplication.

but i looked.

sure enough, the reality is as bad as i thought. in fact, on digging deeper, yes, it's even worse.

***

let's start with huck finn...

the current file, #76!, was done by ron burkey, from an original done by an anonymous person, and david widger is top-listed as the maintainer.

***

let's detour a bit, to talk about david widger...

he's a whitewasher, and he's one of a handful of volunteers who has given _years_and_years_ of good and solid service to project gutenberg. if anyone can feel proud of their contribution to project gutenberg, it would be david widger.

he has handled literally _thousands_ of e-texts as a whitewasher, and digitized many of those by himself. david is one of the _heroes_ of p.g.

indeed, although such things are impossible to judge with any kind of certainty, it would _not_ be a stretch of the imagination to assert that david widger has given more than anyone else -- _anybody_ -- to project gutenberg. period.

that's how far up on the totem pole david is... and any course of action has to _respect_ that...

which is not to put a halo around the man's work, because i will soon describe that work objectively, but it _is_ to remind everyone that we still need to respect the man behind that work, and the depth of the long-time service he's given to the project.

we can criticize the work and still respect the man.

not that my intent is to set out to "criticize", per se. again, i am just describing his work _objectively_... so if i get something wrong, feel free to correct me.

***

so, let's look at "huck finn" in project gutenberg...

the original is credited to "dell@wiretap.spies.com". its post-date says 29-jan-2002, but the file itself says it was "released to the public july 1993", so... your guess is as good as mine. that's hfinn10.txt.
hfinn11.txt has a post-date of 13-aug-2002, and this is the one which was prepared by ron burkey. his version dehyphenated the end-line-hyphenates, and also reformatted the paragraphs, owing to that.

hfinn12.txt was posted on 15-may-2004, and this is the version where david widger came on-board. his major contribution was to add an .html version, which was an accomplishment back in those days.

with updates coming so frequently, it didn't matter that this e-text was actually still rather primitive... for instance, it was still using uppercase for italics.

(believe it or not, it was only at the celebration of the 10,000th e-text -- held in december of 2003 -- when p.g. _committed_ to actually marking italics, and it took a year or two for the promise to kick in, just so you have the historical knowledge necessary to grok that such primitiveness was common then.)

but that's when the problem set in. because the frequent updates to this book stopped.

the 2004 file sat on the shelf, and sat on, and on. tons and tons of downloads. tons. but no updates.

and as time went on, its primitive nature became more and more obvious, more and more grating.

***

finally, jim decided to update our good old huck, and it was posted with a date of 10-may-2010.

and, to reiterate this part of the story, the update was posted _not_ as a replacement for number 76, but as a completely new e-text, numbered 32325.

and, as of just now, according to the p.g. website, the old file has 16,537 downloads, the new 528.... (but who knows what those numbers really mean?)

it certainly appears that david widger didn't want to share slot #76 with work that overwrote his own...

i'm _not_ saying that i _blame_ him for that action. i'm just paying attention to the download numbers. the "new improved" file isn't getting any traction...

so that's the problem, in a nutshell...

***

but that's not the end of the story. unfortunately.

because, on 31-aug-2012, #76 was _updated_ too!
that's right, it was updated just over a month ago! maybe there _is_ hope! :+)

(there might have also been some kind of action in february of this year, perhaps revolving around that ill-fated mess greg was proposing back then, but we'll just pretend that none of that happened.)

so, an update to huck finn #76 is good news, right?

well, you might think so, but if you look, actually, no.

because, believe it or not, #76 still has _no_italics._

that's right, a p.g. e-book updated just a month ago is _still_ using uppercase to indicate italics. bullshit!

even worse, if you take a look at the .html source, you'll see that the headers are tagged as paragraphs! so much for "tagging the structure of the document".

and this is from a whitewasher, for crying out loud! on a classic book, with a number inside the first 100, which is in the top-10 list of downloads, getting tons.

something is most definitely wrong with this picture.

plus, as i said, i seem to remember that jim said that he had found _hundreds_ of mistakes in #76, and even that he'd actually posted a change-log... but i couldn't even bear to look to see if the errors had been corrected in this "updated" file, because i had a very strong feeling i would find they hadn't.

at any rate, the message needs to be repeated: something is most definitely wrong with this picture.

***

i also looked at "pride and prejudice", obviously, since i had just completed doing my own update.

the history for that e-text is remarkably similar... various versions (09-12) and dates (2002-2006), but then updated versions appearing recently, with dates of 28-nov-2011 and 12-may-2012.

and a closer look reveals the same sad situation...

the one positive exception is that this book _does_ have the chapter-headers tagged as headers! yay! with the italics properly marked ever since 2006! chalk up victories, no matter how slight they seem!

it's not a triumph for structural tagging, however.
there are several letters in this book, and they are tagged as plain paragraphs. the only distinction they receive is that the salutations and sign-offs get a "center" tag, hardly a very significant tip-off.

so i was tempted to just write it off with that. for this book, however, i simply had to look to see what changes were made in the most-recent "update", given that i just spent a week poring over that text.

besides, maybe david had spotted an error i hadn't, and thus my text would improve by analyzing his.

so i decided to look at the changes in the update.

not that it was all that easy to "look", mind you... because this revised e-text had been _rewrapped_. thus, a simple comparison couldn't be done easily.

but i expected that if a rewrapping was required, then it meant that many changes had been made, and i was eager to see exactly what they'd been... so i jumped through hoops to set up a comparison.

which means that you can probably understand that i was quite disappointed to find a mere 53 changes.

moreover, most of 'em were extremely minor ones, nothing that would necessitate a rewrap, which led me to wonder if the rewrap had been done so as to _hide_ the fact that so few changes had been made. i can't make the charge, of course, but i did wonder.

but the news gets even worse.

because when i stepped through those 53 changes, i discovered that some of 'em were actually incorrect! indeed, around 20 of the 53 were _improper_edits_... (a full 16 incorrectly revised "upstairs" to "up stairs".)

so, as with huck, this is bad. it is embarrassingly bad.

so, um, i don't think i can look at any more "updates"; it's just too depressing.

i am at a loss for anything else to say, so i will close...

-bowerbird
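
p.s. for anyone who wants to repeat that kind of comparison on a rewrapped file, here is a minimal sketch. it is not necessarily the set of hoops described above, just one way to do it, assuming python and its standard difflib module. the idea is to compare the two texts word-by-word rather than line-by-line, so that differences in line-wrapping drop out and only real edits remain. the function name word_changes and the two sample snippets are made up for illustration.

```python
import difflib


def word_changes(old_text, new_text):
    """Return (old_words, new_words) pairs for each stretch that differs.

    Both texts are split on whitespace first, so line breaks (and thus
    any rewrapping) are ignored and only real edits are reported.
    """
    old = old_text.split()
    new = new_text.split()
    # autojunk=False turns off difflib's "popular element" heuristic, which
    # would otherwise ignore very common words in a long text.
    # Note: on a whole novel this comparison may be slow.
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(old[i1:i2]), " ".join(new[j1:j2])))
    return changes


# hypothetical snippets: wrapped differently, with one real edit between them
old = "she ran upstairs\nto her mother, and\nsat down."
new = "she ran up stairs to her\nmother, and sat down."

for before, after in word_changes(old, new):
    print(repr(before), "->", repr(after))
```

for real files, you would read each version into a string and pass the two strings to word_changes; the length of the returned list is then a rough count of the edits.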