
jon said:
my hope is that all the PG powers-that-be are part of PG for the right reasons, that they recognise that a problem exists, and that they are keen that it gets fixed.
you can grant them all of that. but none of that means that they are going to agree with you that you have the right solution. and they have the keys to the castle.
From what you say, it cannot get fixed without buy-in from the WWs and Marcello to a specific course of action, so the problem is really one of generating consensus around a course of action.
right. and that hasn't ever happened here. ever. and it is clear that your plan will not do it either. i mean, we could all agree to re-do some e-texts -- without committing to your ms/rtt plan at all -- but the new e-texts would still be ignored by users. so you're not paying attention to the _real_ problem.
with an MS, an RTT and Knuth's genius I can do a solo project that shames Amazon in a day or two.
well, there's another problem. you wanna use tex. go over to d.p. and see the tex contingent there. it's tiny. the learning curve is obviously too hard.
What I can't do is create the MS and RTT by myself.
you're wrong. a scan-set is easy to obtain. and getting the text correct is not difficult. so, you know, go ahead and do your demo. do it for one book, e.g., "pride and prejudice". get the text from that pointer i gave to my site.
There seems to be a consensus around a 10 book pilot forming. Success would mean that 1 in 20 downloads would be of improved quality. Sounds good to me.
you're counting chickens when you don't even have any eggs yet.
DP co-operation may not be as crucial as I once thought.
it's not "crucial" in the slightest. it would be a hindrance.
I think it is worth at least _trying_ to bring them on board.
you're biting off more than you can chew as it is. and now you want to stuff a big decaying head in your mouth as well? d.p. stopped growing a very long time ago... soon they will start falling from their plateau. you already have an albatross around your neck. don't try to tie an anchor to your ankle as well...
I fully admit to having precisely zero knowledge or experience regarding book editions and availability of scans. I naively thought that "someone" would know which edition a given extant text was derived from and that it would be straightforward to find a decent scan given that knowledge. This is clearly very wrong.
sometimes it is easy. sometimes it is not. i've been unable to find any scan-set for the text used for the p.g. version of "pride and prejudice".
That we don't necessarily know the edition of the extant text raises the issue of which edition we lock in to. Who makes that decision?
that's a non-issue. get a scanset from the internet archive, which will give you the o.c.r. as well. you correct that o.c.r. by comparing it against a clean text, resolving the diffs.
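here's a minimal sketch of that diff step in python, assuming both texts fit in memory as plain strings. `word_diffs` is a made-up name, not an existing tool, and the sample strings are invented for illustration. it splits on whitespace and uses the standard-library difflib to list the spots where the o.c.r. and the clean text disagree. you still have to check each one against the scan yourself.

```python
import difflib


def word_diffs(ocr_text, clean_text):
    """Return (ocr_words, clean_words) pairs wherever the two texts disagree.

    Hypothetical helper for illustration: compares whitespace-separated
    words, so a change in line breaks alone is not reported as a diff.
    """
    a = ocr_text.split()
    b = clean_text.split()
    # autojunk=False keeps difflib from treating frequent words as junk
    # when the inputs are long.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


# Invented sample strings; "rnan" stands in for a typical o.c.r. misread.
ocr = "that a single rnan in possession of a good fortune"
clean = "that a single man in possession of a good fortune"

for ocr_words, clean_words in word_diffs(ocr, clean):
    print(f"ocr: {ocr_words!r}  clean: {clean_words!r}")
```

each pair is a place to look at the scan and decide which side is right. an empty string on one side means words are present in one text and missing from the other.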
I conclude that Don is absolutely correct. The initial action of a 10 book pilot should involve nothing more or less than determining, in each case, which edition and hence which scan we are going to work with.
ok, start with _one_ book, not 10. seriously. one. because you still have _no_ word from a whitewasher or marcello that _any_ change will be made to the site that will allow a corrected e-text to float to the surface. until p.g. gives some indication that it _truly_wants_ corrected editions, you are simply wasting your time. if i were you, i'd make 'em prove their intentions by bringing the books jim re-did to higher prominence.
We need someone knowledgeable to guide this selection process and if necessary make a final executive decision in each case.
again, you're looking for some type of agreement that has _never_ever_ happened on this list before. besides... there's no reason you can't do _multiple_versions_ of a book.
again, take my "pride and prejudice". i took one e-text and diffed it against another one. that pointed out the discrepancies between the two, which were either (1) an o.c.r. error in one or both, or (2) edition differences. you decide which it is by comparing each of the e-texts against its scan-set. at the end of the process, both e-texts are correct, with the caveat that both could conceivably have the identical o.c.r. error located in the same place. (although my research shows that to be very rare, far less likely than one would think it might be.)
being able to present multiple editions, and generate pointed evidence of the changes made across them, is an empowering thing, one that might be considered to be of considerable value by some people out there. scan-sets are plentiful these days. there's no reason to think that we need to limit ourselves to one edition. (having said that, however, i see _no_ useful purpose served by an e-text that _cannot_ show its provenance in a solid demonstrable way by pointing to a scan-set. but that's a point that i have made many times before.)
***
greg said:
This is exactly opposite the PG policy. We specificaly do NOT adhere to any print edition.
see what kind of lunacy you are up against, jon? a policy that once made sense was retained until it no longer produced any solid benefit, and then _retained_even_longer_ as it became a liability, and now _is_still_retained_ even when it is stupid. heck, these people can't even _spell_ "specifically".
(That is part of why you will find it really hard to find a matching print source for many PG eBooks.)
which makes it very hard to submit an error-report that the whitewashers can't reject if they want to... now we are in the sad situation where the world is awash in p.g. e-texts which have zero provenance. if this doesn't change, there will come a time, and it's not far down the line, where project gutenberg will come to be considered a _liability_ to e-books, an example of _how_not_to_do_it_. how depressing.
-bowerbird