Re: [gutvol-d] improving blah blah blah

jon said:
my hope is that all the PG powers-that-be are part of PG for the right reasons, that they recognise that a problem exists, and that they are keen that it gets fixed.
you can grant them all of that. but none of that means that they are going to agree with you that you have the right solution. and they have the keys to the castle.
From what you say, it cannot get fixed without buy-in from the WWs and Marcello to a specific course of action, so the problem is really one of generating consensus around a course of action.
right. and that hasn't ever happened here. ever. and it is clear that your plan will not do it either. i mean, we could all agree to re-do some e-texts -- without committing to your ms/rtt plan at all -- but the new e-texts would still be ignored by users. so you're not paying attention to the _real_ problem.
with an MS, an RTT and Knuth's genius I can do a solo project that shames Amazon in a day or two.
well, there's another problem. you wanna use tex. go over to d.p. and see the tex contingent there. it's tiny. the learning curve is obviously too hard.
What I can't do is create the MS and RTT by myself.
you're wrong. a scan-set is easy to obtain. and getting the text correct is not difficult. so, you know, go ahead and do your demo. do it for one book, e.g., "pride and prejudice". get the text from that pointer i gave to my site.
There seems to be a consensus around a 10 book pilot forming. Success would mean that 1 in 20 downloads would be of improved quality. Sounds good to me.
you're counting chickens when you don't even have any eggs yet.
DP co-operation may not be as crucial as I once thought.
it's not "crucial" in the slightest. it would be a hindrance.
I think it is worth at least _trying_ to bring them on board.
you're biting off more than you can chew as it is. and now you want to stuff a big decaying head in your mouth as well? d.p. stopped growing a very long time ago... soon they will start falling from their plateau. you already have an albatross around your neck. don't try to tie an anchor to your ankle as well...
I fully admit to having precisely zero knowledge or experience regarding book editions and availability of scans. I naively thought that "someone" would know which edition a given extant text was derived from and that it would be straightforward to find a decent scan given that knowledge. This is clearly very wrong.
sometimes it is easy. sometimes it is not. i've been unable to find any scan-set for the text used for the p.g. version of "pride and prejudice".
That we don't necessarily know the edition of the extant text raises the issue of which edition we lock in to. Who makes that decision?
that's a non-issue. get a scanset from the internet archive, which will give you the o.c.r. as well. you correct that o.c.r. by comparing it against a clean text, resolving the diffs.
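A minimal sketch of the diff step described above, assuming the o.c.r. output and the clean text are both available as plain strings. The function name list_discrepancies and the sample sentences are illustrative only and do not come from any PG or Internet Archive tool. It uses Python's standard difflib to list the places where the two texts disagree; a person would then resolve each one by looking at the scan.

```python
import difflib


def list_discrepancies(ocr_text, clean_text):
    """Return (ocr_side, clean_side) pairs wherever the two texts disagree.

    The comparison is word by word, so differences in line wrapping
    between the two texts are ignored.
    """
    ocr_words = ocr_text.split()
    clean_words = clean_text.split()
    matcher = difflib.SequenceMatcher(None, ocr_words, clean_words, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Either side may be empty (a pure insertion or deletion).
            diffs.append((" ".join(ocr_words[i1:i2]), " ".join(clean_words[j1:j2])))
    return diffs


# Hypothetical sample input: one o.c.r. misread and one punctuation difference.
ocr = "It is a truth universally acknowledged, that a single rnan in possession of a good fortune"
clean = "It is a truth universally acknowledged, that a single man in possession of a good fortune,"

for ocr_side, clean_side in list_discrepancies(ocr, clean):
    print(f"ocr: {ocr_side!r}  clean: {clean_side!r}")
```

The sketch only reports discrepancies. Whether a given one is an o.c.r. error or a genuine edition difference still has to be decided against the scan-set.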
I conclude that Don is absolutely correct. The initial action of a 10 book pilot should involve nothing more or less than determining, in each case, which edition and hence which scan we are going to work with.
ok, start with _one_ book, not 10. seriously. one. because you still have _no_ word from a whitewasher or marcello that _any_ change will be made to the site that will allow a corrected e-text to float to the surface. until p.g. gives some indication that it _truly_wants_ corrected editions, you are simply wasting your time. if i were you, i'd make 'em prove their intentions by bringing the books jim re-did to higher prominence.
We need someone knowledgeable to guide this selection process and if necessary make a final executive decision in each case.
again, you're looking for some type of agreement that has _never_ever_ happened on this list before. besides... there's no reason you can't do _multiple_versions_ of a book.
again, take my "pride and prejudice". i took one e-text and diffed it against another one. that pointed out the discrepancies between the two, which were either (1) an o.c.r. error in one or both, or (2) edition differences. you decide which it is by comparing each of the e-texts against its scan-set. at the end of the process, both e-texts are correct, with the caveat that both could conceivably have the identical o.c.r. error located in the same place. (although my research shows that to be very rare, quite less likely than one would think it might be.)
being able to present multiple editions, and generate pointed evidence of the changes made across them, is an empowering thing, one that might be considered to be of considerable value by some people out there. scan-sets are plentiful these days. there's no reason to think that we need to limit ourselves to one edition.
(having said that, however, i see _no_ useful purpose served by an e-text that _cannot_ show its provenance in a solid demonstrable way by pointing to a scan-set. but that's a point that i have made many times before.)
***
greg said:
This is exactly opposite the PG policy. We specificaly do NOT adhere to any print edition.
see what kind of lunacy you are up against, jon? a policy that once made sense was retained until it no longer produced any solid benefit, and then _retained_even_longer_ as it became a liability, and now _is_still_retained_ even when it is stupid. heck, these people can't even _spell_ "specifically".
(That is part of why you will find it really hard to find a matching print source for many PG eBooks.)
which makes it very hard to submit an error-report that the whitewashers can't reject if they want to... now we are in the sad situation where the world is awash in p.g. e-texts which have zero provenance. if this doesn't change, there will come a time, and it's not far down the line, where project gutenberg will come to be considered a _liability_ to e-books, an example of _how_not_to_do_it_. how depressing.
-bowerbird

On 2012-09-24, Bowerbird@aol.com wrote:
From what you say, it cannot get fixed without buy-in from the WWs and Marcello to a specific course of action, so the problem is really one of generating consensus around a course of action.
right. and that hasn't ever happened here. ever.
and it is clear that your plan will not do it either.
In which case my plan will be dead in the water. Nothing will have been lost, but nothing will have been gained. We can all agree that there is a problem that needs to be fixed. We can all agree that this problem can only be fixed if there is a consensus on a course of action -- going off on your own achieves precisely nothing of consequence. A consensus has never happened, and this has resulted in an obvious problem not being fixed. Therefore, before we can do anything else, we need something that has never happened before to happen. No one dies if it doesn't happen, but it would be nice if it did.
so you're not paying attention to the _real_ problem.
My understanding is that what you refer to as the _real_ problem is how the default version for each final format is selected. If so, I agree. If a redo is not the default _at_Project_Gutenberg_ it effectively does not exist on the Internet. But how do we decide that a redo is objectively superior to the original? There needs to be an agreed protocol on this before there is the slightest point to rolling up our sleeves and digging in. I guess agreement of such a protocol is what you mean by buy-in from Marcello and the WWs.
well, there's another problem. you wanna use tex. go over to d.p. and see the tex contingent there. it's tiny. the learning curve is obviously too hard.
_I_ want to use LaTeX, because I use LaTeX a lot and the book I piloted using it turned out really nicely. I don't care what anyone else uses -- lots of things work. I just want to make it as easy as possible for you to do a ZML version of something I've done a LaTeX version of and vice versa.
What I can't do is create the MS and RTT by myself.
you're wrong. a scan-set is easy to obtain.
and getting the text correct is not difficult.
But agreeing on which scan-set should be used _is_ difficult. Yes, you can of course do a heap of different editions, but you can expect any edition which is not the PG default to effectively not exist, so why bother doing it? Certainly diffing an OCR against a clean text will improve the clean text. Maybe this will be enough, maybe it won't.
so, you know, go ahead and do your demo. do it for one book, e.g., "pride and prejudice". get the text from that pointer i gave to my site.
To what end? It would just be a solo project, and I know that I can produce what I would consider a high quality book as a solo project. Without an agreed MS and a default download protocol, I would achieve precisely nothing. Everyone seems to agree that these things are the current show stoppers, so it would seem more productive to see if there is anything I can do to help the powers-that-be get these things in place.
until p.g. gives some indication that it _truly_wants_ corrected editions, you are simply wasting your time.
Agreed. An MS agreed and uploaded for the top 10 books and a protocol that states that a version derived from the MS _will_ become the default download would be that indication for me. I'll help any way I can.
Cheers
Jon

On 9/25/2012 3:29 AM, Jon Hurst wrote:
On 2012-09-24, Bowerbird@aol.com wrote:
[snip]
so you're not paying attention to the _real_ problem.
My understanding is that what you refer to as the _real_ problem is how the default version for each final format is selected.
No, the real problem is that Project Gutenberg, or at least those with any influence at Project Gutenberg, are emotionally invested in a process that is 30 years old, and at least 20 years out-of-date. The problem is a political one, and not a technical one, and trying to apply technical solutions to political problems is like trying to teach a pig to sing: it wastes your time, and annoys the pig.
My advice: do whatever you want, but document very thoroughly what you're trying to do, and how you're going to do it. In my experience, Dr. Newby has always been very forthcoming with providing hardware and network (i.e. technical) support. Then invite anyone who wants to play by your rules to join you. If your documentation is clear and complete, and if your case is compelling, you shouldn't have too much difficulty attracting help. This is exactly what BowerBird has attempted, so you can judge your chance of success by his.
In my mind, about the only thing of value that Project Gutenberg currently possesses is the trademark of the Project Gutenberg name. The one thing you /might/ get out of Project Gutenberg is permission to use the name as part of an incubator project. In fact, working with the PGLAF to build an incubator system that allows for use of the Project Gutenberg trademark across many projects would probably be far more valuable than simply another attempt to create a master format using your preferred markup language.

In my mind, about the only thing of value that Project Gutenberg currently possesses is the trademark of the Project Gutenberg name.
PG also has customers. PG claims about 3 million downloads a month -- although I wonder how many of those downloads actually get read.
participants (4): Bowerbird@aol.com, James Adcock, Jon Hurst, Lee Passey