
marcello said:
Are these scans online and accessible at DP? If so, linking them from the PG catalog would be a matter of a few hours' work, assuming I can get a list of etext-no => page-scan-url
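(for the record, the software side of what marcello describes really is trivial -- here's a sketch in python, assuming a hypothetical tab-separated file of etext-no => page-scan-url pairs, which is precisely the list that nobody has produced. the file format, urls, and function names are all invented for illustration:)

```python
def parse_mapping(lines):
    """Turn hypothetical 'etext_no<TAB>url' lines into an {etext_no: url} dict."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        etext_no, url = line.split("\t", 1)
        mapping[int(etext_no)] = url
    return mapping

def catalog_link(etext_no, mapping):
    """Return an HTML link to the page scans, or None if no scans are listed."""
    url = mapping.get(etext_no)
    if url is None:
        return None
    return '<a href="%s">page scans for etext #%d</a>' % (url, etext_no)

# usage -- the urls are made-up placeholders:
sample = ["11\thttp://example.org/scans/11/",
          "1342\thttp://example.org/scans/1342/"]
mapping = parse_mapping(sample)
print(catalog_link(1342, mapping))
```

(which is why the reply below is right that the code was never the problem.)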
would that it were that easy... *** not to pick on marcello, since i wouldn't do that, because i know he does a lot of work on the p.g. site, so he's busy with other things... but... this thread would be a lot more productive if people would familiarize themselves with the actuality of the scans on the d.p. site... some of them are not of very high quality. many of them are not very well-organized. (although it might not appear that way to the naked eye, that last sentence is the very model of understatement.) *** juliet said:
Aside from not having the development resources to set up some kind of system for accessing and using the scans, we also have not yet found someone who will wade through all the archived material to sort it out so that it can actually be used.
i heard jon noring volunteering! just three posts back! really! :+) *** david said:
I've looked at a similar system on Project Runeberg, but it doesn't look very successful at outputting a number of accurate and complete etexts. DP does a very good job at keeping attention focused on a few texts and keeping them moving forward page by page in the system, whereas those systems seem to disperse the effort over a lot of books and a lot of pages that are corrected more or less at random.
that's a good analysis. of course, it's also important to note that neither of those projects has volunteers anywhere near the numbers that d.p. has. if they did, their performance would be better; they'd be more focused, and get better results.

note that the general problem here -- which is "how do you know when each page is _done_?" -- is one that distributed proofreaders has too. but because of its numbers, and the dedication of its proofers, d.p. has the luxury of answering "when it has gone through x number of rounds." even though that's not the right answer to the question -- the right answer is "when no more changes are being made" -- by the time you've hit 4 rounds (plus post-processing), the odds are much better that all the errors have been found.

another difference between all of these systems is that the d.p. interface is remarkably better than the others. furthermore, the post-processing apps over at d.p. are some of the best around right now, so that helps too... but if you built a system of continuous proofreading that had an interface as good as the one over at the d.p. site, with as many volunteers spending as much time, and with as much care, it would do as well as d.p. does. or better...

still, d.p. _has_ the volunteers, right now, working hard, so there's little need to channel them to another method. they've got a nice little cult there, i mean "community", :+) and as long as they're happy, we should just cheer 'em on.

and since there are very few calls -- outside of jon -- for the kind of "transparency with the source materials" that putting scans online would serve, i don't see much point in doing it yet. eventually, diskspace and bandwidth will be plentiful enough to do that without any reservation; but that day ain't here yet. indeed, distributed proofreaders doesn't even leave its scans on brewster's internet archive after a book has gone through proofing, simply because they don't have enough diskspace to do that. it's true.
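(the "no more changes" criterion is easy to sketch, by the way. here's a toy model -- the proofers are just functions here, of course, and real proofing is neither ordered nor deterministic, so take it as an illustration only -- where a page is "done" when a full round comes back with zero diffs:)

```python
def proof_until_stable(page_text, proofers, max_rounds=10):
    """Run proofing rounds until a complete pass makes no changes.

    proofers: list of functions text -> text, a toy stand-in for
    human proofers. Returns (final_text, rounds_used).
    """
    for rounds in range(1, max_rounds + 1):
        before = page_text
        for proofer in proofers:
            page_text = proofer(page_text)
        if page_text == before:      # a whole round with zero diffs: done
            return page_text, rounds
    return page_text, max_rounds     # gave up -- page is still churning

# toy proofers: each one fixes a single kind of scanno
fix_rn = lambda t: t.replace("rnodern", "modern")
fix_tbe = lambda t: t.replace("tbe", "the")

text, rounds = proof_until_stable("tbe rnodern age", [fix_rn, fix_tbe])
print(text, rounds)  # converges in round 1, confirmed stable in round 2
```

(the fixed-x-rounds policy is just this loop with the stopping test replaced by a counter -- cheaper to schedule, but it can stop too early or too late.)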
the more important issue here is one that jon is trying to slip under the door while he makes a big noisy distraction -- his emphasis on a complex and heavy form of markup... jon wants you to believe this heavy markup is necessary to deliver a whole bunch of benefits that he promises. but he has no way to deliver on those promises. and i suspect that when he gets deep enough into the fertilizer he's pushing, he'll find his complex systems start tripping over themselves. for instance, even some of the most diehard of the x.m.l. people are now saying that x.s.l.t. is too complex for many purposes.

or let's look at inter-document linking, which is one of the things that jon likes to say is on his agenda, while he's waving his hands. this is an arena that many good people have wasted a lot of time on. i say "wasted" because one of two conditions must apply here: 1. documents are permanently available at a u.r.l., or 2. they are not. if condition one is the case, anyone can build an inter-linking system that works perfectly. and if condition two is the case, _nobody_ can. oh, they'll make you _promises_. and they'll be able to build something that _kinda_ works, _most_ of the time anyway. but anyone can do that.

so think long, hard, and very carefully about adopting heavy markup... (but hey, if you _are_ gonna do heavy markup, _when_will_you_start_? i came to this listserv over a year-and-a-half ago, asking you that, and you _still_ haven't gotten the ball rolling yet. what is the delay, folks?)

-bowerbird
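(to make the two-condition point concrete: under condition one, "inter-document linking" is nothing but a lookup table, and anybody can write it. under condition two, the lookup fails and no cleverness in the resolver can bring the document back -- the best it can do is report the break. identifiers and urls below are invented for the example:)

```python
# a registry mapping permanent document ids to their urls --
# this *is* condition one, when somebody actually maintains it.
registry = {
    "etext:1342": "http://example.org/etext/1342",
}

def resolve(link_id, registry):
    """Resolve a document id to a url.

    Condition 1: the id is in the registry -- resolution is trivial
    and works perfectly. Condition 2: the id is gone -- we return
    None, because reporting the broken link is all anyone can do.
    """
    return registry.get(link_id)

print(resolve("etext:1342", registry))  # the registered url
print(resolve("etext:9999", registry))  # None -- a dead link, full stop
```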