What do scholars want?

From the perspective of a peasant.
It appears that the most important thing that scholars want is immutability. A dead tree copy of a book can't be changed, so they can go on endlessly about which dead tree copy is "better" than any other dead tree copy (I know where all of the errors are, and you don't, so there!). Even though dead tree copies eventually wear out, are burned up in fires, are carelessly discarded, or are sold off to make space, they don't change.

Therefore, an electronic copy is unacceptable because:

1. Maybe it is not the exact representation of a dead tree copy. This is entirely unacceptable because "my" dead tree copy is better than all of the others.
2. Its URL might change and then I couldn't find it.
3. Worse yet, the URL doesn't change but the text does. (See point 1.)

It appears that we need to modify the PG web site to include checksum and CRC data on each of our files to provide a mechanism of verifying that they have not been nefariously modified after download, so "my" electronic copy can be judged the same as "your" electronic copy.

I fall back to my earlier point: what would be better, when you're submitting research, than to include a copy of every item of source material? This is not done with dead trees because we have no mechanism in the dead tree world to instantly create an exact duplicate of a given piece of material for free. Such a mechanism does exist in the electronic world. When academia wakes up to this fact, maybe its negativity toward electronic copies will lessen somewhat.
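[As a purely illustrative sketch of the checksum proposal above: assuming PG published an MD5 digest and a CRC-32 checksum next to each file, a downloader could recompute both and compare. The filename and the notion of "published values" are assumptions for the example, not a description of the actual PG site.]

```python
import hashlib
import zlib

def file_digests(path, chunk_size=65536):
    """Compute an MD5 digest and a CRC-32 checksum of a file in one pass."""
    md5 = hashlib.md5()
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            md5.update(chunk)
            crc = zlib.crc32(chunk, crc)
    return md5.hexdigest(), format(crc & 0xFFFFFFFF, "08x")

# Hypothetical filename; compare the output against the values the
# site would publish alongside the file.
md5_hex, crc_hex = file_digests("etext12345.txt")
print("MD5:   ", md5_hex)
print("CRC-32:", crc_hex)
```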

At 07:12 AM 11/14/2004 -0600, you wrote:
> It appears that we need to modify the PG web site to include checksum and CRC data on each of our files to provide a mechanism of verifying that they have not been nefariously modified after download, so "my" electronic copy can be judged the same as "your" electronic copy.
Yes, but even CRC, hash, or MD5 values can be forged. All someone would need to do is somehow compromise the PG server, and that has already happened to main Debian and GNU servers. How would we make sure that the hashes are real? One solution is GPG signatures, but then someone needs to download and install GPG, a tool to verify the hash, plus the actual text file. The average user won't know how to do this, and wouldn't bother even if they could. Not to mention that the hash and signature process would have to be redone every time a single byte is changed in the original, such as when correcting errors.
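[For what it's worth, the verification step Baechler describes would look roughly like this. A hedged sketch that shells out to the real `gpg --verify` command; the filenames are hypothetical, and it assumes GnuPG is installed and the signer's public key is already in the local keyring.]

```python
import subprocess

# Hypothetical filenames; assumes GnuPG is installed and that the
# signer's public key has already been imported into the keyring.
result = subprocess.run(
    ["gpg", "--verify", "etext12345.txt.sig", "etext12345.txt"],
    capture_output=True,
    text=True,
)

# gpg exits with status 0 only when the signature verifies.
if result.returncode == 0:
    print("Signature is good.")
else:
    print("Verification failed:", result.stderr)
```

[Which rather illustrates the point: even this "simple" check requires extra software and a working keyring before the text itself can be trusted.]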

On Sun, 14 Nov 2004, Tony Baechler wrote:
> At 07:12 AM 11/14/2004 -0600, you wrote:
> > It appears that we need to modify the PG web site to include checksum and CRC data on each of our files to provide a mechanism of verifying that they have not been nefariously modified after download, so "my" electronic copy can be judged the same as "your" electronic copy.
> Yes, but even CRC, hash, or MD5 values can be forged. All someone would need to do is somehow compromise the PG server, and that has already happened to main Debian and GNU servers. How would we make sure that the hashes are real? One solution is GPG signatures, but then someone needs to download and install GPG, a tool to verify the hash, plus the actual text file. The average user won't know how to do this, and wouldn't bother even if they could. Not to mention that the hash and signature process would have to be redone every time a single byte is changed in the original, such as when correcting errors.
Nothing more is needed for this than "compare." This has been discussed widely over the years, and the simple and easy solution, for those who really want to test the files, is simply to get a few copies of the eBook in question from different sources and test them with any of the various "file compare" programs that come with virtually all operating systems. Thus, even if just one ";" were changed to a ":", it would show up immediately, something that a careful proofreader might still miss.

This totally avoids the possibility raised above of forged CRCs or hashes, and eliminates the need for any extra work in eBook preparation. Anyone can run the tests themselves, without relying on outside authorities to tell them whether one eBook edition is different from another, and exactly how different it is. Simple, fast, and effective, the way the entire eBook process should be.

Michael S. Hart
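[A rough sketch of the comparison Hart describes, using Python's standard difflib in place of an OS-specific tool such as diff, cmp, or fc. The filenames are hypothetical stand-ins for two copies fetched from different sources.]

```python
import difflib

# Hypothetical filenames: two copies of the same eBook downloaded
# from two different sources or mirrors.
with open("copy_a.txt", encoding="utf-8") as fa:
    lines_a = fa.readlines()
with open("copy_b.txt", encoding="utf-8") as fb:
    lines_b = fb.readlines()

diff = list(difflib.unified_diff(
    lines_a, lines_b, fromfile="copy_a.txt", tofile="copy_b.txt"
))

if diff:
    # Even a single ";" changed to ":" appears as a changed line here.
    print("".join(diff))
else:
    print("The two copies are identical, line for line.")
```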

John Hagerson wrote:
> It appears that we need to modify the PG web site to include checksum and CRC data on each of our files to provide a mechanism of verifying that they have not been nefariously modified after download, so "my" electronic copy can be judged the same as "your" electronic copy.
We already have hashes for all our files; that's how KaZaa and other P2P networks work. We keep more hashes for every file than you may want to know about: MD5, SHA-1, Kazaa, ed2k, and TigerTree. If you go to the bibrec page and hover over the P2P link, you can see them (or copy the link into an editor). But I still don't understand: what good does it do you to know the hash of the "original" file?

--
Marcello Perathoner
webmaster@gutenberg.org
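[As an illustration of checking a download against the digests Perathoner mentions, here is a small, non-authoritative sketch that computes the two standard-library ones (MD5 and SHA-1) in a single pass. The ed2k and TigerTree hashes require algorithms outside Python's hashlib, and the filename and workflow here are assumptions, not the actual PG tooling.]

```python
import hashlib

def multi_digest(path, chunk_size=65536):
    """Compute MD5 and SHA-1 digests of a file in one pass.

    ed2k and TigerTree, which PG also publishes, need algorithms
    that are not in Python's standard library, so they are omitted.
    """
    hashes = {name: hashlib.new(name) for name in ("md5", "sha1")}
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            for h in hashes.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashes.items()}

# Hypothetical filename; compare the printed values against those
# shown on the book's bibrec page.
print(multi_digest("etext12345.txt"))
```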
participants (4)
- John Hagerson
- Marcello Perathoner
- Michael Hart
- Tony Baechler