
Marcello wrote:
Karen Lofstrom wrote:
If you cite a dead tree edition of something you can be quite confident that the cited text stays put. It won't change its wording or glide from the cited page into the next, etc.
If you cite an electronic resource you have no such confidence. How do you make sure that the text at the URL you cite will not be edited or removed? You cannot. How do you make sure the medium you cite will still be readable in some years? In a hundred years reading a CD-ROM may be harder than it was to read the Rosetta Stone.
Actually, this issue can be dealt with using hash functions. Once a digital document is finished and archived, simply calculate a hash value for it (or for the set of files the work comprises.) Use a published, open-standard hashing algorithm -- there are many out there to choose from. It's also possible to use digital signatures in some manner, but I'll let the experts in this area discuss this possibility.
Textual integrity is definitely an issue, and it goes beyond just keeping academics happy -- it is germane to the perceived integrity of the entire collection of texts by society-at-large. By keeping the page scans along with the digital texts, we are, in effect, telling the users of the digital texts that we fully stand by the textual integrity of the collection, that we did not pull any fast ones, and that it can be trusted. We are putting our reputation on the line.
By using digital hashes and digital signatures, and redundant/mirrored text repositories, we go a long way towards assuring the collection maintains its integrity. As others have noted, some dictator or totalitarian regime in the future may break into one of the repositories and start tweaking texts. So long as the whole world does not revert to totalitarianism (in which case we have much bigger problems than the integrity of texts), then with a properly designed repository it will always be possible to restore the original digital texts from a clean, untouched digital repository. Hopefully individuals will also keep digital texts lying around, but here again we need to keep in mind that individuals can also tweak the texts, so the use of hashing/digital signatures is still needed.
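As a rough illustration of the hashing idea above, here is a minimal Python sketch. It assumes SHA-256 as the published algorithm (any open standard hash would do), and the function names build_manifest, work_digest and changed_files are made up for this example; nothing here describes an existing PG or DP tool. It records one hash per file in a work (text plus page scans), derives a single hash for the whole set, and reports which files differ from the archived manifest.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file under root (relative path, '/' separators) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def work_digest(manifest):
    """Hash the sorted manifest lines to get one value for the whole work."""
    h = hashlib.sha256()
    for name in sorted(manifest):
        h.update(f"{manifest[name]}  {name}\n".encode("utf-8"))
    return h.hexdigest()


def changed_files(root, manifest):
    """List files that were added, removed, or altered since the manifest was made."""
    current = build_manifest(root)
    names = set(manifest) | set(current)
    return sorted(n for n in names if manifest.get(n) != current.get(n))
```

In use, one would call build_manifest on the finished work at archive time, publish the manifest (or just the work_digest value) somewhere independent of the repository, and later run changed_files against any mirror to see whether it still matches. A hash only detects tampering; restoring the text still depends on having a clean mirror, and proving who published the manifest would need a digital signature on top of it.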
If you don't want to cater to scholars, you're throwing away much of DP's work.
It's not our problem. No amount of catering will do away with academia's perceived "limitations" of electronic media.
I don't have such a pessimistic view of academia. Yes, academics are strange birds. But as the old generation dies and a new generation arises that is familiar with accessing digital information, they will embrace digital media with fervor. PG can certainly make its texts "academia friendly", or at least reasonably so. The incremental effort (delta-t) needed for the few extra things that make PG texts more academia-friendly is pretty small compared to the overall time it takes to scan/type/OCR/proof a text. And many of these added things have other small benefits outside of academia itself, benefits for other user groups of PG texts.
The best value for academia (and the least work for us) would be just to include the page scans. Any transcription you make will fall short of the requirements of some scholar. I think we should use our time for producing more books for a general audience rather than producing academia-certified editions of them.
It behooves PG to at least reasonably reach out to the requirements of "academia" (which is not as monolithic as implied) in markup and metadata, and to include the original page scans for every work. That's all that can be done and should be done.
Making the page scans available has purposes beyond just keeping academics happy. For example, someone may wish to issue a retypeset print edition of some work using the XML-based PG texts. Having the original page scans there to verify document structure and layout oddities will be useful to those doing final proofing of the output typography. And as noted above, having the original page scans available to future generations is a further protection of the textual integrity of the digital text. It also has the side-benefit of being a digital preservation of the original source, and this alone is a very powerful argument to keep the page scans as an honored and integral part of the PG collection -- it will greatly add value and purpose to the PG collection. Disk space and bandwidth are no longer an issue (well, they are no longer the major, show-stopper issue they were a decade ago.)
It mystifies me why the original page scans are treated by some here as some sort of waste product, meant to be flushed down the toilet when done, or as something we don't need to preserve or have access to. (I'm still surprised to hear that the scans for some of the DP texts are not available to the public because of licensing issues.)
Jon Noring