Re: [gutvol-d] PG audience

12 Nov 2004

Marcello wrote:
...
Karen Lofstrom wrote:
...
If you cite a dead tree edition of something you are quite confident 
that the cited text stays put. It won't change its wording or glide from 
the cited page into the next etc.
If you cite an electronic resource you have no such confidence. How do 
you make sure that the text at the url you cite will not be edited or 
removed? You cannot. How do you make sure the medium you cite will still 
be readable in some years? In a hundred years reading a CD-ROM may be 
harder than it was to read the Rosetta Stone.
Actually, this issue can be dealt with using hash functions. Once a
digital document is finished and archived, simply calculate a hash
value for it (or for the set of files the work comprises). Use a
published, open-standard hashing algorithm -- there are many out there
to choose from.
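
As a minimal sketch of the idea, assuming SHA-256 as the published
algorithm and a work stored as a directory of files (the function names
and the manifest format are hypothetical, chosen only for illustration,
not anything PG has specified):

```python
import hashlib
from pathlib import Path
from typing import Dict


def file_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Hash one archived file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def work_manifest(root: Path) -> Dict[str, str]:
    """Map each file of a work (relative path) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def work_sha256(manifest: Dict[str, str]) -> str:
    """Fold the per-file digests into one value for the whole work.

    Assumption: files are taken in sorted name order so the result does
    not depend on directory listing order.
    """
    digest = hashlib.sha256()
    for name in sorted(manifest):
        digest.update(f"{manifest[name]}  {name}\n".encode("utf-8"))
    return digest.hexdigest()
```

Publishing the value of work_sha256 alongside a citation would let a
later reader check that the files they hold are the ones cited: any
change to a file's contents or name changes the result.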

It's also possible to use digital signatures in some manner, but I'll
let the experts in this area discuss this possibility.

Textual integrity is definitely an issue, and it goes beyond just
keeping academics happy -- it is germane to the perceived integrity
of the entire collection of texts by society-at-large. By keeping the
page scans along with the digital texts, we are, in effect, telling
the users of the digital texts that we fully stand by the textual
integrity of the collection, that we did not pull any fast ones, and
that it can be trusted. We are putting our reputation on the line.

By using digital hashes and digital signatures, along with redundant,
mirrored text repositories, we go a long way towards assuring that the
collection maintains its integrity. As others have noted, some
dictator or totalitarian regime in the future may break into one of
the repositories and start tweaking texts. So long as the whole
world does not revert to totalitarianism (in which case we would have
much bigger problems than the integrity of texts), a properly designed
set of repositories will always make it possible to restore the
original digital texts from a clean, untouched copy. Hopefully
individuals will also keep digital texts lying around, but
individuals can tweak texts too, so hashing and digital signatures
are still needed there.
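
To illustrate how mirrored repositories and hashes could work together,
here is a rough sketch that compares per-file digests reported by
several mirrors and flags the copies that disagree with the majority.
The majority-vote rule, the mirror names and the function name are all
assumptions made for this example; a digest published or signed through
a separate trusted channel would be a stronger reference than a vote.

```python
from collections import Counter
from typing import Dict, List

Manifest = Dict[str, str]  # relative file name -> hex digest


def suspect_copies(mirrors: Dict[str, Manifest]) -> Dict[str, List[str]]:
    """Return {file name: [mirrors whose copy disagrees with the majority]}."""
    suspects: Dict[str, List[str]] = {}
    all_files = set()
    for manifest in mirrors.values():
        all_files.update(manifest)
    for name in sorted(all_files):
        # A mirror that lacks the file is recorded with digest None.
        seen = {mirror: manifest.get(name) for mirror, manifest in mirrors.items()}
        digest, votes = Counter(seen.values()).most_common(1)[0]
        if votes * 2 <= len(mirrors):
            # No strict majority: no copy can be trusted automatically.
            suspects[name] = sorted(seen)
        else:
            odd = sorted(m for m, d in seen.items() if d != digest)
            if odd:
                suspects[name] = odd
    return suspects


if __name__ == "__main__":
    # Hypothetical mirrors with placeholder digests.
    report = suspect_copies({
        "mirror-a": {"text.txt": "aaa"},
        "mirror-b": {"text.txt": "aaa"},
        "mirror-c": {"text.txt": "bbb"},
    })
    print(report)
```

A flagged copy could then be restored from any mirror that agrees with
the majority.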
...
...
If you don't want to cater to scholars, you're throwing away much of DP's
work.
...
It's not our problem. Any amount of catering will not do away with 
Academia's perceived "limitations" of electronic media.
I don't have such a pessimistic view of academia. Yes, academics are
strange birds. But as the old generation dies, and a new generation
arises, familiar with accessing digital information, they will embrace
digital media with a fervor.

PG can certainly make its texts "academia friendly", or at least
reasonably so. The incremental effort (delta-t) to do the few more
things to make PG texts more academia-friendly is pretty small
compared to the overall time it takes to scan/type/OCR/proof a text.
And many of these added things have other small benefits outside of
academia itself, benefits for other user groups of PG texts.
...
The best value for Academia (and the least work for us) would be just to 
include the page scans. Any transcription you make will fall short of 
the requirements of some scholar. I think we should use our time for 
producing more books for a general audience instead of producing 
Academia-certified editions of them.
It behooves PG to at least reasonably reach out to the requirements
of "academia" (which is not as monolithic as implied) in markup and
metadata, and include the original page scans for every work. That's
all that can be done and should be done.

Making the page scans available has purposes beyond just keeping
academics happy. For example, someone may wish to issue a retypeset
print edition of some work using the XML-based PG texts. Having the
original page scans there to verify document structure and layout
oddities will be useful to those doing final proofing of the output
typography. And as noted above, having the original page scans
available to future generations is a further protection of the textual
integrity of the digital text. It also has the side-benefit of being a
digital preservation of the original source, and this alone is a very
powerful argument to keep the page scans as an honored and integral
part of the PG collection -- it will greatly add value and purpose to
the PG collection. Disk space and bandwidth are no longer an issue
(well, no longer the major, show-stopping issue they were a decade
ago).

It mystifies me why the original page scans are treated by some here
as some sort of waste product, meant to be flushed down the toilet
when done, or that we don't need to preserve them, or need to have
access to them. (I'm still surprised to hear that the scans for some of
the DP texts are not available to the public because of licensing
issues.)

Jon Noring