
Collin wrote:
Jon:
And yet you seem completely unable to do this. I wonder why? In the time you took to write this lengthy e-mail, you could have set up a page scan archive at TIA. If this is so important to you, why haven't you done this already?
This is an interesting question. My answer is another question: Shouldn't the decision to archive the scans and make them publicly available *alongside* the digital text versions be a collective decision among the PG/DP folk? After all, making the scans available alongside the texts requires those maintaining the PG index to provide links to the scans. It also requires metadata and other types of coordination with DP and with those independent of DP who use the scan/OCR process for transcribing books.

Someone stepping forth to provide a home for the scans will not change how PG/DP do their thing so long as there is no collective decision that it is a good thing to do, and no willingness to at least help make the preservation process go more smoothly. I'm trying to provide some rationale on the pro side for preserving the scans and making them publicly available alongside the text versions -- whether the rationale will be accepted in a collective sense is another matter.

I'll be very happy to take the time and effort to solicit/collect/make available the scans when there is a majority consensus by both DP and PG folk that this is important to them, and that PG will provide links to the original page scans (wherever the scans will reside). (Well, I'll even offer to go ahead with this if two or three others who are higher-level volunteers in the DP system, and who are familiar with it, believe it is a good idea and step forward, offering to actively help in the effort to collect/solicit/catalog/archive the scans from the DP work flow, as well as those done outside of DP. If so, then I'll contact IA, probably Molly first, and see if we can set up something official. It could be called the "PG/DP Scan Archive" or some similar name.)

To restate what is being discussed, we each will take one of two basic positions:

1) The scans should be made available to the public alongside the structured digital texts.
2) The scans should not be made available to the public alongside the structured digital texts.

(Archiving the scans is a related issue, but not the same one, since some may take the position that making the scans publicly available should not be done -- e.g., it is considered a waste of effort and disk space by those maintaining the PG catalog -- but that the scans should be preserved somewhere for internal future access by PG/DP volunteers.)

If the majority of the volunteers who have lead roles in PG/DP embrace #2, then it is a lot more difficult for any single person to be proactive and do it on their own, since effective preservation requires the systems (work flows) to be more friendly to preservation and availability (e.g., procedural requirements to aid in collecting the scans with sufficient metadata for identification/correlation to the structured digital text work product). The work flows are not at present.

(Btw, just to note: in a conversation with Juliet a few months ago, she noted that some scans cannot be released to the public because of agreements with the scan providers or those who hold the original paper documents. Although this is unfortunate, at least the scans are available to make SDTs, which is better than nothing -- it is always possible to secure scans from another copy of the Edition at a future time provided the full catalog information is preserved, which it is at DP. Now, let's add to this fact the need for the scan preservation activity to acquire metadata conformant with DP's internal metadata tracking (otherwise it is more difficult to correlate a scan set with its associated SDT, both at DP and at PG). These two facts alone require that the scan preservation effort have fairly high-level cooperation and blessing from the DP people -- it cannot be done without their consent and without some minimal help.
Not to mention that downloading the scans from DP will increase the stress on DP's servers, so for that reason too DP has to "bless" the effort and provide procedural requirements. So no matter how one looks at it, the DP leadership has to bless the activity and provide some help to make it work. And they will only make the effort to "bless" such an activity if they believe it to be worth doing. Thus I am discussing the "why it should be done" first, which in my book is the proper order in decision-making.)
Jon, at the moment you come across like the nth of the Vapourware Kings that are regularly trolling this board. "Why don't you do X? Any idiot could do X in two working days!" Now I know you are not a Vapourware King, so what's with the act? The most likely reason why we have no page scan archive is because no-one has taken the time to set it up.
Well, maybe I do come across this way. But then are you saying no one should share thoughts and ideas for reasoned discussion *before* any sort of collective decision? That no one should bring up discussion regarding the basic goals and approach of PG and DP?

So what say you (to everyone reading this)?

1) Should the original page scans be made publicly available alongside the structured digital texts?

2) If not, should they at least be preserved with limited or separate access (such as donating them to IA for IA to do with as they wish)?

3) Or should the scans be erased when the SDT is proofed and out the door?
DP's page scans are accessible to anyone with an account. (Probably even to those without an account.) The only hard bit is knowing which PG posted text goes with which DP text ID, so that you can recombine them when necessary. I believe we even save bibliographical data with our texts, so that you could extract all kinds of metadata to go with the pagescans.
I have accessed the page scans at DP (in order to submit some page scans before the recent DP revamp). Now, if DP had a policy/system where page scans were more carefully indexed as you mention, then it would certainly be easier for someone to collect them.

However, I think the issue is more with PG, which actually makes the texts available online. (DP focuses on producing the texts, though the scans being an important part of the DP work flow involves DP, too.) Will PG provide links in its catalog to the original page scans alongside the SDT versions?

To better understand some things myself, I have to ask a fundamental workflow question of the PG side of the house. If a finished text is donated to PG and original page scans are submitted alongside the text, what will happen to the scans? Will they be made publicly available alongside the SDT, or will they be preserved but not linked to or made public, or will they be rejected and essentially erased?

And another question for PG: In a philosophical sense (ignoring the technical/administrative realities for the moment), would PG, in its catalog, provide links to the original page scans used as the source for the cataloged digital texts? Or is there a philosophical reason why PG would not do this? I've yet to hear an answer as to whether PG will philosophically consider providing links to the original page scans used to produce the texts in its catalog.

Jon Noring