
Michele "Her Serene Highness" wrote:
[snip of excellent comments]
**Why not? It's done all the time. Students and scholars have cited rare books that are impossible to find before- I remember citing a rare book that contained the concordat between the Vatican and Germany for a grad class years ago, and information on the Black Star line of Marcus Garvey while still in high school. Why did my professors accept my citations? Because they could be tracked down. It wasn't impossible to find the originals- just difficult. The former one was located in Bobst Library at NYU and the latter was in the NY Public Library's Schomburg Collection. I can find both of them more easily now, because both libraries have their catalogues online. That means I can find the cites and then go look at the actual books. Since there is no physical book with PG that an outsider can hold, it would be nice to have a master scan of the text. PG isn't meant to be a master text- it's a repository for copies. But copies come from somewhere.
The above comment suggests two basic requirements PG should embrace for all texts: 1) The original source (or sources, for composite works) is fully identified and described in the metadata using accepted library cataloging standards, and these metadata fields are searchable. 2) The original page scans also exist in the database, linked to and from the digital text version (easy to do in XML -- TEI has markup for this purpose).
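To make the second requirement a bit more concrete, here is a minimal Python sketch of how page scans might be linked from a text. It assumes TEI-style markup in which each page-break element (pb) carries a facs attribute naming the scan of the page that begins there. The sample document, its file names and the function name page_scan_index are all made up for illustration; this is not PG's format or anyone's actual tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like fragment. The source description and the scan
# paths are placeholders, not a real PG text.
SAMPLE = """<TEI>
  <teiHeader>
    <sourceDesc>
      <bibl>Example Title, Example Publisher (placeholder source)</bibl>
    </sourceDesc>
  </teiHeader>
  <text>
    <pb n="1" facs="scans/p001.png"/>
    <p>First page of the transcription.</p>
    <pb n="2" facs="scans/p002.png"/>
    <p>Second page of the transcription.</p>
  </text>
</TEI>"""


def page_scan_index(xml_source):
    """Map each page number to the scan file named on its page break."""
    root = ET.fromstring(xml_source)
    return {pb.get("n"): pb.get("facs") for pb in root.iter("pb")}


def source_description(xml_source):
    """Return the text of the first source description, or None."""
    root = ET.fromstring(xml_source)
    bibl = root.find("./teiHeader/sourceDesc/bibl")
    return bibl.text if bibl is not None else None


index = page_scan_index(SAMPLE)
print(index)
print(source_description(SAMPLE))
```

With something like this, a reader (or a citation checker) could go from any page of the transcription to the scan it was made from, and from the text to a description of the physical source.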
I happen to love PG- but it will be in ideal form when it has hyperlinks to other books and to the notes I type up, when I can print it out and have it paginated, when I can tell if I'm reading a facsimile of a first edition. I know that's a lot, but a girl can dream. And some sites are doing that kind of thing with individual books already- but their scope isn't as large as PG's. PG's scope is what makes it valuable, but I wouldn't use it for scholarly work.
Annotating, referencing and interlinking texts within a digital text repository are very powerful features. The fundamental architecture of the "PG Library System" should allow for them as a future possibility. To me, this is even more exciting than some of the other things being considered, such as language translation. The requirements associated with these features strongly point to formatting all PG master texts in XML.
W3C's XPointer can be used to address both spots and ranges within an XML document using several schemes (both W3C-defined schemes and custom schemes within the XPointer Framework). The most common and most robust/persistent scheme is the well-known fragment identifier that names an element's 'id'. But there's also a scheme to point to a particular element (tag) in a document which does not have an 'id', as well as one to point to a spot within content (this last scheme is still in Draft form -- it is not a W3C Recommendation). So long as the XML document remains unchanged (or, for the fragment identifier scheme, so long as the 'id's are kept unchanged even if other changes are made to the document), the XPointer addresses will still work. (The term used here is "persistence".)
One problem area, which gets into Identifiers, is how to address the XML document itself -- can it be addressed "standalone", or must it be addressed only when it resides within a repository (such as the PG Library)? If the XML document can be addressed standalone, apart from the repository, then obviously it must internally contain an identifier, the same one used to identify it within the repository and which forms part of the URI reference. It was an interesting exercise last year when the Open eBook Forum's Publication Structure Working Group spent three months studying how to reference and interlink OEBPS Publications, and how to address particular spots and ranges within particular XML documents within a Publication (OEBPS allows multiple documents to comprise one Publication).
Of course, what complicated things (and this may be less of an issue for PG) is that we wanted the linkability to persist even when the OEBPS Publication is converted to something else, provided the converted format can contain the relevant internal pointers. In this study, Identifiers became a Significant Issue (tm). PG will need to come up with a viable identifier system and a specialized URI syntax for using XLink. For many of you, the above is probably all Greek. But if one wants to enable annotation, referencing, and text interlinking within the PG Library system, then this will impose constraints and requirements that need to be considered. One workable solution is to have all the texts in XML and to use these cool technologies called XPointer and XLink to enable these features. Fortunately, it appears the "powers who are" have decided upon moving someday to XML for the PG Master Texts.
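For those to whom the XPointer discussion above is all Greek, here is a minimal Python sketch of what resolving two kinds of pointer might look like. It handles only a bare 'id' name and an element()-style child sequence such as element(ch1/2) or element(/1/2/1), and it assumes that any attribute literally named 'id' acts as the identifier (in real XML that is decided by the DTD or schema). The sample document and the function names find_by_id and resolve are hypothetical; this is an illustration of the idea, not a conforming XPointer processor.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document: one chapter has an 'id', the other does not.
DOC = """<book id="pg-example">
  <chapter id="ch1">
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </chapter>
  <chapter>
    <p>A paragraph in a chapter that has no id.</p>
  </chapter>
</book>"""

ELEMENT_SCHEME = re.compile(r"element\(([^/()]*)((?:/\d+)*)\)")


def find_by_id(root, name):
    """Return the first element whose 'id' attribute equals name."""
    for el in root.iter():
        if el.get("id") == name:
            return el
    return None


def resolve(root, pointer):
    """Resolve a bare id name or an element() child sequence."""
    match = ELEMENT_SCHEME.fullmatch(pointer)
    if match is None:
        # Not element(...): treat the whole pointer as a bare id name.
        return find_by_id(root, pointer)
    name = match.group(1)
    steps = [int(s) for s in match.group(2).split("/") if s]
    if name:
        node = find_by_id(root, name)
    else:
        # With no leading name, the sequence starts above the document
        # element, so the first step must be /1 (the document element).
        if not steps or steps[0] != 1:
            return None
        node, steps = root, steps[1:]
    for step in steps:
        if node is None:
            return None
        children = list(node)  # child elements, in document order
        if step < 1 or step > len(children):
            return None
        node = children[step - 1]
    return node


root = ET.fromstring(DOC)
print(resolve(root, "ch1").tag)
print(resolve(root, "element(ch1/2)").text)
print(resolve(root, "element(/1/2/1)").text)
```

Note how the two styles differ in persistence: the pointer "ch1" keeps working if chapters are reordered, so long as the 'id' is kept, while element(/1/2/1) silently points somewhere else as soon as a chapter is inserted ahead of it.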
One person made the comment that PG shouldn't try to anticipate what scholars want- it should let scholars discover it and let them say what they need. I just did, and most of what I'm hearing is that I have to learn to adapt to PG, when there are perfectly good college libraries out there. There is no reason for scholars to embrace a site that doesn't even meet up with basic MLA guidelines for books. After all, that's the business you are in- not original websites, but books.
Michele's point is that before PG makes any substantive decisions, it needs to decide which user groups it would like its texts to target (the more the better, in my opinion), and then ask the experts in those groups to submit requirements. This should be done *before*, not after, matters have been decided and the next-gen (or next-version) PG system is ready to be built. As I've said before, I believe it is possible to come up with a set of basic requirements for all PG texts which will reasonably meet the needs of most, if not all, of the groups we identify (maybe by the "80-20" rule, at the minimum). If the system is designed to be extensible for particular special needs, it will be able to fill in where the basic requirements don't.
A summary rehash: If one considers that PG texts are not to be solely standalone (which is the traditional view), but rather are components of a dynamic and powerful repository (where the whole is greater than the sum of the parts), then this creates specific requirements which simultaneously impact the areas of format, metadata/identifiers, database structure, and user interface design, to name a few. A holistic approach is definitely necessary to ensure that whatever is decided for one area will not cause problems in another area. Thinking holistically, factoring in the long-term vision of what we want the PG Library to do and to be fifty years from now (and I don't believe this is being discussed enough), is important.
Jon Noring