Re: [gutvol-d] Perfection

13 Nov 2004

      Michele "Her Serene Highness" wrote:
...
[snip of excellent comments]
> Why not? It's done all the time. Students and scholars have cited rare
> books that were impossible to find before. I remember citing a rare book
> that contained the concordat between the Vatican and Germany for a grad
> class years ago, and information on the Black Star Line of Marcus Garvey
> while still in high school. Why did my professors accept my citations?
> Because they could be tracked down. It wasn't impossible to find the
> originals, just difficult. The former was located in Bobst Library at NYU
> and the latter in the NY Public Library's Schomburg Collection. I can find
> both of them more easily now, because both libraries have their catalogues
> online. That means I can find the cites and then go look at the actual
> books. Since there is no physical book with PG that an outsider can hold,
> it would be nice to have a master scan of the text. PG isn't meant to be a
> master text; it's a repository for copies. But copies come from somewhere.
The above comment suggests two basic requirements PG should embrace
for all texts:

1) The original source (or sources, for composite works) is fully
   identified and described in the metadata using accepted library
   cataloging standards, and these fields are searchable.

2) The original page scans also exist in the database, linked to and
   from the digital text version (easy to do in XML -- TEI has markup
   for this purpose.)
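To make the two requirements concrete, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. The TEI fragment is hypothetical and heavily trimmed: it assumes TEI P5-style markup, with the printed source described in a `<bibl>` inside `<sourceDesc>`, and each page break `<pb/>` carrying a `facs` attribute naming the scan of that page. All titles, names, and file paths are made up for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Hypothetical, heavily trimmed TEI document. <sourceDesc> records the printed
# edition the e-text was made from; each <pb/> points at its page scan via facs.
SAMPLE = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Example Text</title></titleStmt>
      <publicationStmt><p>Example publisher of the e-text</p></publicationStmt>
      <sourceDesc>
        <bibl>
          <author>Example Author</author>
          <title>Example Text</title>
          <publisher>Example Press</publisher>
          <date when="1899">1899</date>
        </bibl>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <pb n="1" facs="scans/p001.png"/>
      <p>Text of the first page.</p>
      <pb n="2" facs="scans/p002.png"/>
      <p>Text of the second page.</p>
    </body>
  </text>
</TEI>"""


def source_metadata(root):
    """Return the source-edition fields in <sourceDesc>/<bibl> as a dict."""
    bibl = root.find(".//tei:sourceDesc/tei:bibl", NS)
    fields = {}
    for child in bibl:
        local_name = child.tag.split("}", 1)[1]  # drop the "{namespace}" prefix
        fields[local_name] = (child.text or "").strip()
    return fields


def page_scans(root):
    """Map each page number (pb/@n) to the scan it links to (pb/@facs)."""
    return {pb.get("n"): pb.get("facs") for pb in root.iter(f"{{{TEI_NS}}}pb")}


def matches_source(root, **query):
    """Naive search: True if every queried source field contains the given text."""
    fields = source_metadata(root)
    return all(value.lower() in fields.get(key, "").lower()
               for key, value in query.items())


root = ET.fromstring(SAMPLE)
print(source_metadata(root))
print(page_scans(root))
print(matches_source(root, author="example", date="1899"))
```

In a real catalogue the fields would follow an accepted cataloging standard (MARC or Dublin Core, for example) and be indexed by the repository rather than scanned file by file; the naive search only shows that once the source is recorded as structured fields, it becomes searchable.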
...
> I happen to love PG, but it will be in ideal form when it has hyperlinks to
> other books and to the notes I type up, when I can print it out and have it
> paginated, when I can tell if I'm reading a facsimile of a first edition. I
> know that's a lot, but a girl can dream. And some sites are doing that kind
> of thing with individual books already, but their scope isn't as large as
> PG's. PG's scope is what makes it valuable, but I wouldn't use it for
> scholarly work.
The abilities to annotate, reference, and interlink texts within a
digital text repository are very powerful features. The fundamental
architecture of the "PG Library System" should allow for them as a
future possibility. To me, this is even more exciting than some of
the other things being considered, such as language translation.

The requirements associated with these features strongly point to
formatting all PG master texts in XML. W3C's XPointer can be used to
address both spots and ranges within an XML document using several
schemes (both W3C-defined schemes and custom schemes within the
XPointer Framework). The most common and most robust/persistent scheme
is the well-known fragment identifier (a "shorthand pointer" in
XPointer terms), which points to an element by its 'id'. There is also
a scheme, element(), to point to a particular element (tag) in a
document that does not have an 'id', and another, xpointer(), to point
to a spot or range within content (the latter is still a W3C Working
Draft, not a W3C Recommendation).
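As a rough illustration of the first two schemes, here is a small Python resolver built on xml.etree.ElementTree. It handles a shorthand pointer (a bare ID such as `ch2`) and the element() scheme's child sequences (`element(/1/2/1)` or `element(ch2/2)`), where each number counts element children only, starting from 1. It assumes that an attribute named `id` (or `xml:id`) is the element's ID; a real processor would take IDs from the DTD or schema. The sample document is hypothetical, and the draft xpointer() scheme is not attempted.

```python
import re
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def find_by_id(root, ident):
    """Assumption: an attribute named 'id' or 'xml:id' carries the element's ID."""
    for el in root.iter():
        if el.get("id") == ident or el.get(XML_ID) == ident:
            return el
    return None


def resolve(root, pointer):
    """Resolve a shorthand pointer ('ch2') or an element() pointer
    ('element(/1/2)' or 'element(ch2/2)') against the document element root."""
    match = re.fullmatch(r"element\(([^()]*)\)", pointer)
    if match is None:
        return find_by_id(root, pointer)          # shorthand pointer: a bare ID
    head, *steps = match.group(1).split("/")
    if head:                                      # 'ch2/2': start at an ID
        node = find_by_id(root, head)
    elif steps and steps[0] == "1":               # '/1/...': /1 is the document element
        node, steps = root, steps[1:]
    else:
        return None
    for step in steps:
        if node is None or not step.isdigit():
            return None
        children = list(node)                     # element children, document order
        index = int(step)
        node = children[index - 1] if 1 <= index <= len(children) else None
    return node


DOC = """<book>
  <front><title>Example</title></front>
  <body>
    <div id="ch1"><p>One.</p></div>
    <div id="ch2"><p>Two.</p><p>Three.</p></div>
  </body>
</book>"""

root = ET.fromstring(DOC)
print(resolve(root, "ch2").get("id"))               # ch2
print(resolve(root, "element(ch2/2)").text)         # Three.
print(resolve(root, "element(/1/2/1)").get("id"))   # ch1
```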

So long as the XML document remains unchanged (or, for the fragment
identifier scheme, so long as the 'id's are kept unchanged even when
other changes are made to the document), the XPointer addresses will
still work. (The term used here is "persistence".)
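A tiny Python experiment shows why the ID-based scheme is the persistent one. The two versions of the document below are hypothetical: the second adds a preface before the chapters. A pointer saved as an ID still finds the same chapter, while a pointer saved as a child position (the element() child-sequence idea, simplified here to a list of 1-based positions below the document element) silently lands on a different one.

```python
import xml.etree.ElementTree as ET


def by_id(root, ident):
    """Find the element whose 'id' attribute equals ident (assumed to be its ID)."""
    return next((el for el in root.iter() if el.get("id") == ident), None)


def by_position(root, positions):
    """Follow 1-based element-child positions downward from the document element."""
    node = root
    for pos in positions:
        children = list(node)
        if not 1 <= pos <= len(children):
            return None
        node = children[pos - 1]
    return node


before = ET.fromstring('<body><div id="ch1"/><div id="ch2"/></body>')
# A later revision inserts a preface ahead of the existing chapters.
after = ET.fromstring('<body><div id="preface"/><div id="ch1"/><div id="ch2"/></body>')

# Two pointers saved against the first version, both meant to reach chapter 2.
id_pointer = "ch2"
position_pointer = [2]

for label, doc in (("before", before), ("after", after)):
    print(label,
          by_id(doc, id_pointer).get("id"),
          by_position(doc, position_pointer).get("id"))
# before ch2 ch2
# after ch2 ch1
```

The position-based pointer is not wrong, only fragile: it is exact for one fixed version of the text, which is why keeping 'id's stable across revisions matters so much.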

One problem area, which gets into Identifiers, is how to address the
XML document itself -- can it be addressed "standalone", or must it be
addressed only when it resides within a repository (such as the PG
Library)? If the XML document can be addressed standalone, apart from
the repository, then obviously it must internally contain an
identifier, the same one used to identify it within the repository
and which forms part of the URI reference.
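One way to picture this requirement, as a minimal Python sketch: the document carries its own identifier (here in a hypothetical `<idno>` element), and the same string is the last path segment of its URI reference in the repository, with a pointer in the fragment. The base URI `https://library.example.org/text/`, the identifier `doc-0001`, and the element names are placeholders, not an existing PG scheme. Because the identifier travels inside the file, a reference can still be checked against a copy that has been downloaded and no longer sits in the repository.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag, urlsplit

# Placeholder repository base URI; not a real PG address scheme.
BASE = "https://library.example.org/text/"


def make_reference(doc_id, pointer=None):
    """Build a URI reference: repository base + document identifier [+ #pointer]."""
    uri = BASE + doc_id
    return f"{uri}#{pointer}" if pointer else uri


def split_reference(ref):
    """Return (document identifier, pointer or None) from a URI reference."""
    uri, fragment = urldefrag(ref)
    doc_id = urlsplit(uri).path.rsplit("/", 1)[-1]
    return doc_id, fragment or None


def refers_to(doc_root, ref):
    """True if a standalone document's internal identifier matches the reference."""
    doc_id, _ = split_reference(ref)
    idno = doc_root.find(".//idno")
    return idno is not None and (idno.text or "").strip() == doc_id


# A standalone copy of the document, e.g. downloaded and stored locally.
doc = ET.fromstring(
    '<text><header><idno>doc-0001</idno></header>'
    '<body><div id="ch2"/></body></text>'
)

ref = make_reference("doc-0001", "ch2")
print(ref)                   # https://library.example.org/text/doc-0001#ch2
print(split_reference(ref))  # ('doc-0001', 'ch2')
print(refers_to(doc, ref))   # True
```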

It was an interesting exercise last year when the Open eBook Forum's
Publication Structure Working Group spent three months studying how
to reference and interlink OEBPS Publications, and how to address
particular spots and ranges within particular XML documents within a
Publication (OEBPS allows multiple documents to comprise one
Publication.) Complicating things (and this may be less of an issue
for PG), we wanted the linkability to persist even when the OEBPS
Publication is converted to something else, provided the converted
format can contain the relevant internal pointers. In this study,
Identifiers became a Significant Issue (tm). PG will need to come up
with a viable identifier system and a specialized URI syntax for use
with XLink.
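Addressing a spot inside a multi-document Publication takes two parts: which document, then which spot in it. The Python sketch below is hypothetical and much simpler than OEBPS: a manifest lists the Publication's documents by `href` (loosely in the style of an OEBPS package manifest), and a reference of the form `document#id` is resolved against it. Both chapters deliberately reuse the ID `p1`, which is why a fragment identifier alone cannot name a spot in a Publication.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Hypothetical Publication: a manifest of its documents plus their contents.
MANIFEST = ET.fromstring("""<manifest>
  <item id="c1" href="chapter1.xml"/>
  <item id="c2" href="chapter2.xml"/>
</manifest>""")

DOCUMENTS = {
    "chapter1.xml": '<html><p id="p1">Opening.</p></html>',
    "chapter2.xml": '<html><p id="p1">Another start.</p><p id="p2">Target.</p></html>',
}


def resolve_in_publication(ref):
    """Resolve 'document#id' to an element, or None if either part is missing."""
    href, fragment = urldefrag(ref)
    listed = {item.get("href") for item in MANIFEST.iter("item")}
    if href not in listed:
        return None                      # not part of this Publication
    doc = ET.fromstring(DOCUMENTS[href])
    if not fragment:
        return doc                       # the document as a whole
    return next((el for el in doc.iter() if el.get("id") == fragment), None)


print(resolve_in_publication("chapter1.xml#p1").text)  # Opening.
print(resolve_in_publication("chapter2.xml#p1").text)  # Another start.
print(resolve_in_publication("chapter2.xml#p2").text)  # Target.
print(resolve_in_publication("chapter3.xml#p1"))       # None
```

If the Publication is later converted to another format, links of this kind survive only if the converted format keeps both the document boundaries (or an equivalent) and the IDs, which is the persistence problem described above.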

For many of you, the above is probably all Greek. But if one wants
to enable annotation, referencing, and text interlinking within the
PG Library system, then this imposes constraints and requirements
that need to be considered. One workable solution is to put all the
texts in XML and use these cool technologies called XPointer and
XLink to enable these features. Fortunately, it appears the "powers
who are" have decided to move to XML someday for the PG Master Texts.
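To tie XLink and XPointer together, here is a minimal Python sketch of a reader's annotation stored outside the text it comments on. The `<note>` element and the target URI are hypothetical; the `xlink:type="simple"` and `xlink:href` attributes in the XLink namespace are what XLink 1.0 uses for a simple link, and the fragment is an element() XPointer naming the second child of the element with ID `ch2`.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)  # serialize attributes as xlink:...


def make_annotation(target, text):
    """Build a hypothetical <note> that points at a spot in a text via an XLink simple link."""
    note = ET.Element("note")
    note.set(f"{{{XLINK_NS}}}type", "simple")
    note.set(f"{{{XLINK_NS}}}href", target)
    note.text = text
    return note


def annotation_target(note):
    """Read back the linked URI reference from a note."""
    return note.get(f"{{{XLINK_NS}}}href")


target = "https://library.example.org/text/doc-0001#element(ch2/2)"
note = make_annotation(target, "Compare this passage with the first edition.")
print(ET.tostring(note, encoding="unicode"))
print(annotation_target(note))
```

Because the note lives in its own file and only points into the text, the text itself stays untouched; that separation is what allows many readers to annotate the same master text, and it works only as long as the pointer is persistent.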
...
> One person made the comment that PG shouldn't try to anticipate what
> scholars want; it should let scholars discover it and let them say what
> they need. I just did, and most of what I'm hearing is that I have to
> learn to adapt to PG, when there are perfectly good college libraries out
> there. There is no reason for scholars to embrace a site that doesn't even
> meet basic MLA guidelines for books. After all, that's the business you
> are in: not original websites, but books.
Michele's point is that before PG makes any substantive decisions, it
needs to decide which user groups it would like its texts to target
(the more the better, in my opinion), and then ask the experts
in those groups to submit requirements. This should be done *before*,
not after, matters have been decided and the next-gen (or next-version)
PG system is ready to be built.

As I've said before, I believe it is possible to come up with a set
of basic requirements for all PG texts that will reasonably meet the
needs of most, if not all, of the groups we identify (at minimum by
the "80-20" rule). If the system is designed to be extensible for
particular special needs, it will be able to fill in where the basic
requirements don't.

A summary rehash: if one considers that PG texts are not to be solely
standalone (which is the traditional view), but rather are components
of a dynamic and powerful repository (where the whole is greater than
the sum of the parts), then this creates specific requirements that
simultaneously affect format, metadata/identifiers, database
structure, and user interface design, to name a few areas. A holistic
approach is definitely necessary to ensure that whatever is decided
for one area will not cause problems in another. Thinking
holistically, and factoring in the long-term vision of what we want
the PG Library to do and to be fifty years from now (which I don't
believe is being discussed enough), is important.

Jon Noring