Re: [gutvol-d] RST/PGTEI/etc

13 Feb 2012

      On Sat, Feb 11, 2012 at 10:18:21AM -0800, don kretz wrote:
...
On Sat, Feb 11, 2012 at 8:01 AM, Carlo Traverso <traverso@posso.dm.unipi.it> wrote:
...
...
...
...
...
> "Marcello" == Marcello Perathoner <marcello@perathoner.de> writes:
All the ww procedures toss away data. To get a clearance, one has to
enter the publisher name, location, and publishing date; these are
present in the clearance record and can be consulted. The full
clearance itself contains the publication date and is part of the
upload note, but the publisher is gone (unless the preparer
transcribes the title page, which is now usual). And the posted note
discards this information.
It's remarkable that someone would try to reconstruct the data
and yet be so indifferent about having the page images available
to refer to, or accessible from the text.
The issue of whether, or how, to include information about
sources used has been contentious as long as there has been
a Project Gutenberg.

The policy for as long as anyone can remember (at least since the
first version of the "small print," in the early 1990s or late 1980s)
is found in every single eBook and elsewhere:

"Project Gutenberg-tm eBooks are often created from several printed
editions, all of which are confirmed as Public Domain in the U.S.
unless a copyright notice is included.  Thus, we do not necessarily
keep eBooks in compliance with any particular paper edition."

It has long been recognized that some eBook producers prefer to have
their work adhere to a particular print edition.  This is perfectly
fine.  

It is certainly acceptable to include any quantity of information
about source(s) used in a given eBook.  Witness the practice of
including scans and transcriptions of the TP&V (title page and verso)
and other in-book metadata as part of an eBook submission.

For producers to include such information in a more structured
format seems fine to me.  I don't recall anyone ever presenting an
eBook in such a format (say, with a snippet of Dublin Core XML
at the end).
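For illustration only, here is a minimal sketch of what such a machine-parsable source snippet could look like, built with Python's standard `xml.etree.ElementTree`. The Dublin Core element namespace is real; the `<source>` wrapper element, the function names `source_record` and `read_record`, the choice of elements, and the sample values are all hypothetical assumptions, not any existing PG convention.

```python
import xml.etree.ElementTree as ET

# Dublin Core Metadata Element Set 1.1 namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def source_record(title, creator, publisher, place, date):
    """Build a hypothetical Dublin Core description of ONE print source.

    The <source> wrapper and the element selection are illustrative
    assumptions, not a Project Gutenberg format.
    """
    root = ET.Element("source")
    fields = [
        ("title", title),
        ("creator", creator),
        # Dublin Core has no 'place' element; this sketch simply folds the
        # place of publication into dc:publisher as "Place: Publisher".
        ("publisher", f"{place}: {publisher}"),
        ("date", date),
    ]
    for tag, value in fields:
        ET.SubElement(root, f"{{{DC_NS}}}{tag}").text = value
    return ET.tostring(root, encoding="unicode")


def read_record(xml_text):
    """Parse the snippet back into a plain dict keyed by DC element name."""
    root = ET.fromstring(xml_text)
    # Tags come back as "{namespace}name"; keep only the local name.
    return {el.tag.split("}", 1)[1]: el.text for el in root}


if __name__ == "__main__":
    # Placeholder values, not a real clearance record.
    snippet = source_record("Example Title", "A. N. Author",
                            "Example Press", "London", "1906")
    print(snippet)
    print(read_record(snippet))
```

The round trip through `read_record` is the point: the same facts the clearance already collects (publisher, place, date) become something a program can extract without guessing at free-form front-matter text.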

Keep in mind that it is very much our policy and intent to NOT
maintain any particular adherence of an eBook to a print item.  For
example, if the print edition had an error that was fixed in later
editions, we certainly would apply that correction if it were
submitted to the errata process.  (OK, I can think of one or two
exceptions, such as our Shakespeare First Folios.)

All that said: the idea that PG could catalog our items and derive
their *primary* metadata from one or more print editions used
as sources is just not consistent with the policy and practice cited
above.  Our #140 was *not* published in 1906; it was published in 1994.
(Hmmm...interesting example, since the catalog doesn't have this
right, either.)

We'd get beaten up about it.  Librarians would complain.  Publishers
would have a basis to complain about us misusing their trademarks.
And it would be false.  The PG editions are *not* their print sources.

The idea of structured metadata about sources makes sense, but only
if it's clearly a search for the source material(s) used, not for the
PG titles.  And carrying such information from the copyright clearance
through the eBook submission is something that current producers could
do today, and often do (though not in a way that is structured to be
easily machine-parsable).

In short, I see some technical problems and solutions to making
source metadata easier to (a) keep with an eBook, and (b) search for.
I don't see any policy in the way.

The issue of whether a PG eBook must adhere to a particular print
edition, or is the same as a print edition, was settled decades ago.
Those with scholarly interests or other special purposes that require
study of a particular print edition are invited, and have always been
invited, to find other resources to supplement or replace those of
Project Gutenberg.

  -- Greg

Greg Newby