
On Sat, Feb 11, 2012 at 10:18:21AM -0800, don kretz wrote:
On Sat, Feb 11, 2012 at 8:01 AM, Carlo Traverso <traverso@posso.dm.unipi.it> wrote:
> "Marcello" == Marcello Perathoner <marcello@perathoner.de> writes:
All the ww procedures toss away data. To get a clearance one has to enter the publisher name and location and the publication date; these are present in the clearance record, and can be consulted. The full clearance itself contains the publication date, and is part of the upload note, but the publisher is gone (unless the preparer transcribes the front page, which is now usual). And the posted note discards this information.
And it's remarkable someone would try to reconstruct the data and yet be so indifferent about having the page images to refer to, or accessible from the text.
The issue of whether, or how, to include information about sources used has been contentious as long as there has been a Project Gutenberg. The policy, in place forever (at least since the first version of the "small print", in the early 1990s or late 1980s), is found in every single eBook and elsewhere: "Project Gutenberg-tm eBooks are often created from several printed editions, all of which are confirmed as Public Domain in the U.S. unless a copyright notice is included. Thus, we do not necessarily keep eBooks in compliance with any particular paper edition."

It has long been recognized that some eBook producers prefer to have their work adhere to a particular print edition. This is perfectly fine. It is certainly acceptable to include any quantity of information about the source(s) used in a given eBook. Witness the practice of including scans and transcriptions of the TP&V and other in-book metadata as part of an eBook submission. For producers to include such information in a more structured format seems fine to me. I don't recall anyone ever presenting an eBook in such a format (say, with a snippet of Dublin Core XML at the end).

Keep in mind that it is very much our policy and intent NOT to maintain any particular adherence of an eBook to a print item. For example, if the print edition had an error that was fixed in later editions, we certainly would apply that correction if it were submitted to the errata process. (OK, I can think of one or two exceptions, such as our Shakespeare first folios.)

All that said: the idea that PG could catalog our items and derive their *primary* metadata from one or more print editions used as sources is just not consistent with the policy and practice cited above. Our #140 was *not* published in 1906, it was published in 1994. (Hmmm...interesting example, since the catalog doesn't have this right, either.) We'd get beat up about it. Librarians would complain. Publishers would have a basis to complain about us mis-using their trademarks. And it would be false. The PG editions are *not* their print sources.

The idea of structured metadata about sources makes sense, but only if it's clearly a search for the source material(s) used, not for the PG titles. And carrying such information from the copyright clearance through the eBook submission is something that current producers could do today, and often do (though not in a way that is structured to be easily machine-parsable).

In short, I see some technical problems and solutions to making source metadata easier to (a) keep with an eBook, and (b) search for. I don't see any policy in the way. The issue of whether a PG eBook must adhere to a particular print edition, or is the same as a print edition, was settled decades ago. Those with scholarly interests or other special purposes that require study of a particular print edition are invited, and have always been invited, to find other resources to supplement or replace those of Project Gutenberg.

-- Greg
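
For illustration only, here is a minimal sketch of what machine-parsable source metadata of the kind discussed above might look like, assuming Python and its standard xml.etree.ElementTree module. The wrapper element "source", both function names, and the placeholder values are hypothetical; Project Gutenberg defines no such format. Only the element names (title, publisher, date) and the namespace URI are taken from the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

# Dublin Core elements namespace (from the DCMI standard).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def source_metadata_xml(title, publisher, place, date):
    """Build a small Dublin Core fragment describing one print source.

    The <source> wrapper is a hypothetical container chosen for this
    sketch; it makes clear the record describes a source edition,
    not the PG eBook itself.
    """
    root = ET.Element("source")
    fields = (
        ("title", title),
        ("publisher", f"{publisher}, {place}"),
        ("date", date),
    )
    for tag, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{tag}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


def parse_source_metadata(xml_text):
    """Read the fragment back into a plain dict keyed by element name."""
    root = ET.fromstring(xml_text)
    # Tags come back as "{namespace}name"; keep only the local name.
    return {child.tag.split("}")[1]: child.text for child in root}


if __name__ == "__main__":
    # Placeholder values, not data about any real PG eBook.
    snippet = source_metadata_xml(
        "Example Title", "Example Publisher", "Example City", "1906"
    )
    print(snippet)
    print(parse_source_metadata(snippet))
```

A producer could append such a fragment to a submission, and a search tool could parse it separately from the catalog record, so that a query over source editions is never confused with a query over the PG titles.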