DPLA has launched their site. It has a lot going on. Perhaps
their API could be a mechanism we could use at Project Gutenberg
as another means of accessing PG content.
The API:
http://dp.la/info/developers/codex/
-- Greg
Dr. Gregory B. Newby
Chief Executive and Director
Project Gutenberg Literary Archive Foundation www.gutenberg.org
A 501(c)(3) not-for-profit organization with EIN 64-6221541
gbnewby(a)pglaf.org
Hello,
With regard to the RDF/XML format of the Project Gutenberg catalog -
I noticed recently the catalog was updated to the new DCMI RDF
recommendations. Project Gutenberg still releases the legacy RDF as
well, for now.
One problem I have been having is the legacy RDF contains information
which the new RDF format does not.
The new and legacy book catalogs both contain a lot of information.
Two of the key things for me (and others) are obviously the title and
author of the book. With the old legacy catalog you could always
obtain the title and author of a book. This is not the case with all
the books in the new catalog, especially for authors.
One example is e-book #32838, "The Canadian Curler's Manual".
The corresponding RDF file, pg32838.rdf (which is at
http://www.gutenberg.org/cache/epub/32838/pg32838.rdf and is also in
the compressed archive available on the website), contains no
information about authorship, editorship, contributions and so
forth. With the new RDF format, I can usually find authorship
information within the pgterms:agent element of the RDF property.
E-book #32838 has no such element, however.
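A rough sketch of the check described above, in Python with the
standard-library xml.etree.ElementTree. The namespace URIs, the helper
name agent_names, and both XML fragments are assumptions made for
illustration: the fragments are trimmed stand-ins, not copies of the real
files, and "Example, Author" is a placeholder name.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for the new-format catalog; check them against a real file.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}

def agent_names(rdf_xml):
    """Return the pgterms:name text of every pgterms:agent element found."""
    root = ET.fromstring(rdf_xml)
    names = []
    for agent in root.iter("{%s}agent" % NS["pgterms"]):
        for name in agent.findall("pgterms:name", NS):
            if name.text and name.text.strip():
                names.append(name.text.strip())
    return names

# Hypothetical record that does carry an agent (placeholder data).
WITH_AGENT = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook>
    <dcterms:title>Example Title</dcterms:title>
    <dcterms:creator>
      <pgterms:agent>
        <pgterms:name>Example, Author</pgterms:name>
      </pgterms:agent>
    </dcterms:creator>
  </pgterms:ebook>
</rdf:RDF>"""

# Hypothetical stand-in for a record like #32838: a title but no agent.
WITHOUT_AGENT = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook>
    <dcterms:title>The Canadian Curler's Manual</dcterms:title>
  </pgterms:ebook>
</rdf:RDF>"""

if __name__ == "__main__":
    print(agent_names(WITH_AGENT))
    print(agent_names(WITHOUT_AGENT))
```

An empty list from agent_names is the situation I am describing: the
record parses fine, but there is nothing to put in the author field.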
Looking at the legacy RDF format for ebook #32838, I can find the
author. The author's name is in the dc:creator element. It notes the
author of the book is "Bicket, James". This information in the legacy
RDF format is absent from the new format. Neither "James" nor
"Bicket" appears in pg32838.rdf. With only the new RDF as a resource, I
have no idea who the author is.
Another example is e-book #10980, "Lady John Russell: A Memoir with
Selections from Her Diaries and Correspondence". The corresponding
RDF file, pg10980.rdf, contains no information about authorship,
editorship, contributions and so forth. Looking at the legacy RDF
for e-book #10980, I find information identifying whom I can credit
as an author. Most authors in the legacy format have their
information in the dc:creator element. In the case of this book, the
authorship information is in the dc:contributor tag. It notes
"Russell, Agatha, lady, 1853- [Editor]" as an editor, and also lists
another editor, as well as two more contributors. This information in
the legacy RDF format is absent from the new format. With only the
new RDF as a resource, my author field appears blank, as the new RDF
contains none of the four names listed in the legacy RDF.
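To illustrate where I look in the legacy records, here is a rough Python
sketch that collects names from both dc:creator and dc:contributor. The
Dublin Core namespace URI, the helper name legacy_names, and the
rdf:Description wrapper in the fragments are assumptions; the fragments
are trimmed stand-ins built from the values quoted above, not the actual
legacy catalog entries.

```python
import xml.etree.ElementTree as ET

# Dublin Core elements namespace, assumed to be what the legacy catalog uses.
DC = "http://purl.org/dc/elements/1.1/"

def legacy_names(rdf_xml):
    """Collect non-empty text found under dc:creator and dc:contributor."""
    root = ET.fromstring(rdf_xml)
    found = {"creator": [], "contributor": []}
    for role in found:
        for element in root.iter("{%s}%s" % (DC, role)):
            # itertext() also picks up names nested in child elements,
            # in case several names are grouped under one property.
            for text in element.itertext():
                if text.strip():
                    found[role].append(text.strip())
    return found

# Hypothetical stand-ins for the legacy entries of #32838 and #10980.
LEGACY_32838 = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description>
    <dc:title>The Canadian Curler's Manual</dc:title>
    <dc:creator>Bicket, James</dc:creator>
  </rdf:Description>
</rdf:RDF>"""

LEGACY_10980 = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description>
    <dc:contributor>Russell, Agatha, lady, 1853- [Editor]</dc:contributor>
  </rdf:Description>
</rdf:RDF>"""

if __name__ == "__main__":
    print(legacy_names(LEGACY_32838))
    print(legacy_names(LEGACY_10980))
```

Checking both roles matters for books like #10980, where dc:creator is
empty and the only names are under dc:contributor.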
Another example is e-book #10668, "The War and Democracy". The legacy
RDF lists four authors as creators. The new RDF, pg10668.rdf, has no
author information.
I have other e-books as examples if you want them.
Hopefully you will continue providing the legacy RDF until the RDFs
following the current DCMI recommendations have this information.
Thanks,
Dennis Sheil
P.S. Having other information that the legacy RDF had, like number of
downloads, would be nice as well. But identifying the author is
probably more important.