Problems with the new RDF/XML format

Hello,
With regard to the RDF/XML format of the Project Gutenberg catalog:
I noticed recently that the catalog was updated to the new DCMI RDF recommendations. Project Gutenberg still releases the legacy RDF as well, for now.
One problem I have been having is that the legacy RDF contains information which the new RDF format does not.
The new and legacy book catalogs both contain a lot of information. Two of the key things for me (and others) are, obviously, the title and author of the book. With the old legacy catalog you could always obtain the title and author of a book. This is not the case for all the books in the new catalog, especially for authors.
One example is e-book #32838, "The Canadian Curler's Manual".
The corresponding RDF file, pg32838.rdf (which is at http://www.gutenberg.org/cache/epub/32838/pg32838.rdf and is also in the compressed archive available on the website), contains no information about authorship, editorship, contributions and so forth. With the new RDF format, I can usually find authorship information within a pgterms:agent element in the RDF file. E-book #32838, however, has no such element.
Looking at the legacy RDF for e-book #32838, I can find the author. The author's name is in the dc:creator element, which gives the author of the book as "Bicket, James". This information in the legacy RDF is absent from the new format: neither the word "James" nor "Bicket" appears in pg32838.rdf. With only the new RDF as a resource, I have no idea who the author is.
Another example is e-book #10980, "Lady John Russell: A Memoir with Selections from Her Diaries and Correspondence". The corresponding RDF file, pg10980.rdf, also contains no information about authorship, editorship, contributions and so forth. The legacy RDF for e-book #10980 does contain information I can use to identify an author. Most authors in the legacy format have their information in the dc:creator element; for this book, however, it is in the dc:contributor element. It lists "Russell, Agatha, lady, 1853- [Editor]" as an editor, a second editor, and two more contributors. None of these four names appears in the new RDF, so with only the new RDF as a resource, my author field is blank.
Another example is e-book #10668, "The War and Democracy". The legacy RDF lists four authors as creators; the new RDF, pg10668.rdf, has no author information.
I have other e-books as examples if you want them.
Hopefully you will continue providing the legacy RDF until the RDF files following the current DCMI recommendations include this information.
Thanks, Dennis Sheil
P.S. Having other information that the legacy RDF had, such as the number of downloads, would be nice as well. But listing the author is probably more important.
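For anyone who wants to reproduce this check in a script, here is a minimal Python sketch of the lookups described in the message. `new_format_agents` collects names from pgterms:agent elements in one per-book pgN.rdf file, `legacy_people` collects dc:creator and dc:contributor values from one legacy record, and `books_without_agents` scans a tar archive of per-book files and lists the e-book numbers that have no pgterms:agent at all. The function names are hypothetical, and the namespace URIs, the assumption that a name sits in a pgterms:name child of pgterms:agent, the rdf:Bag/rdf:li wrapping of multiple legacy names, and the tar packaging of the archive are all assumptions rather than anything stated in this thread.

```python
from __future__ import annotations

import re
import tarfile
import xml.etree.ElementTree as ET

# Namespace URIs are ASSUMPTIONS based on published PG RDF files;
# check them against the files you actually download.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PGTERMS = "http://www.gutenberg.org/2009/pgterms/"  # new format
DC = "http://purl.org/dc/elements/1.1/"             # legacy format
PG_FILE = re.compile(r"pg(\d+)\.rdf$")              # e.g. pg32838.rdf


def new_format_agents(xml_data: str | bytes) -> list[str]:
    """Names inside pgterms:agent elements of one new-format pgN.rdf file."""
    root = ET.fromstring(xml_data)
    names = []
    for agent in root.iter(f"{{{PGTERMS}}}agent"):
        # Assumed layout: the display name is a pgterms:name child.
        name = agent.find(f"{{{PGTERMS}}}name")
        if name is not None and name.text:
            names.append(name.text.strip())
    return names


def legacy_people(xml_data: str | bytes) -> dict[str, list[str]]:
    """dc:creator and dc:contributor values from ONE legacy-format record."""
    root = ET.fromstring(xml_data)
    people: dict[str, list[str]] = {"creator": [], "contributor": []}
    for role, names in people.items():
        for elem in root.iter(f"{{{DC}}}{role}"):
            # Several names may be wrapped in rdf:Bag/rdf:li (assumed);
            # a single name may sit directly inside the element.
            items = list(elem.iter(f"{{{RDF}}}li")) or [elem]
            for item in items:
                text = "".join(item.itertext()).strip()
                if text:
                    names.append(text)
    return people


def books_without_agents(archive_path: str) -> list[int]:
    """E-book numbers whose pgN.rdf in a tar archive has no pgterms:agent."""
    missing = []
    # Mode "r:*" lets tarfile detect gzip, bz2 or xz compression itself.
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            match = PG_FILE.search(member.name)
            if not member.isfile() or match is None:
                continue  # skip directories and non-book files
            data = tar.extractfile(member).read()
            if not new_format_agents(data):
                missing.append(int(match.group(1)))
    return sorted(missing)
```

Called on a tarball of per-book files (for example a hypothetical `books_without_agents("rdf-files.tar.bz2")`), the result would be expected to include 32838, 10980 and 10668 if those records still lack agents. If one of the daily downloads is a ZIP file rather than a tarball, the standard `zipfile` module would be needed in place of `tarfile`. Because the legacy catalog is one large file, `legacy_people` should be handed the record for a single e-book, not the whole catalog; selecting that record is left out of this sketch.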

I can edit the catalog records, but I did not design the underlying structures.
I have never used the RDF directly myself, and did not know that there was a "new RDF format".
Marcello would probably be the best person to help address your question...
Thanks, Andrew
On Tue, 9 Apr 2013, Dennis Sheil wrote:
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d

This web page is mostly what I'm talking about: http://www.gutenberg.org/wiki/Gutenberg:Feeds
In January it had only the legacy format (one big RDF file in the now-superseded 2002 Dublin Core RDF standard); now it also has the new format (many RDF files in the current Dublin Core RDF standard). An archive is compiled in both the new and the legacy format every day, and each is compressed by two methods, resulting in four new files a day.
The files I mentioned are just ones I noticed, but I can give other examples as well.
I guess we will bring Marcello into the conversation and discuss how to proceed. Things are working for me right now, as long as the legacy format, which contains all of the information, is still generated. You probably want to migrate the data the old RDF contains to the new format before phasing the old format out.
There is no rush on my end as long as the legacy RDF is still generated; everything is still working for me. I just thought I should note the problem in case people were unaware of it. The shift to the new format happened only in the past few weeks, so small complications like this are bound to arise.
Thanks, Dennis
On Wed, Apr 10, 2013 at 10:00 PM, Andrew Sly <sly@victoria.tc.ca> wrote:
participants (2)
- Andrew Sly
- Dennis Sheil