William Waites wrote:
Hi all,
I'm working on a project, http://bibliographica.org/, which involves annotating and enriching information about authors and works. We need some seed data! I would very much like to use the Gutenberg catalogue for this, and it seems I have three options:
* generate RDF out of the MARC dump (lossy, messy)
* use the catalog.rdf.bz2 (old RDF layout)
* use the individual RDF e.g. http://www.gutenberg.org/ebooks/12345.rdf (means crawling the site).
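For the second option, a compressed dump such as catalog.rdf.bz2 could be inspected without unpacking it to disk first. The following is only a minimal sketch using the Python standard library: the function name `count_elements` and the `ex:` sample vocabulary are made up for illustration and say nothing about the actual layout of the Gutenberg catalogue.

```python
import bz2
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(stream):
    """Count XML elements by tag while streaming a bz2-compressed file.

    `stream` may be a filename or a binary file object; bz2.open
    accepts either.
    """
    counts = Counter()
    with bz2.open(stream, "rb") as xml_stream:
        # iterparse yields each element once its closing tag is read,
        # so the whole document never has to sit in memory as a tree.
        for _event, elem in ET.iterparse(xml_stream, events=("end",)):
            counts[elem.tag] += 1
            elem.clear()  # drop children and text of the finished element
    return counts


# Hypothetical sample data, NOT the real catalogue schema.
sample = b"""<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ex="http://example.org/terms#">
  <ex:book><ex:title>A</ex:title></ex:book>
  <ex:book><ex:title>B</ex:title></ex:book>
</rdf:RDF>"""

counts = count_elements(io.BytesIO(bz2.compress(sample)))
for tag, n in sorted(counts.items()):
    print(n, tag)
```

Tags come back in ElementTree's `{namespace}localname` form, so a first pass like this shows which element types a dump contains before any real conversion code is written.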
I'd really rather not crawl the site (and suspect you'd rather I not as well) but I would like to use the RDF generated for individual works (well, manifestations, but I digress).
Any chance of producing a dump like catalog.rdf.bz2 but with the updated schema?
catalog.rdf.bz2 gets updated every night. The individual RDF files don't get updated after first creation. That's something we will do eventually, but we need to figure out an efficient way. Piecing the individual files together is not as easy as it seems because we have to remove redundant information. We'll have to copy the entire database into a triple store and serialize it out again. Not likely to happen soon.

--
Marcello Perathoner
webmaster@gutenberg.org
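The redundancy problem mentioned above can be illustrated with a toy model: if every per-book file repeats the statements about its author, simply concatenating the files duplicates them, whereas loading everything into one store and serializing it again emits each statement once. The sketch below is an assumption-laden illustration, not Project Gutenberg's process. It models triples as plain Python tuples and uses invented `ex:` identifiers; a real implementation would use an RDF library or an actual triple store.

```python
def merge_records(records):
    """Union the triples of many per-book records into one store.

    Using a set means a triple that appears in several records
    (for example, an author's name) is kept only once.
    """
    store = set()
    for triples in records:
        store.update(triples)
    return store


def serialize(store):
    """Write the store out as sorted, N-Triples-like lines."""
    return "\n".join(" ".join(triple) + " ." for triple in sorted(store))


# Hypothetical per-book records; both repeat the author's name triple.
book_1 = [
    ("ex:book/1", "ex:title", '"First Book"'),
    ("ex:book/1", "ex:creator", "ex:agent/9"),
    ("ex:agent/9", "ex:name", '"Example Author"'),
]
book_2 = [
    ("ex:book/2", "ex:title", '"Second Book"'),
    ("ex:book/2", "ex:creator", "ex:agent/9"),
    ("ex:agent/9", "ex:name", '"Example Author"'),
]

merged = merge_records([book_1, book_2])
dump = serialize(merged)
print(dump)
```

Six input triples become five in the merged dump, because the shared author statement is stored once.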