[gutvol-p] Re: Gutenberg Catalogue RDF

26 May 2010

      William Waites wrote:
...
Hi all,
I'm working on a project, http://bibliographica.org/, which involves
annotating and enriching information about authors and works. We
need some seed data! I would very much like to use the gutenberg
catalogue for this and it seems like I have three options:
  * generate RDF out of the marc dump (lossy, messy)
  * use the catalog.rdf.bz2 (old rdf layout)
  * use the individual RDF e.g. http://www.gutenberg.org/ebooks/12345.rdf
    (means crawling the site).
I'd really rather not crawl the site (and suspect you'd rather I not as
well) but I would like to use the RDF generated for individual works
(well, manifestations, but I digress).
Any chance of producing a dump like catalog.rdf.bz2 but with the
updated schema?
catalog.rdf.bz2 gets updated every night.

The individual rdf files don't get updated after first creation. That's 
something we will do eventually but need to figure out an efficient way.

Piecing the individual files together is not as easy as it seems because
we have to remove redundant information. We'll have to copy the entire
database into a triple store and serialize it out again. Not likely to 
happen soon.
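
To illustrate why the merge goes through a triple store: if each per-book file repeats the description of a shared author, loading everything into one set of triples collapses the repeats before serializing. The following is only a minimal sketch of that idea in plain Python, not the actual Project Gutenberg tooling. The example.org URIs, the sample data, and the helper names merge_graphs and to_ntriples are all hypothetical, and the terms are assumed to be already written in N-Triples syntax.

```python
from typing import Iterable, Set, Tuple

# A triple is (subject, predicate, object), each term already in N-Triples form.
Triple = Tuple[str, str, str]

# Hypothetical terms, for illustration only.
CREATOR = "<http://purl.org/dc/terms/creator>"
NAME = "<http://example.org/vocab/name>"
AGENT_7 = "<http://example.org/agents/7>"

# Two per-book graphs that both repeat the description of the same author.
BOOK_1 = [
    ("<http://example.org/ebooks/1>", CREATOR, AGENT_7),
    (AGENT_7, NAME, '"Example Author"'),
]
BOOK_2 = [
    ("<http://example.org/ebooks/2>", CREATOR, AGENT_7),
    (AGENT_7, NAME, '"Example Author"'),
]


def merge_graphs(graphs: Iterable[Iterable[Triple]]) -> Set[Triple]:
    """Load all graphs into one set, so identical triples are stored once."""
    merged: Set[Triple] = set()
    for graph in graphs:
        merged.update(graph)
    return merged


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialize triples one per line, sorted for a stable output order."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in sorted(triples))


if __name__ == "__main__":
    print(to_ntriples(merge_graphs([BOOK_1, BOOK_2])), end="")
```

The four input triples become three in the merged output, because the author's name triple is kept only once. With the full catalogue the same step needs a real store, since the whole database has to be held and deduplicated at once.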

-- 
Marcello Perathoner
webmaster@gutenberg.org