[gutvol-p] Re: Gutenberg Catalogue RDF

Marcello Perathoner marcello at perathoner.de
Wed May 26 08:22:02 PDT 2010


William Waites wrote:
> Hi all,
> 
> I'm working on a project, http://bibliographica.org/ which involves
> annotating and enriching information about authors and works. We
> need some seed data! I would very much like to use the gutenberg
> catalogue for this and it seems like I have three options:
> 
>   * generate RDF out of the marc dump (lossy, messy)
>   * use the catalog.rdf.bz2 (old rdf layout)
>   * use the individual RDF e.g. http://www.gutenberg.org/ebooks/12345.rdf
>      (means crawling the site).
> 
> I'd really rather not crawl the site (and suspect you'd rather I not as
> well) but I would like to use the RDF generated for individual works
> (well, manifestations, but I digress).
> 
> Any chance of producing a dump like catalog.rdf.bz2 but with the
> updated schema?

catalog.rdf.bz2 gets updated every night.

The individual RDF files don't get updated after first creation. That's 
something we will do eventually, but we need to figure out an efficient 
way first.

Piecing the individual files together is not as easy as it seems because 
we have to remove redundant information. We'll have to copy the entire 
database into a triple store and serialize it out again. Not likely to 
happen soon.
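
To illustrate why plain concatenation would not do, here is a minimal 
sketch in Python. It is not the actual Gutenberg tooling: the "triple 
store" is just a set of (subject, predicate, object) tuples, and all 
URIs, names and helper functions below are made up for the example. 
Two per-book files that repeat the same author statement are merged, 
and the repeated triple comes out only once.

```python
# A simplified, hypothetical stand-in for a triple store: a set of
# (subject, predicate, object) tuples. A set keeps each triple once,
# so statements repeated across per-book files collapse on merge.

def merge_graphs(graphs):
    """Load every per-book list of triples into one deduplicated store."""
    store = set()
    for triples in graphs:
        store.update(triples)
    return store


def to_ntriples(store):
    """Serialize the store as N-Triples-style lines in a stable order."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in sorted(store))


# Made-up example data (not real catalogue URIs or vocabulary).
AUTHOR = "<http://example.org/agents/1>"
NAME = "<http://example.org/terms/name>"
CREATOR = "<http://example.org/terms/creator>"

book_a = [
    ("<http://example.org/ebooks/1>", CREATOR, AUTHOR),
    (AUTHOR, NAME, '"Example Author"'),
]
book_b = [
    ("<http://example.org/ebooks/2>", CREATOR, AUTHOR),
    (AUTHOR, NAME, '"Example Author"'),  # same statement as in book_a
]

merged = merge_graphs([book_a, book_b])
print(to_ntriples(merged))
```

The four input triples yield three in the output, because the author's 
name is stated in both files. A real run would need a proper RDF parser 
and a store that can hold the whole catalogue, which is where the 
effort goes.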



-- 
Marcello Perathoner
webmaster at gutenberg.org


