[gutvol-p] Re: Gutenberg Catalogue RDF
william.waites at okfn.org
Wed May 26 09:14:47 PDT 2010
On 10-05-26 16:22, Marcello Perathoner wrote:
> catalog.rdf.bz2 gets updated every night.
Yes it does, but the information in there is very different from the
I attach records from the catalogue and from the individual file.
The biggest problem is,
are completely different. And there's no way (other than hand coding
to make it from one to the other). So even if we were to initially
catalog.rdf.bz2 we couldn't then go and pull the detailed records.
(Btw, content-negotiation doesn't seem to work:
curl -H "Accept: application/rdf+xml"
gives a 406)
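
For anyone who wants to repeat that check from a script rather than curl, here is a minimal Python sketch using only the standard library. The URL in the usage comment is a placeholder, not the address I tested, and build_rdf_request and negotiate are hypothetical helper names, not part of any Gutenberg tooling.

```python
import urllib.error
import urllib.request

RDF_XML = "application/rdf+xml"


def build_rdf_request(url):
    """Build a GET request that asks the server for RDF/XML."""
    return urllib.request.Request(url, headers={"Accept": RDF_XML})


def negotiate(url, opener=urllib.request.urlopen):
    """Return (status, content_type) for an RDF/XML request to `url`.

    `opener` is injectable so the function can be exercised offline.
    """
    try:
        with opener(build_rdf_request(url)) as resp:
            return resp.status, resp.headers.get("Content-Type")
    except urllib.error.HTTPError as err:
        # A 406 Not Acceptable arrives here: the server has no
        # representation matching the Accept header we sent.
        return err.code, err.headers.get("Content-Type")


# Usage (placeholder URL, an assumption for illustration only):
# status, ctype = negotiate("http://example.org/some/resource")
```

If negotiation worked I would expect a 200 with an RDF content type back; a 406 status means the server refused the Accept header.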
> The individual rdf files don't get updated after first creation.
> That's something we will do eventually but need to figure out an
> efficient way.
We'd be happy to help with that. We've done a lot of thinking about this
a quite scalable way -- see http://bibliographica.org/docs/ordf/ -- in fact
it would be pretty low overhead if you didn't use the reasoning and fancy
indexing.
> Piecing the individual files together is not as easy as it seems
> because we have to remove redundant information. We'll have to copy
> the entire database into a triple store and serialize it out again.
> Not likely to happen soon.
That's not so hard. We could even do it for you given a tar of all the
files. In fact for our purposes this would be better because otherwise
to break the big file out into many small graphs (for each distinct
William Waites <william.waites at okfn.org>
Mob: +44 789 798 9965 Open Knowledge Foundation
Fax: +44 131 464 4948 Edinburgh, UK