[gutvol-p] Re: Gutenberg Catalogue RDF

William Waites william.waites at okfn.org
Wed May 26 12:08:23 PDT 2010


On 10-05-26 19:24, Marcello Perathoner wrote:
> Not at all. The information is quite the same. (There's also author
> birth and death dates in the individual files, but otherwise its the
> same.)

Not quite. These are the differences I can see for the CIA Factbook entry
(the examples I sent earlier; it happens to be the first entry in the
catalog.rdf.bz2 that I have lying around):

 * Different subject URI <-- very important; could kludge with owl:sameAs
    but shouldn't have to
 * Different layout for dc:subject (uses an rdf:Bag in one, a simple bunch
    of bnodes in the other)
 * Creator/Contributor/Publisher has a URI in the individual files but a
    text string in the catalog.rdf.bz2. Using a URI is the right way to
    do it.
 * Links to downloadable resources are absent in the catalog

So the first means that it is ambiguous which thing I am referring to if
I use your URIs without going to the trouble of putting in owl:sameAs and
then inferencing on that (resource intensive and messy).
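
A rough sketch of what that extra owl:sameAs step amounts to, with
triples held as plain tuples. The example.org URIs and the sample data
are hypothetical stand-ins, not the real Gutenberg identifiers, and a
real store would do this with a reasoner rather than by hand:

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"


def canonicalise(triples):
    """Rewrite every term to one representative per owl:sameAs group."""
    parent = {}

    def find(term):
        # Union-find lookup with path halving.
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller URI so the result is stable.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            union(s, o)

    # Drop the sameAs statements themselves and rewrite everything else.
    return {(find(s), p, find(o)) for s, p, o in triples if p != OWL_SAME_AS}


# Hypothetical URIs: one as used in the catalog, one as used per file.
CATALOG_URI = "http://example.org/catalog#etext1"
FILE_URI = "http://example.org/ebooks/1"

TRIPLES = [
    (CATALOG_URI, DC_TITLE, "Example title"),
    (FILE_URI, DC_CREATOR, "http://example.org/agents/1"),
    (FILE_URI, OWL_SAME_AS, CATALOG_URI),
]

merged = canonicalise(TRIPLES)
```

Every consumer of the data has to run a pass like this before the two
descriptions line up, which is the cost that a single shared subject
URI would avoid.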

The second means that when I create a lens (cf. the Fresnel vocabulary)
for looking at the data I can't do it in a consistent way because
sometimes dc:subject has one shape and sometimes another.
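
To illustrate, here is a minimal sketch of the normalisation a consumer
ends up writing to read both shapes. It assumes that each subject bnode
carries its label in rdf:value and that dc:subject is the Dublin Core
1.1 element; both are assumptions about the data, and the node names
are made up:

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"


def subject_labels(triples, work):
    """Return the subject labels of `work`, whichever shape was used."""
    props = {}
    for s, p, o in triples:
        props.setdefault(s, []).append((p, o))

    def values(node):
        node_props = props.get(node, [])
        if (RDF + "type", RDF + "Bag") in node_props:
            # Bag shape: follow rdf:_1, rdf:_2, ... to the member nodes.
            for p, o in node_props:
                if p.startswith(RDF + "_"):
                    yield from values(o)
        else:
            # Flat shape: the bnode holds the label directly.
            for p, o in node_props:
                if p == RDF + "value":
                    yield o

    labels = []
    for p, o in props.get(work, []):
        if p == DC_SUBJECT:
            labels.extend(values(o))
    return sorted(labels)


WORK = "http://example.org/ebooks/1"  # hypothetical URI

# Shape 1: dc:subject points at a single rdf:Bag of bnodes.
BAG_SHAPE = [
    (WORK, DC_SUBJECT, "_:bag"),
    ("_:bag", RDF + "type", RDF + "Bag"),
    ("_:bag", RDF + "_1", "_:a"),
    ("_:bag", RDF + "_2", "_:b"),
    ("_:a", RDF + "value", "Geography"),
    ("_:b", RDF + "value", "Political science"),
]

# Shape 2: one dc:subject statement per bnode.
FLAT_SHAPE = [
    (WORK, DC_SUBJECT, "_:a"),
    (WORK, DC_SUBJECT, "_:b"),
    ("_:a", RDF + "value", "Geography"),
    ("_:b", RDF + "value", "Political science"),
]
```

With one shape everywhere, the branch on rdf:Bag disappears and a lens
can be a single fixed path.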

The third means that if I want to present all works by an author I have
to resort to smooshing on a text string when you already have URIs minted
for that purpose.
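
A small sketch of the difference, using invented data: the agent URI,
the work URIs and the two spellings of the name are all hypothetical,
chosen only to show how an exact match on a string can miss a work
that a URI match would find:

```python
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"


def works_by(triples, creator):
    """Return the works whose dc:creator is exactly `creator`."""
    return sorted(s for s, p, o in triples if p == DC_CREATOR and o == creator)


AUTHOR = "http://example.org/agents/53"  # hypothetical agent URI

# Individual-file style: the creator is a URI.
PER_FILE = [
    ("http://example.org/ebooks/74", DC_CREATOR, AUTHOR),
    ("http://example.org/ebooks/76", DC_CREATOR, AUTHOR),
]

# Catalog style: the creator is a text string, here with two spellings.
CATALOG = [
    ("http://example.org/ebooks/74", DC_CREATOR, "Twain, Mark, 1835-1910"),
    ("http://example.org/ebooks/76", DC_CREATOR, "Twain, Mark"),
]

by_uri = works_by(PER_FILE, AUTHOR)
by_name = works_by(CATALOG, "Twain, Mark")
```

Here the URI lookup returns both works, while the string lookup only
returns the one whose spelling matches exactly.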

The fourth means that I can't provide links to the actual text, or
download it automatically for indexing/text-mining purposes, if I use
catalog.rdf.bz2.

The information is *similar* but not the same.

-w

-- 
William Waites           <william.waites at okfn.org>
Mob: +44 789 798 9965    Open Knowledge Foundation
Fax: +44 131 464 4948                Edinburgh, UK


