[gutvol-p] Re: Gutenberg Catalogue RDF

William Waites william.waites at okfn.org
Thu May 27 07:23:10 PDT 2010


Marcello,

On 10-05-26 21:36, Marcello Perathoner wrote:
> William Waites wrote:
>
>>  * Different subject URI <-- very important, could kludge with
>>     owl:sameAs but shouldn't have to
>
> The DCMI changed their recommendations. The RDF files follow the
> recommendations that were current at the time I wrote the scripts.
>
> The syntax may be different but the semantic is the same.

Different subject URIs mean different subjects. What is the canonical URI
that is to be used to refer to one of Gutenberg's texts? (I don't think
it's the one in catalog.rdf)

>>  * Different layout for dc:subject (uses a rdf:Bag in one, a simple
>>     bunch of bnodes in the other)
>
> A Bag *is* just a bunch of nodes.
>
> The syntax may be different but the semantic is the same.

This:

     dc:subject [ dcam:memberOf dc:LCSH;
             rdf:value "Geography -- Handbooks, manuals, etc.",
                 "Political science -- Handbooks, manuals, etc.",
                 "Political statistics -- Handbooks, manuals, etc.",
                 "World politics -- Handbooks, manuals, etc."],
         [ dcam:memberOf dc:LCC;
             rdf:value "G"];

is different from this:

     dc:subject [ a rdf:Bag;
             rdf:_1 [ a dc:LCSH;
                     rdf:value "Geography -- Handbooks, manuals, etc."];
             rdf:_2 [ a dc:LCSH;
                     rdf:value "World politics -- Handbooks, manuals, etc."];
             rdf:_3 [ a dc:LCSH;
                     rdf:value "Political science -- Handbooks, manuals, etc."];
             rdf:_4 [ a dc:LCSH;
                     rdf:value "Political statistics -- Handbooks, manuals, etc."]],
         [ a dc:LCC;
             rdf:value "G"];

Try writing a script that yields the strings in there and you'll see
that it is different.
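
To make that concrete, here is a rough sketch in Python. It does not use a
real RDF library; the two layouts are modelled as plain (subject,
predicate, object) tuples, and "ex:work" plus the "_:..." blank node labels
are made up for illustration. The member order inside the Bag is also not
meant to match the file exactly.

```python
# Toy triple lists standing in for the two dc:subject layouts.
# WORK and all "_:..." labels are hypothetical, chosen for this sketch only.
WORK = "ex:work"
HEADINGS = [
    "Geography -- Handbooks, manuals, etc.",
    "Political science -- Handbooks, manuals, etc.",
    "Political statistics -- Handbooks, manuals, etc.",
    "World politics -- Handbooks, manuals, etc.",
]

# First layout: one node per scheme, values attached directly to it.
FLAT = [(WORK, "dc:subject", "_:lcsh"), ("_:lcsh", "dcam:memberOf", "dc:LCSH")]
FLAT += [("_:lcsh", "rdf:value", h) for h in HEADINGS]
FLAT += [(WORK, "dc:subject", "_:lcc"), ("_:lcc", "dcam:memberOf", "dc:LCC"),
         ("_:lcc", "rdf:value", "G")]

# Second layout: the LCSH values sit one level deeper, inside an rdf:Bag.
BAG = [(WORK, "dc:subject", "_:bag"), ("_:bag", "rdf:type", "rdf:Bag")]
for i, h in enumerate(HEADINGS, start=1):
    member = f"_:m{i}"
    BAG += [("_:bag", f"rdf:_{i}", member),
            (member, "rdf:type", "dc:LCSH"),
            (member, "rdf:value", h)]
BAG += [(WORK, "dc:subject", "_:lcc"), ("_:lcc", "rdf:type", "dc:LCC"),
        ("_:lcc", "rdf:value", "G")]


def objects(triples, subject, predicate):
    """Return all objects of (subject, predicate, ?) in the triple list."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects_flat(triples, work):
    """Reader for the first layout: dc:subject node, then rdf:value."""
    out = []
    for node in objects(triples, work, "dc:subject"):
        out += objects(triples, node, "rdf:value")
    return sorted(out)


def subjects_bag(triples, work):
    """Reader for the second layout: also descends into rdf:_N members."""
    out = []
    for node in objects(triples, work, "dc:subject"):
        out += objects(triples, node, "rdf:value")
        if "rdf:Bag" in objects(triples, node, "rdf:type"):
            for s, p, member in triples:
                if s == node and p.startswith("rdf:_"):
                    out += objects(triples, member, "rdf:value")
    return sorted(out)


# Each reader recovers the same strings from its own layout...
print(subjects_flat(FLAT, WORK) == subjects_bag(BAG, WORK))  # True
# ...but the reader written for the first layout misses the Bag members.
print(subjects_flat(BAG, WORK))  # ['G']
```

The strings are the same, but the code needed to reach them is not, which
is the difference that matters to anyone consuming both files.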

>>  * Creator/Contributor/Publisher has a URI in the individual files but a
>>     text string in the catalog.rdf.gz. Using a URI is the right way to do it.
>
> That's the only difference.
>
> Actually using a URL is quite the wrong way. I did that only to make
> it possible for somebody to create an exact replica of our dataset
> (i.e. containing the exact same set of (wrong?) assumptions we made).
>
> The semantic of the string literal is: the author of this book is
> spelled 'John Doe'.
>
> The semantic of the URL is: the author of this book is spelled 'John
> Doe' *and* the authors of two books are the same person if they share
> the same url.
>
> Now the second statement is a very bold statement, especially if you
> don't find any LoC record for the book you are cataloguing or the LoC
> doesn't know either. (This happens quite often.)

Since we want to be able to make statements about Authors, they need to
have URIs.
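
A literal cannot be the subject of a triple in RDF, so there is nothing to
hang further statements on if the creator is only a string. A small sketch
of what the URI buys us, again with plain tuples; the identifiers ex:agent1,
ex:book1, ex:book2 and the use of foaf:name are assumptions for
illustration, not taken from the Gutenberg files:

```python
# Hypothetical data: one agent URI shared by two books, with a statement
# made about the agent itself.
TRIPLES = [
    ("ex:book1", "dc:creator", "ex:agent1"),
    ("ex:book2", "dc:creator", "ex:agent1"),
    ("ex:agent1", "foaf:name", "John Doe"),
]


def works_by(triples, agent):
    """Books whose dc:creator is the given agent URI."""
    return sorted(s for s, p, o in triples if p == "dc:creator" and o == agent)


def statements_about(triples, node):
    """All (predicate, object) pairs where the node is the subject."""
    return [(p, o) for s, p, o in triples if s == node]


print(works_by(TRIPLES, "ex:agent1"))          # ['ex:book1', 'ex:book2']
print(statements_about(TRIPLES, "ex:agent1"))  # [('foaf:name', 'John Doe')]
```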

I agree it's a bold statement, and inferring this information is
error-prone. However, I could start from scratch or I could start from the
work you've already done. I'd rather not have to start from scratch.

So the information needed to make an exact replica of your dataset is
contained in the individual works' RDF, not the catalog.rdf. How does that
work?

>>  * Links to downloadable resources are absent in the catalog
>
> Look further down.

ok.

Cheers,
-w

-- 
William Waites           <william.waites at okfn.org>
Mob: +44 789 798 9965    Open Knowledge Foundation
Fax: +44 131 464 4948                Edinburgh, UK



