Marcello,

On 10-05-26 21:36, Marcello Perathoner wrote:
William Waites wrote:
* Different subject URI <-- very important, could kludge with owl:sameAs but shouldn't have to
The DCMI changed their recommendations. The RDF files follow whatever recommendations were current at the time I wrote the scripts.
The syntax may be different, but the semantics are the same.
Different subject URIs mean different subjects. What is the canonical URI that is to be used to refer to one of Gutenberg's texts? (I don't think it's the one in catalog.rdf)
* Different layout for dc:subject (uses an rdf:Bag in one, a simple bunch of bnodes in the other)
A Bag *is* just a bunch of nodes.
The syntax may be different, but the semantics are the same.
This:

    dc:subject
        [ dcam:memberOf dc:LCSH;
          rdf:value "Geography -- Handbooks, manuals, etc.",
                    "Political science -- Handbooks, manuals, etc.",
                    "Political statistics -- Handbooks, manuals, etc.",
                    "World politics -- Handbooks, manuals, etc."],
        [ dcam:memberOf dc:LCC;
          rdf:value "G"];

is different from this:

    dc:subject
        [ a rdf:Bag;
          rdf:_1 [ a dc:LCSH; rdf:value "Geography -- Handbooks, manuals, etc."];
          rdf:_2 [ a dc:LCSH; rdf:value "World politics -- Handbooks, manuals, etc."];
          rdf:_3 [ a dc:LCSH; rdf:value "Political science -- Handbooks, manuals, etc."];
          rdf:_4 [ a dc:LCSH; rdf:value "Political statistics -- Handbooks, manuals, etc."]],
        [ a dc:LCC;
          rdf:value "G"];

Try writing a script that yields the strings in there and you'll see that it is different.
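To make the point concrete, here is a rough sketch of such a script. It does not use any RDF library: each graph is modelled as a plain set of (subject, predicate, object) tuples, blank nodes are given made-up labels such as "_:s1", the book URI "ex:book" is a placeholder, and only two of the four LCSH headings are included to keep it short. The function names are hypothetical as well.

```python
# Layout 1: each dc:subject node carries its rdf:value strings directly.
flat = {
    ("ex:book", "dc:subject", "_:s1"),
    ("_:s1", "dcam:memberOf", "dc:LCSH"),
    ("_:s1", "rdf:value", "Geography -- Handbooks, manuals, etc."),
    ("_:s1", "rdf:value", "World politics -- Handbooks, manuals, etc."),
    ("ex:book", "dc:subject", "_:s2"),
    ("_:s2", "dcam:memberOf", "dc:LCC"),
    ("_:s2", "rdf:value", "G"),
}

# Layout 2: the LCSH headings sit one level deeper, as members of an rdf:Bag.
bagged = {
    ("ex:book", "dc:subject", "_:bag"),
    ("_:bag", "rdf:type", "rdf:Bag"),
    ("_:bag", "rdf:_1", "_:m1"),
    ("_:m1", "rdf:type", "dc:LCSH"),
    ("_:m1", "rdf:value", "Geography -- Handbooks, manuals, etc."),
    ("_:bag", "rdf:_2", "_:m2"),
    ("_:m2", "rdf:type", "dc:LCSH"),
    ("_:m2", "rdf:value", "World politics -- Handbooks, manuals, etc."),
    ("ex:book", "dc:subject", "_:lcc"),
    ("_:lcc", "rdf:type", "dc:LCC"),
    ("_:lcc", "rdf:value", "G"),
}


def objects(graph, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def subjects_flat(graph, book):
    """Follow dc:subject -> node -> rdf:value, as layout 1 expects."""
    out = []
    for node in objects(graph, book, "dc:subject"):
        out.extend(objects(graph, node, "rdf:value"))
    return sorted(out)


def subjects_bagged(graph, book):
    """Like subjects_flat, but descend into rdf:Bag members (rdf:_1, rdf:_2, ...)."""
    out = []
    for node in objects(graph, book, "dc:subject"):
        if "rdf:Bag" in objects(graph, node, "rdf:type"):
            members = [o for s, p, o in graph
                       if s == node and p.startswith("rdf:_")]
        else:
            members = [node]
        for member in members:
            out.extend(objects(graph, member, "rdf:value"))
    return sorted(out)


print(subjects_flat(flat, "ex:book"))
print(subjects_flat(bagged, "ex:book"))
print(subjects_bagged(bagged, "ex:book"))
```

The traversal written for the first layout finds only the LCC value when run against the second, because the Bag node itself has no rdf:value. A consumer has to know about the extra level of nesting, and also that the scheme is given by dcam:memberOf in one layout and by rdf:type in the other.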
* Creator/Contributor/Publisher has a URI in the individual files but a text string in the catalog.rdf.gz. Using a URI is the right way to do it.
That's the only difference.
Actually, using a URL is quite the wrong way. I did that only to make it possible for somebody to create an exact replica of our dataset (i.e. containing the exact same set of (wrong?) assumptions we made).
The semantics of the string literal are: the author of this book is spelled 'John Doe'.
The semantics of the URL are: the author of this book is spelled 'John Doe' *and* the authors of two books are the same person if they share the same URL.
Now the second statement is a very bold statement, especially if you don't find any LoC record for the book you are cataloguing or the LoC doesn't know either. (This happens quite often.)
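The distinction between the two readings can be sketched in a few lines. The records below are invented for illustration (the book identifiers, spellings, and "ex:agent/..." URIs are all hypothetical): grouping by the literal only says which books share a spelling, while grouping by the URI asserts which books share an author.

```python
from collections import defaultdict

# Hypothetical records: (book, creator literal, creator URI).
records = [
    ("ex:book1", "John Doe", "ex:agent/1"),
    ("ex:book2", "John Doe", "ex:agent/2"),   # same spelling, asserted to be someone else
    ("ex:book3", "Doe, John", "ex:agent/1"),  # different spelling, asserted same person
]


def group_by(rows, key_index):
    """Group book identifiers by the field at key_index."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[key_index]].add(row[0])
    return dict(groups)


by_literal = group_by(records, 1)  # books sharing a spelling
by_uri = group_by(records, 2)      # books sharing an asserted identity

print(by_literal)
print(by_uri)
```

The two groupings disagree whenever a name is ambiguous or spelled inconsistently, which is exactly where the identity claim made by the URI is either most useful or most likely to be wrong.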
Since we want to be able to make statements about authors, they need to have URIs. I agree it's a bold statement, and inferring this information is error-prone. However, I could start from scratch or I could start from the work you've already done, and I'd rather not start from scratch. So the information needed to make an exact replica of your dataset is contained in the RDF files for the individual works, not in catalog.rdf. How does that work?
* Links to downloadable resources are absent in the catalog
Look further down.
ok.

Cheers,
-w
--
William Waites <william.waites@okfn.org>    Mob: +44 789 798 9965
Open Knowledge Foundation                    Fax: +44 131 464 4948
Edinburgh, UK