[gutvol-p] Re: Quick question about file formats

Marcello Perathoner marcello at perathoner.de
Sat Oct 30 15:39:28 PDT 2010


William Waites wrote:
> On Sat, Oct 30, 2010 at 08:56:50PM +0200, Marcello Perathoner wrote:
>> Paulo Levi wrote:
>>> Another quick question :)
>>> Are the rules for creating a download url from the "file" tag in the rdf 
>>> catalog consistent?
>> The "algorithm" is the expansion of XML entities, which any common 
>> run-of-the-mill xml parser will do for you.
> 
> RDF != XML
> 
>> I think we had this discussion already. This is an XML file and should 
>> be processed thru an XML parser. If you don't, every little cosmetic 
>> change to the file structure will break your program. You have been warned.
> 
> If you're trying to interpret RDF data, it's better
> to use a library, they exist for just about all
> programming languages. If you try to interpret it
> as XML you are asking for trouble.
> 
> It is too bad that the RDF you get from here,
> http://www.gutenberg.org/ebooks/12345.rdf
> is different from the catalogue. 

This is intended and documented.

   http://www.gutenberg.org/wiki/Gutenberg:Feeds

The old catalog.rdf is a legacy format we keep for compatibility.

> 
> This is because you have
> 
> 	xml:base="http://www.gutenberg.org/feeds/catalog.rdf"
> 
> and then, e.g. 
> 
> 	rdf:ID="etext12345"
> 
> This amounts to giving the URI
> 
> 	http://www.gutenberg.org/feeds/catalog.rdfetext12345
> 
> to that book which is not what you intend.

Wrong. This gives

   http://www.gutenberg.org/feeds/catalog.rdf#etext12345

"The rdf:ID attribute on a node element (not property element, that has 
another meaning) can be used instead of rdf:about and gives a relative 
RDF URI reference equivalent to # concatenated with the rdf:ID attribute 
value."
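
The rule quoted above can be sketched in a few lines of Python. This is
only an illustration of the resolution step, assuming the xml:base shown
in the quoted mail; the helper name resolve_rdf_id is hypothetical and
urljoin stands in for whatever an RDF library does internally.

```python
from urllib.parse import urljoin


def resolve_rdf_id(base, rdf_id):
    """Resolve an rdf:ID value against the in-scope xml:base."""
    # rdf:ID="x" behaves like rdf:about="#x", so "#x" is resolved
    # as a relative reference against the base URI.
    return urljoin(base, "#" + rdf_id)


BASE = "http://www.gutenberg.org/feeds/catalog.rdf"

# URI produced by rdf:ID="etext12345"
print(resolve_rdf_id(BASE, "etext12345"))

# URI produced by rdf:resource="#etext12345"
print(urljoin(BASE, "#etext12345"))
```

Both calls resolve to the same URI, with the "#" in place, so the rdf:ID
and the rdf:resource in the catalog refer to the same node.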

> 
> If on the other hand you had used
> 
> 	rdf:about="http://www.gutenberg.org/ebooks/12345"
> 
> the data would be the same (which I guess is 
> what you intend).
> 
> where lower down you talk about formats, 
> you use
> 
> 	rdf:resource="#etext12345"
> 
> which refers to
> 
> 	http://www.gutenberg.org/feeds/catalog.rdf#etext12345
> 
> which if it weren't for the error with rdf:ID would
> at least be consistent within the catalogue.
> 
> But supposing this is fixed, I still have two
> URIs for one text:
> 
> 	http://www.gutenberg.org/feeds/catalog.rdf#etext12345
> 	http://www.gutenberg.org/ebooks/12345
> 
> and you've given no way of knowing that they are
> in fact the same.

Because you are not supposed to mix the old catalog.rdf with the new 
catalog.rdf, which will be put online once I finish it.


-- 
Marcello Perathoner
webmaster at gutenberg.org


