Hi,

thanks for making the Project Gutenberg catalogue available in RDF. This creates many possibilities for browsing and indexing ebooks, for both personal and research uses.

Unfortunately it seems the catalog.rdf file is missing some lines, and as a result cannot be parsed by strict parsers such as those in libxml2 (which is very widely used by many platforms).

After some brief googling I came across Grahame Bowland's site, which includes a simple unix shell script which he developed recently:

http://angrygoats.net/svn/gutenberg/fix-catalog.sh

This inserts the missing entities into the DOCTYPE declaration at the top of catalog.rdf. Of course it would be better if these entities could be included in the original catalog.rdf published by Project Gutenberg :)

So, would it be possible for the system that's generating catalog.rdf to include these entities?

thanks,
Conrad.
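For reference, here is a rough Python sketch of the kind of fix described above. It scans the text for named entity references that XML does not predefine and declares them in the DOCTYPE internal subset as numeric character references. This is not Grahame's script, whose contents are not reproduced here. The function name `add_missing_entities` is hypothetical. The sketch assumes the undeclared names are HTML entity names known to Python's `html.entities.name2codepoint`, and it uses regexes rather than a real DTD parser.

```python
import html.entities
import re

# XML predefines only these five named entities; any other &name; needs a declaration.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

ENTITY_REF = re.compile(r"&([A-Za-z_][A-Za-z0-9._-]*);")
ENTITY_DECL = re.compile(r"<!ENTITY\s+([A-Za-z_][A-Za-z0-9._-]*)\s")


def add_missing_entities(xml_text: str) -> str:
    """Declare HTML named entities that the document uses but never declares.

    Sketch only: the regex scan also sees references inside comments and
    CDATA sections, and names unknown to HTML are left undeclared.
    """
    used = set(ENTITY_REF.findall(xml_text))
    declared = set(ENTITY_DECL.findall(xml_text))
    missing = sorted(used - declared - XML_PREDEFINED)

    # Map each HTML entity name to a numeric character reference, e.g. eacute -> &#233;
    decls = "\n".join(
        f'<!ENTITY {name} "&#{html.entities.name2codepoint[name]};">'
        for name in missing
        if name in html.entities.name2codepoint
    )
    if not decls:
        return xml_text

    # Case 1: the DOCTYPE already has an internal subset "[ ... ]"; add right after "[".
    with_subset = re.search(r"<!DOCTYPE\s+[^\[>]*\[", xml_text)
    if with_subset:
        pos = with_subset.end()
        return xml_text[:pos] + "\n" + decls + xml_text[pos:]

    # Case 2: a DOCTYPE without an internal subset; give it one.
    without_subset = re.search(r"<!DOCTYPE\s+([^\[>]*)>", xml_text)
    if without_subset:
        head = without_subset.group(1).strip()
        new_doctype = f"<!DOCTYPE {head} [\n{decls}\n]>"
        return xml_text[:without_subset.start()] + new_doctype + xml_text[without_subset.end():]

    raise ValueError("no DOCTYPE declaration to extend")
```

Declaring each name as a numeric character reference keeps everything in the internal subset, which non-validating parsers such as libxml2's are required to read, so no external DTD has to be fetched.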
Conrad Parker wrote:
> Unfortunately it seems the catalog.rdf file is missing some lines, and as a result cannot be parsed by strict parsers such as those in libxml2 (which is very widely used by many platforms).
I just parsed it successfully using perl 5.8.0 and libxml 2.5.10.
> After some brief googling I came across Grahame Bowland's site, which includes a simple unix shell script which he developed recently:
>
> http://angrygoats.net/svn/gutenberg/fix-catalog.sh
>
> This inserts the missing entities into the DOCTYPE declaration at the top of catalog.rdf. Of course it would be better if these entities could be included in the original catalog.rdf published by Project Gutenberg :)
We do not use HTML entities in the database any more, so the generated RDF/XML and RSS should not contain any.

-- 
Marcello Perathoner
webmaster@gutenberg.org
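A quick way to check whether a given copy of catalog.rdf still contains references that a strict parser will reject is to feed it to a non-validating parser and report the first error. The sketch below uses Python's bundled expat bindings rather than libxml2, and `first_parse_error` is a hypothetical helper name. Note that expat only treats an undeclared entity as an error when the document has no external DTD subset or parameter-entity references, or is declared standalone.

```python
import xml.parsers.expat


def first_parse_error(xml_bytes: bytes):
    """Return (line, message) for the first well-formedness error, or None.

    Uses expat, a non-validating parser: entities declared in the DOCTYPE
    internal subset are expanded, undeclared ones are reported as errors.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_bytes, True)  # True: this is the final (only) chunk
    except xml.parsers.expat.ExpatError as err:
        return err.lineno, xml.parsers.expat.ErrorString(err.code)
    return None
```

For a local file this would be called as `first_parse_error(open("catalog.rdf", "rb").read())`; a result of `None` means the file parsed without errors.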