From conrad at metadecks.org Tue Mar 22 19:36:31 2005
From: conrad at metadecks.org (Conrad Parker)
Date: Wed, 23 Mar 2005 14:36:31 +1100
Subject: [gutvol-p] catalog.rdf invalid xml
Message-ID: <20050323033631.GC31379@vergenet.net>

Hi,

thanks for making the Project Gutenberg catalogue available in RDF. This
creates many possibilities for browsing and indexing ebooks, for both
personal and research uses.

Unfortunately it seems the catalog.rdf file is missing some lines, and
as a result cannot be parsed by strict parsers such as those in libxml2
(which is very widely used by many platforms).

After some brief googling I came across Grahame Bowland's site, which
includes a simple unix shell script which he developed recently:

http://angrygoats.net/svn/gutenberg/fix-catalog.sh

This inserts the missing entities into the DOCTYPE declaration at the
top of catalog.rdf. Of course it would be better if these entities could
be included in the original catalog.rdf published by Project Gutenberg :)

So, would it be possible for the system that's generating catalog.rdf to
include these entities?

thanks,

Conrad.

From webmaster at gutenberg.org Wed Mar 23 14:34:09 2005
From: webmaster at gutenberg.org (Marcello Perathoner)
Date: Wed, 23 Mar 2005 23:34:09 +0100
Subject: [gutvol-p] catalog.rdf invalid xml
In-Reply-To: <20050323033631.GC31379@vergenet.net>
References: <20050323033631.GC31379@vergenet.net>
Message-ID: <4241EEE1.8080806@gutenberg.org>

Conrad Parker wrote:

> Unfortunately it seems the catalog.rdf file is missing some lines, and
> as a result cannot be parsed by strict parsers such as those in libxml2
> (which is very widely used by many platforms).

I just parsed it successfully using perl 5.8.0 and libxml 2.5.10.

> After some brief googling I came across Grahame Bowland's site, which
> includes a simple unix shell script which he developed recently:
>
> http://angrygoats.net/svn/gutenberg/fix-catalog.sh
>
> This inserts the missing entities into the DOCTYPE declaration at the
> top of catalog.rdf. Of course it would be better if these entities could
> be included in the original catalog.rdf published by Project Gutenberg :)

We do not use HTML entities in the database any more, so the generated
RDF/XML and RSS should not contain any.

-- 
Marcello Perathoner
webmaster at gutenberg.org
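
For readers of the archive, the kind of failure discussed in this thread can be reproduced on a small scale. The following is only a rough Python sketch, not a copy of fix-catalog.sh (whose contents are not shown here). The sample document, the entity table MISSING and the helper declare_entities are all made up for illustration, and the sketch assumes the DOCTYPE already has an internal subset. It shows that a strict parser rejects an undeclared entity such as &eacute;, and that declaring it inside the DOCTYPE lets the same document parse.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical entity table: the actual list of entities missing from
# catalog.rdf is not given in this thread.
MISSING = {"eacute": "&#233;", "uuml": "&#252;"}

# Made-up stand-in for catalog.rdf: it uses &eacute; without declaring it.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE catalog [
  <!ENTITY pg "Project Gutenberg">
]>
<catalog><title>Les Mis&eacute;rables</title><source>&pg;</source></catalog>
"""


def declare_entities(xml_text, entities):
    """Insert <!ENTITY ...> declarations at the start of the DOCTYPE's
    internal subset. Assumes the document has '<!DOCTYPE name [ ... ]>'."""
    decls = "".join(
        '  <!ENTITY %s "%s">\n' % (name, value)
        for name, value in sorted(entities.items())
    )
    patched, count = re.subn(
        r"(<!DOCTYPE\s+[^\[>]+\[)",
        lambda m: m.group(1) + "\n" + decls,
        xml_text,
        count=1,
    )
    if count == 0:
        raise ValueError("no DOCTYPE with an internal subset found")
    return patched


if __name__ == "__main__":
    try:
        ET.fromstring(SAMPLE)
    except ET.ParseError as err:
        print("unpatched document rejected:", err)

    root = ET.fromstring(declare_entities(SAMPLE, MISSING))
    print("patched title:", root.find("title").text)
```

If the generated file contains no HTML entities at all, as the reply above says it should, no such patching is needed.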