Fwd: Programmatic fetching books from Gutenberg

---------- Forwarded message ----------
From: Paulo Levi <i30817@gmail.com>
Date: Thu, Jul 16, 2009 at 1:25 AM
Subject: Programmatic fetching books from Gutenberg
To: gutvol-p@lists.pglaf.org

I made an ebook reader (http://code.google.com/p/bookjar/downloads/list) and I'd like to search and download Gutenberg books. I already have a searcher prototype using LuceneSail, a library that uses Lucene to index RDF documents, indexing only what I want from catalog.rdf.zip.

Now I'd like to know how I can fetch the book itself from the URL inside the catalog, and what the variants for the formats are. An example query result:

author: Shakespeare, William, 1564-1616
url: http://www.gutenberg.org/feeds/catalog.rdf#etext1802
title: King Henry VIII

So I'd like to know how I can get a working URL to download the book from the etext1802 number, and how to construct variants for each format.

Thank you in advance.

Paulo Levi wrote:
---------- Forwarded message ----------
From: Paulo Levi <i30817@gmail.com>
Date: Thu, Jul 16, 2009 at 1:25 AM
Subject: Programmatic fetching books from Gutenberg
To: gutvol-p@lists.pglaf.org

I made an ebook reader (http://code.google.com/p/bookjar/downloads/list) and I'd like to search and download Gutenberg books. I already have a searcher prototype using LuceneSail, a library that uses Lucene to index RDF documents, indexing only what I want from catalog.rdf.zip.

Now I'd like to know how I can fetch the book itself from the URL inside the catalog, and what the variants for the formats are. An example query result:

author: Shakespeare, William, 1564-1616
url: http://www.gutenberg.org/feeds/catalog.rdf#etext1802
title: King Henry VIII

So I'd like to know how I can get a working URL to download the book from the etext1802 number, and how to construct variants for each format.

Thank you in advance.

I already told you how to do that on gutvol-p. You make a very simple thing very complicated because you refuse to use XML tools to scan an XML file.

This simple XPath query:

  xpath ("//pgterms:file[dcterms:isFormatOf[@rdf:resource='#etext29514']]")

will get all files we have for book 29514, with mimetype, size and last modification date.

--- excerpt from catalog.rdf ---

<pgterms:file rdf:about="&f;2/9/5/1/29514/29514-8.txt">
  <dc:format><dcterms:IMT><rdf:value>text/plain; charset="iso-8859-1"</rdf:value></dcterms:IMT></dc:format>
  <dcterms:extent>27727</dcterms:extent>
  <dcterms:modified><dcterms:W3CDTF><rdf:value>2009-07-25</rdf:value></dcterms:W3CDTF></dcterms:modified>
  <dcterms:isFormatOf rdf:resource="#etext29514" />
</pgterms:file>

<pgterms:file rdf:about="&f;2/9/5/1/29514/29514-8.zip">
  <dc:format><dcterms:IMT><rdf:value>text/plain; charset="iso-8859-1"</rdf:value></dcterms:IMT></dc:format>
  <dc:format><dcterms:IMT><rdf:value>application/zip</rdf:value></dcterms:IMT></dc:format>
  <dcterms:extent>10751</dcterms:extent>
  <dcterms:modified><dcterms:W3CDTF><rdf:value>2009-07-25</rdf:value></dcterms:W3CDTF></dcterms:modified>
  <dcterms:isFormatOf rdf:resource="#etext29514" />
</pgterms:file>

<pgterms:file rdf:about="&f;2/9/5/1/29514/29514-h/29514-h.htm">
  <dc:format><dcterms:IMT><rdf:value>text/html; charset="iso-8859-1"</rdf:value></dcterms:IMT></dc:format>
  <dcterms:extent>29847</dcterms:extent>
  <dcterms:modified><dcterms:W3CDTF><rdf:value>2009-07-25</rdf:value></dcterms:W3CDTF></dcterms:modified>
  <dcterms:isFormatOf rdf:resource="#etext29514" />
</pgterms:file>

<pgterms:file rdf:about="&f;2/9/5/1/29514/29514-h.zip">
  <dc:format><dcterms:IMT><rdf:value>text/html; charset="iso-8859-1"</rdf:value></dcterms:IMT></dc:format>
  <dc:format><dcterms:IMT><rdf:value>application/zip</rdf:value></dcterms:IMT></dc:format>
  <dcterms:extent>18787</dcterms:extent>
  <dcterms:modified><dcterms:W3CDTF><rdf:value>2009-07-25</rdf:value></dcterms:W3CDTF></dcterms:modified>
  <dcterms:isFormatOf rdf:resource="#etext29514" />
</pgterms:file>
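
As a rough illustration of the lookup described above, here is a minimal Python sketch using only the standard library. It is not the query tool used in the reply: `xml.etree.ElementTree` supports only a subset of XPath and cannot evaluate the nested predicate, so the sketch swaps in an explicit loop that applies the same filter. The `pgterms` namespace URI, the expansion of the `&f;` entity to `http://www.gutenberg.org/dirs/`, and the helper name `files_for_book` are assumptions made for this example; the real values should be taken from the header of catalog.rdf. The sample data is a trimmed copy of two entries from the excerpt.

```python
import xml.etree.ElementTree as ET

# rdf, dc and dcterms are the standard namespace URIs. The pgterms URI is an
# assumption for this sketch; check the root element of the real catalog.rdf.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/rdfterms/",
}

# Trimmed sample built from the excerpt. The value of the &f; entity is an
# assumption; the real catalog declares it in its own DOCTYPE.
SAMPLE = """<!DOCTYPE rdf:RDF [
  <!ENTITY f "http://www.gutenberg.org/dirs/">
]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:pgterms="http://www.gutenberg.org/rdfterms/">
  <pgterms:file rdf:about="&f;2/9/5/1/29514/29514-8.txt">
    <dc:format><dcterms:IMT><rdf:value>text/plain; charset="iso-8859-1"</rdf:value></dcterms:IMT></dc:format>
    <dcterms:extent>27727</dcterms:extent>
    <dcterms:isFormatOf rdf:resource="#etext29514" />
  </pgterms:file>
  <pgterms:file rdf:about="&f;2/9/5/1/29514/29514-8.zip">
    <dc:format><dcterms:IMT><rdf:value>text/plain; charset="iso-8859-1"</rdf:value></dcterms:IMT></dc:format>
    <dc:format><dcterms:IMT><rdf:value>application/zip</rdf:value></dcterms:IMT></dc:format>
    <dcterms:extent>10751</dcterms:extent>
    <dcterms:isFormatOf rdf:resource="#etext29514" />
  </pgterms:file>
</rdf:RDF>"""


def files_for_book(root, etext_id):
    """Hypothetical helper: list (url, mime types, size) for one book's files."""
    wanted = f"#etext{etext_id}"
    resource = f"{{{NS['rdf']}}}resource"  # qualified name of rdf:resource
    about = f"{{{NS['rdf']}}}about"        # qualified name of rdf:about
    results = []
    for node in root.findall("pgterms:file", NS):
        link = node.find("dcterms:isFormatOf", NS)
        # Same filter as the XPath predicate: keep files of the wanted book only.
        if link is None or link.get(resource) != wanted:
            continue
        mimes = [v.text for v in node.findall("dc:format/dcterms:IMT/rdf:value", NS)]
        size = int(node.findtext("dcterms:extent", default="0", namespaces=NS))
        results.append((node.get(about), mimes, size))
    return results


root = ET.fromstring(SAMPLE)
for url, mimes, size in files_for_book(root, 29514):
    print(url, mimes, size)
```

The parser expands `&f;` in each `rdf:about` attribute, so the first element of every tuple is already a complete download URL, and the list of MIME types tells the formats apart (a zipped file carries both its content type and `application/zip`). For the full catalog, `ET.parse` on the unzipped file would replace `ET.fromstring(SAMPLE)`.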

participants (2)
- Marcello Perathoner
- Paulo Levi