Thing is, indexing takes a long time and occupies quite a lot of space. I managed to reduce this by indexing only the "important" parts of the RDF. If the location of the text can be inferred from #etext1802, I would prefer to use that.

On Thu, Jul 16, 2009 at 8:05 AM, Marcello Perathoner <[1]marcello@perathoner.de> wrote:

   Paulo Levi wrote:

      I made an ebook reader (here)
      [2]http://code.google.com/p/bookjar/downloads/list
      and I'd like to search and download Gutenberg books. I already
      have a searcher prototype using LuceneSail, a library that uses
      Lucene to index RDF documents, and I index only what I want from
      catalog.rdf.zip.

      Now I'd like to know how to fetch the book itself from the URL
      inside the catalog, and what the variants for the formats are.

      An example query result:

      author: Shakespeare, William, 1564-1616
      url: [3]http://www.gutenberg.org/feeds/catalog.rdf#etext1802
      title: King Henry VIII

      So I'd like to know how to get a working URL to download the
      book from the etext1802 number, and how to construct the
      variants for each format.

   In the second half of the RDF file you will find records for all the
   files in the different formats we offer for an ebook. Use the
   #etext1802 as the link between the book record and the file records.

References

   1. mailto:marcello@perathoner.de
   2. http://code.google.com/p/bookjar/downloads/list
   3. http://www.gutenberg.org/feeds/catalog.rdf#etext1802
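Marcello's suggestion, using #etext1802 as the link between the book record and the file records, could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the element names (pgterms:etext, pgterms:file, dcterms:isFormatOf, dc:format), the pgterms namespace URI, and the example.org file URLs in the sample are guesses at the catalog layout, not taken from this thread, so check them against the real catalog.rdf before relying on them.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
# Assumption: namespace used for the Gutenberg-specific terms.
PGTERMS = "http://www.gutenberg.org/rdfterms/"

# Hypothetical miniature catalog: one book record followed by file records.
# The real catalog.rdf may name or nest these elements differently.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:pgterms="http://www.gutenberg.org/rdfterms/">
  <pgterms:etext rdf:ID="etext1802">
    <dc:title>King Henry VIII</dc:title>
  </pgterms:etext>
  <pgterms:file rdf:about="http://example.org/files/1802/1802.txt">
    <dc:format>text/plain</dc:format>
    <dcterms:isFormatOf rdf:resource="#etext1802"/>
  </pgterms:file>
  <pgterms:file rdf:about="http://example.org/files/1802/1802.zip">
    <dc:format>application/zip</dc:format>
    <dcterms:isFormatOf rdf:resource="#etext1802"/>
  </pgterms:file>
  <pgterms:file rdf:about="http://example.org/files/100/100.txt">
    <dc:format>text/plain</dc:format>
    <dcterms:isFormatOf rdf:resource="#etext100"/>
  </pgterms:file>
</rdf:RDF>"""


def files_for(rdf_xml, etext_id):
    """Return (url, format) pairs for file records that point at etext_id."""
    root = ET.fromstring(rdf_xml)
    target = "#" + etext_id
    matches = []
    for record in root.iter("{%s}file" % PGTERMS):
        link = record.find("{%s}isFormatOf" % DCTERMS)
        # Keep only file records whose back-reference names our book.
        if link is not None and link.get("{%s}resource" % RDF) == target:
            url = record.get("{%s}about" % RDF)
            fmt = record.findtext("{%s}format" % DC)
            matches.append((url, fmt))
    return matches


if __name__ == "__main__":
    for url, fmt in files_for(SAMPLE, "etext1802"):
        print(fmt, url)
```

For the full catalog, loading the whole document at once may be too heavy; a streaming parse (for example with ET.iterparse) that records only the file URL, format and etext reference would keep the index small, which fits the concern about indexing time and space.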