The thing is, indexing takes a long time and occupies quite a lot of space.
   I managed to reduce both by indexing only the "important" parts of the
   RDF. If the location of the text can be inferred from #etext1802, I
   would prefer to use that.

   On Thu, Jul 16, 2009 at 8:05 AM, Marcello Perathoner
   <[1]marcello@perathoner.de> wrote:

   Paulo Levi wrote:

     I made an ebook reader
     (here) [2]http://code.google.com/p/bookjar/downloads/list
     and I'd like to search and download Gutenberg books. I already have
     a search prototype using LuceneSail, a library that uses Lucene to
     index RDF documents, and I index only what I want from the
     catalog.rdf.zip.
     Now I'd like to know how to fetch the book itself from the URL
     inside the catalog, and what the variants for the formats are.
     An example query result:
     author: Shakespeare, William, 1564-1616
     url: [3]http://www.gutenberg.org/feeds/catalog.rdf#etext1802
     title: King Henry VIII
     So, I'd like to know how I can get a working URL to download the
     book from the etext1802 number, and how to construct the variants
     for each format.

     In the second half of the RDF file you will find records for all the
     files in the different formats we offer for an ebook. Use the
     #etext1802 as the link between the book record and the file records.
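
The lookup described above can be sketched roughly as follows. This is only
an illustration under assumptions, not the catalog's documented schema: the
element names (pgterms:file, dcterms:isFormatOf, dc:format), the pgterms
namespace URI, the flat dc:format text, and the example.org file URLs are
all placeholders, so check them against the real catalog.rdf before relying
on them. The idea is simply to collect every file record whose link points
back at "#etext1802".

```python
import xml.etree.ElementTree as ET

# Namespace URIs. The pgterms URI is an assumption; verify it in catalog.rdf.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
PGTERMS = "http://www.gutenberg.org/rdfterms/"

# Tiny hand-written stand-in for the catalog: one book record followed by
# file records. The structure and the example.org URLs are hypothetical.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}"
         xmlns:dcterms="{DCTERMS}" xmlns:pgterms="{PGTERMS}">
  <pgterms:etext rdf:ID="etext1802">
    <dc:title>King Henry VIII</dc:title>
  </pgterms:etext>
  <pgterms:file rdf:about="http://example.org/files/1802/1802.txt">
    <dc:format>text/plain</dc:format>
    <dcterms:isFormatOf rdf:resource="#etext1802"/>
  </pgterms:file>
  <pgterms:file rdf:about="http://example.org/files/1802/1802-h.htm">
    <dc:format>text/html</dc:format>
    <dcterms:isFormatOf rdf:resource="#etext1802"/>
  </pgterms:file>
  <pgterms:file rdf:about="http://example.org/files/1803/1803.txt">
    <dc:format>text/plain</dc:format>
    <dcterms:isFormatOf rdf:resource="#etext1803"/>
  </pgterms:file>
</rdf:RDF>"""


def files_for(root, etext_id):
    """Return (url, format) pairs for file records linked to etext_id."""
    target = "#" + etext_id
    results = []
    for record in root.iter(f"{{{PGTERMS}}}file"):
        link = record.find(f"{{{DCTERMS}}}isFormatOf")
        # Keep only file records that point back at the requested book.
        if link is not None and link.get(f"{{{RDF}}}resource") == target:
            url = record.get(f"{{{RDF}}}about")
            fmt = record.findtext(f"{{{DC}}}format", default="")
            results.append((url, fmt))
    return results


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for url, fmt in files_for(root, "etext1802"):
        print(fmt, url)
```

For the full catalog, the same matching could be done while streaming the
file (for example with ET.iterparse) instead of loading it all at once, or
the isFormatOf link could be stored as one more indexed field next to
author and title.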

References

   1. mailto:marcello@perathoner.de
   2. http://code.google.com/p/bookjar/downloads/list
   3. http://www.gutenberg.org/feeds/catalog.rdf#etext1802