[gutvol-p] Re: Programmatic fetching books from Gutenberg

Paulo Levi i30817 at gmail.com
Thu Jul 16 10:55:51 PDT 2009


The thing is, indexing takes a long time and takes up quite a lot of space. I
managed to reduce both by indexing only the "important" parts of the RDF.
If the location of the text can be inferred from #etext1802 alone, I would
prefer to do that.
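
A minimal sketch of the linking step Marcello describes below, in Python
rather than the LuceneSail setup. It streams the catalog and keeps only a small
map from each #etextN id to the URLs of its file records. The element and
attribute names (pgterms:file, dcterms:isFormatOf, rdf:about, rdf:resource)
and the namespace URIs are assumptions about the catalog.rdf layout and should
be checked against the real file. The sample document and its example.org URLs
are made-up placeholders, and files_by_etext and etext_id_from_url are
hypothetical helper names.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace URIs are assumptions about catalog.rdf; verify against the real file.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PGTERMS = "http://www.gutenberg.org/rdfterms/"
DCTERMS = "http://purl.org/dc/terms/"

# Tiny hand-written stand-in for catalog.rdf (placeholder URLs, not catalog data).
SAMPLE = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:pgterms="http://www.gutenberg.org/rdfterms/"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <pgterms:etext rdf:ID="etext1802"/>
  <pgterms:file rdf:about="http://example.org/1802.txt">
    <dcterms:isFormatOf rdf:resource="#etext1802"/>
  </pgterms:file>
  <pgterms:file rdf:about="http://example.org/1802-h.htm">
    <dcterms:isFormatOf rdf:resource="#etext1802"/>
  </pgterms:file>
  <pgterms:file rdf:about="http://example.org/9999.txt">
    <dcterms:isFormatOf rdf:resource="#etext9999"/>
  </pgterms:file>
</rdf:RDF>
"""


def files_by_etext(stream):
    """Map each 'etextN' id to the URLs of its file records, streaming the XML."""
    files = defaultdict(list)
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == f"{{{PGTERMS}}}file":
            link = elem.find(f"{{{DCTERMS}}}isFormatOf")
            if link is not None:
                # rdf:resource looks like "#etext1802"; drop the leading "#".
                etext_id = link.get(f"{{{RDF}}}resource", "").lstrip("#")
                files[etext_id].append(elem.get(f"{{{RDF}}}about"))
            elem.clear()  # drop the parsed record; only the small map is kept
    return dict(files)


def etext_id_from_url(url):
    """Return the fragment of a catalog URL, e.g. 'etext1802'."""
    return url.rsplit("#", 1)[-1]


index = files_by_etext(io.BytesIO(SAMPLE))
key = etext_id_from_url("http://www.gutenberg.org/feeds/catalog.rdf#etext1802")
print(index[key])
```

Only the id-to-URL map stays in memory, so it is much smaller than a full index
of the catalog. Format details for each file would need further fields from
the file records.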

On Thu, Jul 16, 2009 at 8:05 AM, Marcello Perathoner <marcello at perathoner.de> wrote:

> Paulo Levi wrote:
>
>> I made an ebook reader
>> (here) http://code.google.com/p/bookjar/downloads/list
>>
>> and I'd like to search and download Gutenberg books. I already have a
>> search prototype using LuceneSail, a library that uses Lucene to index RDF
>> documents, and it indexes only what I want from catalog.rdf.zip.
>>
>> Now I'd like to know how to fetch the book itself from the URL inside the
>> catalog, and what the variants for the formats are.
>> An example query result:
>>  author: Shakespeare, William, 1564-1616
>>  url: http://www.gutenberg.org/feeds/catalog.rdf#etext1802
>>  title: King Henry VIII
>>  So I'd like to know how to get a working URL to download the book from
>> the etext1802 number, and how to construct the variants for each format.
>>
>
> In the second half of the RDF file you will find records for all the files
> in the different formats we offer for an ebook. Use #etext1802 as the link
> between the book record and its file records.
>
>