[gutvol-p] Re: Programmatic fetching books from Gutenberg
i30817 at gmail.com
Thu Jul 16 11:19:07 PDT 2009
It appears that the format follows a rule of sorts,
though it doesn't appear to be consistent. I saw something about an old
indexing scheme for files older than 10000. What is the scheme (can it be
guessed from the # number?) and is it going to disappear from the Gutenberg
server? Or are you going to make redirects?
On Thu, Jul 16, 2009 at 6:55 PM, Paulo Levi <i30817 at gmail.com> wrote:
> Thing is, indexing takes a long time and occupies quite a lot of space. I
> managed to reduce it by filtering the "important" parts of the rdf to index.
> If the location of the text can be inferred from #etext1802, I prefer to use
> On Thu, Jul 16, 2009 at 8:05 AM, Marcello Perathoner <
> marcello at perathoner.de> wrote:
>> Paulo Levi wrote:
>>> I made an ebook reader
>>> (here) http://code.google.com/p/bookjar/downloads/list
>>> and I'd like to search and download Gutenberg books. I already have a
>>> searcher prototype using LuceneSail, a library that uses Lucene to index RDF
>>> documents, and I index only what I want from the catalog.rdf.zip.
>>> Now I'd like to know how, from the URL inside the catalog, I can fetch the
>>> book itself, and what the variants for the formats are.
>>> An example query result:
>>> author: Shakespeare, William, 1564-1616
>>> url: http://www.gutenberg.org/feeds/catalog.rdf#etext1802
>>> title: King Henry VIII
>>> So, I'd like to know how I can get a working URL from the etext1802 number
>>> to download the book, and how to construct variants for each format.
>> In the second half of the RDF file you will find records for all the files
>> in the different formats we offer for an ebook. Use the #etext1802 as the
>> link between the book record and the file records.
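
A minimal Python sketch of the linking described above: take the fragment
from the catalog URL (#etext1802) and collect the file records that point
back at it. The element names (pgterms:etext, pgterms:file,
dcterms:isFormatOf), the pgterms and dcterms namespace URIs, and the
example.org file URLs in the sample are assumptions made for illustration,
not taken from the real catalog.rdf; check them against the actual file
before relying on this.

```python
import xml.etree.ElementTree as ET

# The rdf namespace is standard; the others are ASSUMED to match catalog.rdf.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/rdfterms/",
}
RDF = "{%s}" % NS["rdf"]

# Hypothetical, hand-written sample: one book record and three file records.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:pgterms="http://www.gutenberg.org/rdfterms/">
  <pgterms:etext rdf:ID="etext1802">
    <dc:title>King Henry VIII</dc:title>
    <dc:creator>Shakespeare, William, 1564-1616</dc:creator>
  </pgterms:etext>
  <pgterms:file rdf:about="http://example.org/files/1802.txt">
    <dc:format>text/plain</dc:format>
    <dcterms:isFormatOf rdf:resource="#etext1802"/>
  </pgterms:file>
  <pgterms:file rdf:about="http://example.org/files/1802.zip">
    <dc:format>application/zip</dc:format>
    <dcterms:isFormatOf rdf:resource="#etext1802"/>
  </pgterms:file>
  <pgterms:file rdf:about="http://example.org/files/100.txt">
    <dc:format>text/plain</dc:format>
    <dcterms:isFormatOf rdf:resource="#etext100"/>
  </pgterms:file>
</rdf:RDF>"""


def etext_id_from_url(url):
    """Return the fragment after '#', e.g. 'etext1802'."""
    return url.rsplit("#", 1)[-1]


def files_for(root, etext_id):
    """Return (url, format) pairs for file records that point at etext_id."""
    target = "#" + etext_id
    found = []
    for rec in root.findall("pgterms:file", NS):
        link = rec.find("dcterms:isFormatOf", NS)
        if link is not None and link.get(RDF + "resource") == target:
            fmt = rec.findtext("dc:format", default="", namespaces=NS)
            found.append((rec.get(RDF + "about"), fmt.strip()))
    return found


root = ET.fromstring(SAMPLE)
book_id = etext_id_from_url("http://www.gutenberg.org/feeds/catalog.rdf#etext1802")
for url, fmt in files_for(root, book_id):
    print(fmt, url)
```

With this approach the download URL and format come straight from the
matching file records, so nothing has to be guessed from the number. For
the full catalog it may be worth streaming the file instead of loading it
whole.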