[gutvol-p] Re: Programmatic fetching books from Gutenberg

16 Jul 2009

It appears that the file paths follow a rule, sort of:
#etext1802
&f;1/8/0/1802/1802.txt
Though it doesn't appear to be consistent. I saw something about an old
indexing scheme for ebooks older than #10000. What is that scheme (can it be
guessed from the #number?), and is it going to disappear from the Gutenberg
server? Or are you going to set up redirects?
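If the single example above generalizes, the path could be derived from the number alone: each digit except the last becomes a directory, followed by a directory named after the full number. The sketch below encodes that guess. It is inferred from one data point, ignores the old pre-#10000 scheme, places one-digit numbers under a `0/` directory purely as an assumption, and returns a path relative to whatever base URL `&f;` expands to (the thread doesn't say). `guess_text_path` is a hypothetical name.

```python
def guess_text_path(etext_number: int) -> str:
    """Guess the relative path of the plain-text file for an ebook number.

    Assumption: inferred from a single example (1802 -> 1/8/0/1802/1802.txt).
    Every digit except the last becomes one directory level, followed by a
    directory named after the full number. One-digit numbers are assumed to
    live under "0/". The old pre-#10000 layout is NOT handled.
    """
    if etext_number <= 0:
        raise ValueError("ebook numbers are positive")
    digits = str(etext_number)
    # One directory per leading digit; fall back to "0" for one-digit ids.
    prefix = list(digits[:-1]) or ["0"]
    return "/".join(prefix + [digits, f"{digits}.txt"])


print(guess_text_path(1802))  # 1/8/0/1802/1802.txt
```

Prepend the expansion of `&f;` to get a full URL, and check the result against the catalog before relying on it.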

On Thu, Jul 16, 2009 at 6:55 PM, Paulo Levi <i30817@gmail.com> wrote:
...
The thing is, indexing takes a long time and occupies quite a lot of space. I
managed to reduce it by filtering the "important" parts of the RDF to index.
If the location of the text can be inferred from #etext1802, I'd prefer to use
that.
On Thu, Jul 16, 2009 at 8:05 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
...
Paulo Levi wrote:
...
I made an ebook reader
(here: http://code.google.com/p/bookjar/downloads/list),
and I'd like to search and download Gutenberg books. I already have a
searcher prototype using LuceneSail, a library that uses Lucene to index RDF
documents, and I index only what I want from catalog.rdf.zip.
Now I'd like to know how to fetch the book itself from the URL inside the
catalog, and what the variants for the formats are.
An example query result:
 author: Shakespeare, William, 1564-1616
 url: http://www.gutenberg.org/feeds/catalog.rdf#etext1802
 title: King Henry VIII
 So, I'd like to know how to get a working URL for downloading the book from
the etext1802 number, and how to construct the variants for each format.
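Getting the number out of a catalog URL like the one in the query result is a matter of reading the URL fragment. A minimal sketch, assuming every book reference has the form `#etext<digits>`; `etext_number` is a hypothetical helper name.

```python
import re
from urllib.parse import urlparse


def etext_number(catalog_url: str) -> int:
    """Extract the ebook number from a reference such as
    http://www.gutenberg.org/feeds/catalog.rdf#etext1802.

    Assumption: the fragment always looks like "etext" followed by ASCII digits.
    """
    fragment = urlparse(catalog_url).fragment  # e.g. "etext1802"
    match = re.fullmatch(r"etext([0-9]+)", fragment)
    if match is None:
        raise ValueError(f"unexpected catalog fragment: {fragment!r}")
    return int(match.group(1))


print(etext_number("http://www.gutenberg.org/feeds/catalog.rdf#etext1802"))  # 1802
```

Only the number needs to be kept in the index; the rest of the URL is the same for every book.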
In the second half of the RDF file you will find records for all the files,
in the different formats we offer for an ebook. Use #etext1802 as the link
between the book record and the file records.
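This suggestion amounts to a join on the `#etextNNNN` reference. The sketch below assumes the file records have already been reduced to `(file_url, book_ref)` pairs. Which RDF property carries the book reference inside catalog.rdf is not stated in the thread, so extracting it is left to the caller. `files_by_book` and the second sample URL are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def files_by_book(file_records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group file URLs under the book record they link to.

    Each input pair is (file_url, book_ref), where book_ref is the
    "#etextNNNN" reference taken from the file record (assumed already parsed).
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for file_url, book_ref in file_records:
        grouped[book_ref].append(file_url)
    return dict(grouped)


# Hypothetical sample: only the .txt path comes from the thread.
records = [
    ("&f;1/8/0/1802/1802.txt", "#etext1802"),
    ("example/other-format-of-1802", "#etext1802"),
]
print(files_by_book(records)["#etext1802"])
```

Grouping once after parsing avoids a lookup per file and gives every format available for a book in one list.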