[gutvol-p] Re: Programmatic fetching books from Gutenberg

16 Jul 2009

      Paulo Levi wrote:
...
It appears that the format follows a rule, sort of:
#etext1802
"&f;1/8/0/1802/1802.txt
&f; is an XML entity which is defined at the top of the RDF file.
Retrieve the URL that &f;1/8/0/1802/1802.txt expands to and you get the file.

This is an XML file and you *really* should parse it with an XML parser.
That will take care of all these problems for you. If you are concerned
about memory, use a SAX parser.
...
Though it doesn't appear to be consistent. I saw something about an old indexing
scheme for files older than 10000. What is the scheme (can it be guessed from
the #number?) and is it going to disappear from the Gutenberg server? Or are
you going to make redirects?
If you parse the RDF file the XML way you get the current URL of the
ebook files regardless of old/new directory scheme and any future file
moves.
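
As a rough illustration of the advice above, here is a minimal sketch using
Python's standard SAX parser. The sample document, the ex: element names
(file, isFormatOf) and the base URL bound to &f; are assumptions made up for
the example, not the actual catalog layout; adjust them to whatever the real
RDF file declares. The point is only that the parser expands &f; on its own,
so the handler sees complete URLs and never has to guess a directory scheme.

```python
import io
import xml.sax
import xml.sax.handler

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical stand-in for the catalog: the entity value and the ex:
# vocabulary are invented for this sketch.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY f "http://example.org/files/">
]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">
  <ex:file rdf:about="&f;1/8/0/1802/1802.txt">
    <ex:isFormatOf rdf:resource="#etext1802"/>
  </ex:file>
</rdf:RDF>
"""


class FileUrlHandler(xml.sax.handler.ContentHandler):
    """Collects file URLs per book id while streaming through the document."""

    def __init__(self):
        super().__init__()
        self.urls = {}        # book id -> list of file URLs
        self._current = None  # URL of the file element being read

    def startElementNS(self, name, qname, attrs):
        _uri, local = name
        if local == "file":
            # The parser has already expanded &f; in the attribute value.
            self._current = attrs.get((RDF_NS, "about"))
        elif local == "isFormatOf" and self._current is not None:
            book = attrs.get((RDF_NS, "resource"))
            self.urls.setdefault(book, []).append(self._current)

    def endElementNS(self, name, qname):
        if name[1] == "file":
            self._current = None


def collect_file_urls(data):
    """Parse RDF/XML bytes with SAX and return {book id: [file URLs]}."""
    handler = FileUrlHandler()
    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.urls


if __name__ == "__main__":
    print(collect_file_urls(SAMPLE))
```

Because SAX streams the input, memory use stays flat however large the
catalog is; for a real file, pass an open file object to parser.parse()
instead of the in-memory sample.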


Marcello Perathoner