Re: [gutvol-d] [etext04|etext05]/index.html missing?

15 Nov 2004


On Mon, Nov 15, 2004 at 06:13:06PM +0100, Marcello Perathoner wrote:
> John Hagerson wrote:
> > I am using wget to download books from www.gutenberg.org. The process is
> > stuck on etext04 in what appears to be a futile effort to download
> > index.html.
>
> The indexes are auto-generated on the fly by Apache.
> If the load on the fileservers is too high the connection times out
> before a full directory listing can be retrieved.
>
> You should not harvest at peak hours anyway.

One more thing (or two):
- You can't get the big directories via FTP (the FTP servers
stop after 2K items). Use HTTP instead.

- Better yet, don't use HTTP either; use rsync. See the mirroring
HOWTO at gutenberg.org/howto for more info (yes, you can use
rsync to get just particular directories, filename extensions,
etc.).
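
To illustrate the point about fetching only particular directories
and filename extensions, here is a minimal sketch in Python that builds
an rsync command line from rsync's standard include/exclude filters.
The helper name build_rsync_command and the source URL
rsync://mirror.example.org/gutenberg/etext04/ are placeholders made up
for this example; the real rsync source is the one given in the
mirroring HOWTO. The sketch only prints the command, it does not run it.

```python
import shlex


def build_rsync_command(source, dest, extensions=(), dry_run=True):
    """Return an rsync argument list that copies only the given extensions.

    `source` and `dest` are passed through unchanged; nothing is executed.
    """
    cmd = ["rsync", "-av"]          # archive mode, verbose
    if dry_run:
        cmd.append("--dry-run")     # show what would be transferred, copy nothing
    if extensions:
        cmd.append("--include=*/")  # descend into every subdirectory
        for ext in extensions:
            cmd.append(f"--include=*.{ext.lstrip('.')}")
        cmd.append("--exclude=*")   # skip everything not included above
    cmd += [source, dest]
    return cmd


# Placeholder source: substitute the rsync module named in the mirroring HOWTO.
SOURCE = "rsync://mirror.example.org/gutenberg/etext04/"

cmd = build_rsync_command(SOURCE, "./etext04/", extensions=["txt", "zip"])
print(shlex.join(cmd))
```

The include rules have to come before the final exclude because rsync
applies the first filter that matches. Leaving dry_run on for the first
pass lets you check the file list before transferring anything.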

But if things are still weird, send something we can
replicate and we'll help fix it!
  -- gbn

Greg Newby