
On Mon, Nov 15, 2004 at 06:13:06PM +0100, Marcello Perathoner wrote:
> John Hagerson wrote:
>> I am using wget to download books from www.gutenberg.org. The process is stuck on etext04 in what appears to be a futile effort to download index.html.
>
> The indexes are auto-generated on the fly by Apache.
> If the load on the fileservers is too high, the connection times out before a full directory listing can be retrieved.
> You should not harvest at peak hours anyway.
One more thing (or two):

- You can't get the big directories via FTP. Use HTTP. (The FTP servers stop after 2K items.)
- Don't use HTTP, use rsync. See the mirroring HOWTO at gutenberg.org/howto for more info (yes, you can use rsync to get just particular directories, filename extensions, etc.).

But if things are still weird, send something we can replicate and we'll help fix it!

-- 
gbn
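As a rough illustration of the rsync filtering mentioned above, here is a minimal Python sketch that builds an rsync command fetching only files with chosen extensions from a single directory. The host and module in `SOURCE` are hypothetical placeholders (the real ones are in the mirroring HOWTO), and `build_rsync_cmd` is an illustrative helper, not part of any Project Gutenberg tool. The `--include`/`--exclude` rules are standard rsync options. `--prune-empty-dirs` is assumed to be available in the installed rsync; older versions may lack it.

```python
import shlex
import subprocess
import sys

# HYPOTHETICAL source: the real rsync host and module name are listed in the
# mirroring HOWTO at gutenberg.org/howto; this placeholder is not a real mirror.
SOURCE = "rsync.example.org::gutenberg/etext04/"


def build_rsync_cmd(source, dest, extensions):
    """Build an rsync argument list that copies only files with the given
    extensions from one directory tree, recursing into subdirectories."""
    # -a: archive mode, -v: verbose; --prune-empty-dirs drops directories
    # that end up with no matching files.
    cmd = ["rsync", "-av", "--prune-empty-dirs"]
    # Filter rules are checked in order and the first match wins.
    cmd.append("--include=*/")  # descend into every subdirectory
    for ext in extensions:
        cmd.append(f"--include=*.{ext}")  # keep files with a wanted extension
    cmd.append("--exclude=*")  # skip everything else
    cmd += [source, dest]
    return cmd


if __name__ == "__main__":
    cmd = build_rsync_cmd(SOURCE, "./etext04/", ["zip", "txt"])
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(cmd))
    # Only start the transfer when explicitly asked (needs rsync and a real host).
    if "--run" in sys.argv:
        subprocess.run(cmd, check=True)
```

Running the script only prints the command; passing `--run` starts the transfer. The `--include=*/` rule comes first so that rsync still descends into subdirectories before the final `--exclude=*` discards every file that did not match a wanted extension.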