
Well, not knowing what to do, I went to the Robots Readme on the gutenberg.org web site and copied the wget command listed under the heading "Getting All EBook Files." I started this process on Sunday evening, at the end of a cable modem connection. Little did I realize that more than 24 hours later, the process would still be running.

In a private message, I was told to use rsync. OK. If rsync is the preferred method, then why is wget presented as the example? It appears that I'm storing a bunch of index.html files that will be redundant once I switch to rsync. I guess I can clean them up at my leisure. However, the web page also says to "keep the html files" to make re-roboting faster. Well, I'll be a mirror site for all of the ZIP and HTML files anyway. (A sample rsync invocation is sketched at the end of this message.)

Please post suggestions here or PM me. Thank you.

-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Marcello Perathoner
Sent: Monday, November 15, 2004 11:13 AM
To: Project Gutenberg Volunteer Discussion
Subject: Re: [gutvol-d] [etext04|etext05]/index.html missing?

John Hagerson wrote:
I am using wget to download books from www.gutenberg.org. The process is stuck on etext04 in what appears to be a futile effort to download index.html.
The indexes are auto-generated on the fly by Apache. If the load on the file servers is too high, the connection times out before a full directory listing can be retrieved. You should not harvest at peak hours anyway.

--
Marcello Perathoner
webmaster@gutenberg.org

_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d
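
For readers in the same situation, a minimal rsync invocation might look like the sketch below. The host, module name, and local path are placeholders, not values confirmed in this thread; check the mirroring how-to on gutenberg.org for the real ones.

    # host, module, and destination below are illustrative placeholders
    # -a preserves timestamps and permissions, -v reports progress,
    # --delete prunes local files that have disappeared upstream
    rsync -av --delete ftp.ibiblio.org::gutenberg /local/gutenberg-mirror

Because rsync compares file lists and transfers only the differences, repeated runs are cheap, which is why it is preferred over wget for keeping a mirror current.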
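
If wget must be used anyway, a few standard options make it gentler on the servers and more tolerant of the timeouts described above. This is only a sketch, not the exact command from the Robots Readme, and the URL is illustrative:

    # -m = recursive mirroring with timestamping; -np = do not ascend to the parent directory
    # -w 2 = pause two seconds between requests; --tries/--timeout retry a stalled fetch
    # instead of hanging on it indefinitely
    wget -m -np -w 2 --tries=5 --timeout=60 http://www.gutenberg.org/etext04/

Harvesting off peak hours, as suggested above, will likely help more than any flag.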