Marcello's story about big-pipe servers settling on pg makes me think of a Borg spaceship passing over a peaceful little village.
At the risk of sounding like Eleanor Clift's question about the Soviets in the "Watchmen" movie, I'll ask:
So what on earth are these big-pipe servers doing?
Are they generating their own independent collection in case the internet collapses? Are they running some wildly inefficient search algorithm that requires opening every single file? Are they some Google wannabe indexing your site? Is it malicious mischief / a DoS attack?
Or is it a case of an "honest" (if cluelessly implemented) demand that could be met with a few more torrentable products? Could that entity be looking for a MOBI of the top 1000 books, or an EPUB of everything in the German language?
---------- Forwarded message ----------
From: Marcello Perathoner <marcello@perathoner.de>
To: Project Gutenberg Volunteer Discussion <gutvol-d@lists.pglaf.org>
Date: Tue, 02 Feb 2010 08:14:59 +0100
Subject: [gutvol-d] Re: Psychology of interacting with (Google's) ebooks.
Greg M. Johnson wrote:
I don't think that Google Books at least gets this. I spent so much time at Google Books, browsing in apparently spider-like fashion, that I got this warning:
"We're sorry...
... but your computer or network may be sending automated queries. To protect our users, we can't process your request right now."
That may not be a question of getting `it' but of getting `hit'.
gutenberg.org too gets hit by dozens of spiders a day, some of them sitting on big pipes and working with up to a hundred threads.
While one of those spiders is at work, a human user can just about forget getting anything out of gutenberg.org because all server cycles are used to serve the spider.
This is why gutenberg.org automatically denies access to IPs that make more than a certain number of requests per hour.
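The hourly per-IP cutoff Marcello describes can be sketched roughly like this. This is a minimal illustration, not gutenberg.org's actual implementation; the threshold and the fixed-hour-bucket approach are assumptions for the example.

```python
import time

# Assumed threshold for illustration only; the real limit is not stated.
LIMIT_PER_HOUR = 100

class HourlyRateLimiter:
    """Count requests per IP within the current clock-hour bucket and
    deny any request once the count exceeds the limit."""

    def __init__(self, limit=LIMIT_PER_HOUR):
        self.limit = limit
        self.window = {}  # ip -> (hour_bucket, count)

    def allow(self, ip, now=None):
        now = time.time() if now is None else now
        bucket = int(now // 3600)  # which hour we are in
        prev_bucket, count = self.window.get(ip, (bucket, 0))
        if prev_bucket != bucket:
            count = 0  # a new hour started; reset this IP's counter
        count += 1
        self.window[ip] = (bucket, count)
        return count <= self.limit
```

A spider hammering the server from one IP exhausts its quota quickly and gets denied for the rest of the hour, while ordinary human readers on other IPs stay well under the limit.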
I think with Google the problem may be even worse than with gutenberg.org.
--
Marcello Perathoner