
Greg Newby wrote:
Marcello, can you tell me what it would take to grow our capacity to handle hits? I know you're also looking at Web site mirrors (I can supply some sites for this, BTW). But if you could come up with some recommendations for what it would take for iBiblio to dramatically grow our capacity, I can try to put something together for them.
We have doubled our page hits over the last year. We are now serving nearly 200,000 pages a day, and just recently we became a top-5000 internet site. See the Alexa stats starting at:

http://www.alexa.com/data/details/traffic_details?range=3m&size=large&compare_sites=gutenberg.net,promo.net&y=t&url=gutenberg.org

To handle the ever-increasing load we could implement one of the following solutions:

1) An array of on-site squids at ibiblio. But ibiblio isn't adding squids for the vhosted sites -- at least that's what I was told.

2) Make ibiblio throw more hardware at us (all hosted sites). This may not be possible with their limited budget. They recently got a faster file server.

3) One or more dedicated squids for PG co-located at ibiblio (and make ibiblio pay for the bandwidth). Somebody would have to donate a server to us. It needs fast disks, lots of RAM, an average CPU, Linux, and ssh.

4) The big-time solution: a hierarchy of squids distributed around the world, something like this (a configuration sketch follows below):

   www.gutenberg.org (apache)
     + us1.cache.gutenberg.org (squid)
     + us2.cache.gutenberg.org (squid)
     + au.cache.gutenberg.org (squid)
     + eu.cache.gutenberg.org (squid)
     + de.cache.gutenberg.org (squid)
     + en.cache.gutenberg.org (squid)
     + fr.cache.gutenberg.org (squid)

   To do that we need squid 2.5 with the rproxy patch. I'm still exploring that solution, but if anybody has any experience please chime in. We would need service providers to donate (or co-locate) servers for us and to donate the bandwidth. We also need to explore the legal implications of offering PG services outside the US.

The PG web site without file downloads averages 5 GB of traffic a day. (The file downloads are 100 GB a day, but we ain't going to thrash the squids with the files.)
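Purely as an illustration of option 4, here is a rough sketch of what the squid 2.5 configuration on one of the regional caches (say eu.cache.gutenberg.org) might look like in httpd-accelerator mode. The hostnames are taken from the hierarchy above; all sizes, paths and access rules are made-up placeholders, and the rproxy patch may change some of the details:

    # squid.conf sketch for eu.cache.gutenberg.org (squid 2.5, accelerator mode)
    # All values below are illustrative placeholders, not a tested configuration.

    visible_hostname eu.cache.gutenberg.org
    http_port 80

    # Act as an httpd accelerator for the PG web site only,
    # not as a general-purpose proxy for arbitrary sites.
    httpd_accel_host www.gutenberg.org
    httpd_accel_port 80
    httpd_accel_single_host on
    httpd_accel_with_proxy off
    httpd_accel_uses_host_header off

    # "Lots of RAM, fast disks": sizes here are placeholders.
    cache_mem 512 MB
    cache_dir ufs /var/spool/squid 20000 16 256

    # Send cache misses to a parent cache in the US instead of
    # contacting the origin server directly.
    cache_peer us1.cache.gutenberg.org parent 80 0 no-query
    never_direct allow all

    # Only the web pages go through the squids; the 100 GB/day of
    # file downloads stays on the download mirrors.

    # Let everybody fetch pages through this accelerator.
    acl all src 0.0.0.0/0.0.0.0
    http_access allow all

The co-located squid in option 3 would look much the same, just without the cache_peer/never_direct lines, since it would sit next to the apache box at ibiblio.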
30 simultaneous requests to PostgreSQL does not seem like a whole lot, so I'm assuming that contention for resources with other hosted sites is the main problem. It would be nice to do better.
I just asked ibiblio to double that. I'm not sure why the limit is so low.

-- 
Marcello Perathoner
webmaster@gutenberg.org
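For reference, if the 30-connection ceiling is PostgreSQL's own limit (an assumption; it could just as well be an Apache/PHP connection-pool setting on ibiblio's side), doubling it would be a one-line change in postgresql.conf on the database host, followed by a server restart:

    # postgresql.conf (ibiblio's database host -- hypothetical value)
    # max_connections caps the number of simultaneous client connections;
    # each extra slot costs a little shared memory.
    max_connections = 60        # was 30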