
http://www.worldebookfair.com

It was on an overloaded network connection earlier, but we moved it this (Wednesday) morning and the site seems to be performing well. Take a look - it's pretty neat!

There are a few missing files & broken links, but for the most part things seem OK.
-- Greg

Still seems to be really slow, pretty much the same as yesterday (the first time I looked). I am getting download speeds under 1 KB/s a lot of the time.

Quoting Greg Newby <gbnewby@pglaf.org>:
It was on an overloaded network connection earlier, but we moved it this (Wednesday) morning and the site seems to be performing well.
Take a look - it's pretty neat!

Can you check which IP address your machine resolves www.worldebookfair.com to? I'm getting 2 Mb/s from 208.99.202.194 (the readingroo.ms server). Perhaps your DNS simply hasn't updated, or maybe there's congestion between you and readingroo.ms, but I'd like to know before I try adding some of the rate limiting stuff Greg has asked me to look into. (A quick way to run the check is sketched below the quoted text.)

On Thu, Jul 06, 2006 at 12:46:14PM +1000, rnmscott@netspace.net.au wrote:
Still seems to be really slow, pretty much the same as yesterday (the first time I looked). I am getting < 1k download a lot of the time.
Quoting Greg Newby <gbnewby@pglaf.org>:
It was on an overloaded network connection earlier, but we moved it this (Wednesday) morning and the site seems to be performing well.
Take a look - it's pretty neat!
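For what it's worth, the check I'm asking for is just a resolver lookup. A minimal sketch in Python (standard library only; nothing here is specific to our setup):

    import socket

    # Ask the local resolver for every address the hostname maps to.
    # If anything other than 208.99.202.194 shows up, your resolver is
    # still handing out the old DNS record.
    host = "www.worldebookfair.com"
    infos = socket.getaddrinfo(host, 80, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    print(host, "resolves to:", ", ".join(addresses))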

PING www.worldebookfair.com (72.235.235.66) 56(84) bytes of data.
64 bytes from 72.235.235.66: icmp_seq=1 ttl=110 time=3769 ms
64 bytes from 72.235.235.66: icmp_seq=2 ttl=110 time=3566 ms
64 bytes from 72.235.235.66: icmp_seq=3 ttl=110 time=3628 ms
64 bytes from 72.235.235.66: icmp_seq=4 ttl=110 time=3427 ms
64 bytes from 72.235.235.66: icmp_seq=5 ttl=110 time=3501 ms

Quoting joey <joey@joeysmith.com>:
Can you check which IP address your machine resolves www.worldebookfair.com to? I'm getting 2Mb/s from 208.99.202.194 (the readingroo.ms server). Perhaps your DNS simply hasn't updated, or maybe there's congestion between you and readingroo.ms, but I'd like to know before I try adding some of the rate limiting stuff Greg has asked me to look into.

You're still getting a connection to the old server. The new server is 208.99.202.194.

On Thu, Jul 06, 2006 at 02:40:28PM +1000, rnmscott@netspace.net.au wrote:
PING www.worldebookfair.com (72.235.235.66) 56(84) bytes of data.
64 bytes from 72.235.235.66: icmp_seq=1 ttl=110 time=3769 ms
64 bytes from 72.235.235.66: icmp_seq=2 ttl=110 time=3566 ms
64 bytes from 72.235.235.66: icmp_seq=3 ttl=110 time=3628 ms
64 bytes from 72.235.235.66: icmp_seq=4 ttl=110 time=3427 ms
64 bytes from 72.235.235.66: icmp_seq=5 ttl=110 time=3501 ms
Quoting joey <joey@joeysmith.com>:
Can you check which IP address your machine resolves www.worldebookfair.com to? I'm getting 2Mb/s from 208.99.202.194 (the readingroo.ms server). Perhaps your DNS simply hasn't updated, or maybe there's congestion between you and readingroo.ms, but I'd like to know before I try adding some of the rate limiting stuff Greg has asked me to look into.

On Wed, Jul 05, 2006 at 11:20:27PM -0600, joey wrote:
You're still getting a connection to the old server. The new server is 208.99.202.194
Right. I changed the DNS TTL (the time before a cached IP address expires) from 1 day to 1 hour, so further changes will propagate faster. Depending on what network connection you're using, you might be able to force your resolver to drop the cached record (flushing the local DNS cache or rebooting your system often works). A way to check how long the old record will linger is sketched just below.

During my testing earlier, I had it pushing 50 Mb/s. It's been averaging about 8 Mb/s all day.
-- Greg
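A minimal sketch of that TTL check in Python, assuming the third-party dnspython package is installed (the hostname is real; the calls are generic dnspython 2.x usage):

    import dns.resolver  # third-party: pip install dnspython

    # Query the local resolver for the A record and report both the
    # addresses and the TTL remaining on the cached answer. Once the
    # TTL reaches zero, the resolver re-queries and picks up any new IP.
    answer = dns.resolver.resolve("www.worldebookfair.com", "A")
    for record in answer:
        print("address:", record.address)
    print("TTL remaining:", answer.rrset.ttl, "seconds")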
On Thu, Jul 06, 2006 at 02:40:28PM +1000, rnmscott@netspace.net.au wrote:
PING www.worldebookfair.com (72.235.235.66) 56(84) bytes of data.
64 bytes from 72.235.235.66: icmp_seq=1 ttl=110 time=3769 ms
64 bytes from 72.235.235.66: icmp_seq=2 ttl=110 time=3566 ms
64 bytes from 72.235.235.66: icmp_seq=3 ttl=110 time=3628 ms
64 bytes from 72.235.235.66: icmp_seq=4 ttl=110 time=3427 ms
64 bytes from 72.235.235.66: icmp_seq=5 ttl=110 time=3501 ms
Quoting joey <joey@joeysmith.com>:
Can you check which IP address your machine resolves www.worldebookfair.com to? I'm getting 2Mb/s from 208.99.202.194 (the readingroo.ms server). Perhaps your DNS simply hasn't updated, or maybe there's congestion between you and readingroo.ms, but I'd like to know before I try adding some of the rate limiting stuff Greg has asked me to look into.

Yes, I noticed the slowness. It seems much better now.

I have a question, though. It says that you can download all ebooks. I don't care about many of them, but I would like to grab at least a few hundred, if not a few thousand. How? Do I really have to individually download every single pdf file by hand? I don't expect a nice ftp/rsync/http directory listing, but it would at least be nice if all the titles from a certain collection could be on one search page. If you use the search form, you only get the first 10 results. The "browse collections" page only shows a random sampling of titles from any particular collection. (If a plain file list were available, scripting the downloads would be easy; a sketch follows the quoted text below.)

Also, the numbers are wrong or I'm doing something wrong. One figure shows about 250,000 pdf files, another shows 330,000, depending on whether you search or not. The Census page shows about 30,000 pdf files but the search shows about 52,000. I tried the advanced search, but that seems to only be a help document unless I did something wrong.

I freely admit that I'm missing something here. What am I missing? Should I be using a more specific search syntax to get what I want, i.e. all books from one collection on a page? Is there a way to show 50 results instead of 10?

Also, what about the missing files? I looked at some rocketry links on the NASA collection and got error 404. Where are they? Other than security, is there any reason to not allow raw directory lists? That would make downloading much easier.

With the Baen books, how do I find titles? The page only lists ISBNs and authors but no titles except for mp3 samples.

Finally, are any of these going to eventually make it to the main PG site? Some are public domain and there is no reason why they can't be part of PG except for possible layout and pdf issues.

At 05:20 PM 7/5/06 -0700, you wrote:
It was on an overloaded network connection earlier, but we moved it this (Wednesday) morning and the site seems to be performing well.
Take a look - it's pretty neat!
There are a few missing files & broken links, but for the most part things seem OK. -- Greg
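The sort of thing I mean: given a plain file list, a few lines of Python (standard library only) would fetch everything. The URLs here are made up, which is exactly the problem:

    import urllib.request
    from pathlib import Path

    # Hypothetical list of direct PDF links -- the site doesn't expose
    # one, which is what I'm asking about.
    urls = [
        "http://www.worldebookfair.com/collection/book0001.pdf",
        "http://www.worldebookfair.com/collection/book0002.pdf",
    ]

    for url in urls:
        name = Path(url).name
        if Path(name).exists():
            continue  # already downloaded; safe to re-run the script
        print("fetching", url)
        urllib.request.urlretrieve(url, name)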

On Thu, Jul 06, 2006 at 01:03:46AM -0700, Tony Baechler wrote:
Yes, I noticed the slowness. It seems much better now. I have a question, though. It says that you can download all ebooks. I don't care about many of them, but I would like to grab at least a few hundred, if not a few thousand. How? Do I really have to
Tony, please send WEF questions directly to (or cc) John Guagliardo <john@guagliardo.cc>.

Most of the collections (though not all) are not accessible except via search.
individually download every single pdf file by hand? I don't expect a nice ftp/rsync/http directory listing, but it would at least be nice if all the titles from a certain collection could be on one search page. If you use the search form, you only get the first 10 results. The "browse collections" page only shows a random sampling of titles from any particular collection. Also, the numbers are wrong or I'm doing something wrong. One figure shows about 250,000 pdf files, another shows 330,000 depending on whether you search or not. The Census page shows about 30,000 pdf files but the search shows about 52,000. I tried the advanced search but that seems to only be a help document unless I did something wrong.
There are about 330,000 files; we talk about 250,000 (1/4 million) to take overlap into account. I don't know about the Census docs. You're right that "Advanced Search" really just gives help.
I freely admit that I'm missing something here. What am I missing? Should I be using a more specific search syntax to get what I want, i.e. all books from one collection on a page? Is there a way to show 50 results instead of 10? Also, what about the missing files? I looked at some rocketry links on the NASA collection and got error 404. Where are they? Other than security, is there any
There are two main sources of 404s now:

1) Some filenames have case mismatches: many of the files came from a (case-insensitive) Windoze system, and the new servers are case-sensitive. We're working on this.

2) A few collections are still being loaded onto the different servers. We're working on this, too.

You can email John or me specific failed filenames, and I can try to locate them. In the meantime, when a link 404s, retrying the same path in all lowercase may turn up the file.
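A minimal sketch of that workaround in Python (standard library only; the example URL at the end is made up):

    import urllib.error
    import urllib.request
    from urllib.parse import urlsplit, urlunsplit

    def fetch_with_case_fallback(url):
        # Try the URL as given; on a 404, retry with the path lowercased.
        # This only helps when the 404 is a case mismatch left over from
        # the case-insensitive system the files came from.
        try:
            return urllib.request.urlopen(url).read()
        except urllib.error.HTTPError as err:
            if err.code != 404:
                raise
            parts = urlsplit(url)
            lowered = urlunsplit(parts._replace(path=parts.path.lower()))
            return urllib.request.urlopen(lowered).read()

    # Hypothetical example -- not a real path on the site:
    # data = fetch_with_case_fallback("http://www.worldebookfair.com/NASA/Rocketry.PDF")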
reason to not allow raw directory lists? That would make downloading much easier. With the Baen books, how do I find titles? The page only lists ISBNs and authors but no titles except for mp3 samples. Finally, are any of these going to eventually make it to the main PG site? Some are public domain and there is no reason why they can't be part of PG except for possible layout and pdf issues.
Nothing will make it to the main PG site without "someone" doing the work! But most of the public domain content is already on pgcc.net, so it's not going to go away after August 4.

I don't know about directory listings etc.; those are questions for John.
-- Greg
At 05:20 PM 7/5/06 -0700, you wrote:
It was on an overloaded network connection earlier, but we moved it this (Wednesday) morning and the site seems to be performing well.
Take a look - it's pretty neat!
There are a few missing files & broken links, but for the most part things seem OK. -- Greg

As I am looking up authors etc. for the PG online catalog, and just generally browsing, I seem to be constantly running into more websites that have transcribed material that could be added to PG. With a little effort, I could make a list for you of dozens of sites, with thousands of books that could be adapted. However, getting copyright clearance and reformatting these is perhaps not as "glamorous" as Distributed Proofreading, so it does not attract as many people. :)

Though I already have too many different PG projects I'm in the middle of, I would be willing to help if you'd like to start processing some of these texts.

Andrew

On Thu, 6 Jul 2006, Greg Newby wrote:
On Thu, Jul 06, 2006 at 01:03:46AM -0700, Tony Baechler wrote:
samples. Finally, are any of these going to eventually make it to the main PG site? Some are public domain and there is no reason why they can't be part of PG except for possible layout and pdf issues.
Nothing will make it to the main PG site without "someone" doing the work! But most of the public domain content is already on pgcc.net , so it's not going to go away after August 4.

On 7/6/06, Andrew Sly <sly@victoria.tc.ca> wrote:
However, getting copyright clearance and reformatting these is perhaps not as "glamorous" as Distributed Proofreading, so it does not attract as many people. :)
It's not just less glamorous; it's hard. You have to go through all the work of finding a specific edition that may not be well identified in the ebook. You have to dump the text in a way that doesn't lose all the formatting information, which may range from easy to hard, but will certainly require custom code and massaging. You have to work with a text that is unlikely to be the quality of what DP can produce after five rounds, and that could turn out to be pretty bad. And it requires some tedious comparison.

In a lot of cases, I'd actually rather rescan, reprocess, and compare than try to reformat existing material. If we can't get copyright-clearance information from the source, or at least handle the texts as a group, they're pretty hard to do.
participants (6)
- Andrew Sly
- David Starner
- Greg Newby
- joey
- rnmscott@netspace.net.au
- Tony Baechler