
New experimental top 100 books and authors at:
http://www.gutenberg.net/catalog/world/top
Did you know our most read authors are Various and Anonymous?
Did you know our most downloaded eBooks are:
1. Audio: "The House of Usher" by Edgar Allan Poe
2. Audio: "Bleak House" by Charles Dickens
3. Vanity Fair by William Makepeace Thackeray
4. Ulysses by James Joyce
with 1. being downloaded 5 times as often as 2., and 9 times as often as 3. ?
Yes, there is a solution to the mystery. Anybody wants to apply his/her reasoning power?
The solution is a few lines down.
It turns out I had to disqualify those as well as a few others that have mp3 files and use the word "House" in the title. (Moreover, research has shown that "Usher" is a rap artist. :-)
-- Marcello Perathoner webmaster@gutenberg.net

I'm not sure I received the entire message below, since there are a number of blank lines and then not what I expected below them.
On Mon, 13 Sep 2004, Marcello Perathoner wrote:
New experimental top 100 books and authors at:
http://www.gutenberg.net/catalog/world/top
Did you know our most read authors are Various and Anonymous?
Did you know our most downloaded eBooks are:
1. Audio: "The House of Usher" by Edgar Allan Poe
2. Audio: "Bleak House" by Charles Dickens
3. Vanity Fair by William Makepeace Thackeray
4. Ulysses by James Joyce
with 1. being downloaded 5 times as often as 2., and 9 times as often as 3. ?
Was there supposed to be more to this list? If so, I think I missed the original posting of the Top 100, and searching for "Top 100" didn't turn up any recent messages.
Yes, there is a solution to the mystery. Anybody wants to apply his/her reasoning power?
The solution is a few lines down.
It turns out I had to disqualify those as well as a few others that have mp3 files and use the word "House" in the title. (Moreover, research has shown that "Usher" is a rap artist. :-)
OK, perhaps I was just expecting something more serious here, but. . . . I would include all files, not sure why to disqualify MP3 files, or "house" remixes. . .hee hee! I think we should measure everything, though I think sub-lists would be acceptable. . .such as the "Whole Top 100," then fiction, non-fiction, .txt files, .htm files, .mp3 files, etc., etc., etc.
Michael
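The sub-lists suggested above (one overall list, then one list per file type) could be produced along these lines. This is only a rough sketch: the `sublists` function, the file names, and the download counts are all invented for illustration and do not come from the Project Gutenberg site or its logs.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# (file name, download count) pairs. Names and counts are invented.
downloads = [
    ("alpha.txt", 120),
    ("alpha.htm", 80),
    ("beta.txt", 200),
    ("gamma.mp3", 500),
    ("beta.htm", 30),
]


def sublists(records, n=100):
    """Return an overall top-n list plus one top-n list per file extension."""
    by_type = defaultdict(list)
    for name, count in records:
        # Group each record under its extension, e.g. ".txt" or ".mp3".
        by_type[PurePosixPath(name).suffix].append((name, count))

    def ranked(items):
        # Highest download count first, cut off at n entries.
        return sorted(items, key=lambda r: r[1], reverse=True)[:n]

    result = {"all": ranked(records)}
    for suffix, items in by_type.items():
        result[suffix] = ranked(items)
    return result


for label, entries in sublists(downloads).items():
    print(label, entries)
```

Fiction and non-fiction lists would need a category field on each record, which this sketch does not model.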

Michael Hart wrote:
I would include all files, not sure why to disqualify MP3 files, or "house" remixes. . .hee hee!
Because those files were downloaded in error by people who wanted to have mp3 files with "House" music. If I didn't disqualify them they would sit in front of the top list forever, being downloaded nearly 10 times oftener than the next non-"House" eBook. -- Marcello Perathoner webmaster@gutenberg.net
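The exclusion rule described above might look roughly like this in code. It is a minimal sketch, not the code behind the catalog page: the `Entry` record, the `is_disqualified` and `top_n` names, and the download counts are assumptions made up for the example (the counts were chosen only to match the 5x and 9x ratios mentioned in the thread).

```python
from typing import NamedTuple


class Entry(NamedTuple):
    title: str
    author: str
    has_mp3: bool
    downloads: int  # invented figures, for illustration only


def is_disqualified(entry: Entry) -> bool:
    """Rule from the thread: has mp3 files and the word "House" in the title."""
    words = entry.title.lower().replace('"', " ").split()
    return entry.has_mp3 and "house" in words


def top_n(entries, n=100):
    """Drop disqualified entries, then rank the rest by downloads, highest first."""
    kept = [e for e in entries if not is_disqualified(e)]
    return sorted(kept, key=lambda e: e.downloads, reverse=True)[:n]


sample = [
    Entry("The House of Usher", "Edgar Allan Poe", True, 9000),
    Entry("Bleak House", "Charles Dickens", True, 1800),
    Entry("Vanity Fair", "William Makepeace Thackeray", False, 1000),
    Entry("Ulysses", "James Joyce", False, 900),
]

for rank, entry in enumerate(top_n(sample), start=1):
    print(rank, entry.title, entry.downloads)
```

Matching on the whole word rather than a substring keeps an mp3 title such as "Household Tales" in the list, although whether the real rule is that strict is not stated in the thread.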

On Mon, 13 Sep 2004, Marcello Perathoner wrote:
Michael Hart wrote:
I would include all files, not sure why to disqualify MP3 files, or "house" remixes. . .hee hee!
Because those files were downloaded in error by people who wanted to have mp3 files with "House" music.
If I didn't disqualify them they would sit in front of the top list forever, being downloaded nearly 10 times oftener than the next non-"House" eBook.
Wow!!! So these are people most likely using WEBVCR programs to sweep up everything with "house" + "mp3" ??? Who'da thunk it???
Can you send me the Top 100 list[s]? Perhaps we can automate something to send me this, and we can put some notes in the Newsletter. . . .
Thanks!
Michael

Hi,
Out of curiosity, which period of time does the list cover? And does it update automatically or will this list be outdated after a while? And how many times was the number one book actually downloaded? And number 2? And number 3? Ah well, you get the picture. I am actually very curious after number 100, the Anatomy of Melancholy's exact figures, it being one of DP's pet projects.
Per Michael Hart's suggestion, I would also love to have more figures per category, perhaps also per language etc. Not sure if they are difficult to generate, but I love stats and there couldn't be enough of those on gutenberg.net in my view.
Marcello, thanks a lot for creating the list!
Miranda
----- Original Message -----
From: "Marcello Perathoner" <marcello@perathoner.de>
To: "Michael S. Hart" <hart@pobox.com>; "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Sent: Monday, September 13, 2004 6:00 PM
Subject: Re: [gutvol-d] Top 100
Michael Hart wrote:
I would include all files, not sure why to disqualify MP3 files, or "house" remixes. . .hee hee!
Because those files were downloaded in error by people who wanted to have mp3 files with "House" music.
If I didn't disqualify them they would sit in front of the top list forever, being downloaded nearly 10 times oftener than the next non-"House" eBook.
-- Marcello Perathoner webmaster@gutenberg.net
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d

Miranda van de Heijning wrote:
Out of curiosity, which period of time does the list cover? And does it update automatically or will this list be outdated after a while?
Since Sep 03. It updates every night.
And how many times was the number one book actually downloaded? And number 2? And number 3? Ah well, you get the picture. I am actually very curious after number 100, the Anatomy of Melancholy's exact figures, it being one of DP's pet projects.
I have added the numbers.
Per Michael Hart's suggestion, I would also love to have more figures per category, perhaps also per language etc. Not sure if they are difficult to generate, but I love stats and there couldn't be enough of those on gutenberg.net in my view.
We'll see ... -- Marcello Perathoner webmaster@gutenberg.net

Miranda van de Heijning wrote:
Per Michael Hart's suggestion, I would also love to have more figures per category, perhaps also per language etc. Not sure if they are difficult to generate, but I love stats and there couldn't be enough of those on gutenberg.net in my view.
I seem to recall seeing a page analyzing downloads from a PG ftp site. (can't remember which one.) It had a huge mass of statistics, including most often requested files, average file sizes, domains downloaded from, breakdown by file types, etc. If I look through my old emails, I may be able to find it...
Andrew

Andrew Sly wrote:
Per Michael Hart's suggestion, I would also love to have more figures per category, perhaps also per language etc. Not sure if they are difficult to generate, but I love stats and there couldn't be enough of those on gutenberg.net in my view.
I seem to recall seeing a page analyzing downloads from a PG ftp site. (can't remember which one.) It had a huge mass of statistics, including most often requested files, average file sizes, domains downloaded from, breakdown by file types, etc.
Start from: http://www.gutenberg.net/internal/stats/ user: books pass: internal
There are stat files for the web page "pages" and the archive "files". Though the files page says ftp.ibiblio.org, all HTTP and FTP requests to the file archive are analyzed there. There are daily pages and a monthly page.
Caveat emptor: much of the data there can be misleading if you don't know the tricks of the site. The pages are meant as a tool for me to watch whether something goes terribly wrong, not as a download counting tool.
-- Marcello Perathoner webmaster@gutenberg.net

Thanks for the extra figures Marcello! I'm sure that will keep me usefully occupied on a weekly basis--there's a lot of interesting information in there. Not just for curiosity's sake, but also to determine what sort of books and authors are likely to get an audience. Of course this doesn't mean we should only work on the most popular works, but it would be a useful tool to identify any gaps we may have in the collection. Ala, just a thought.
Miranda
----- Original Message -----
From: "Marcello Perathoner" <marcello@perathoner.de>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Sent: Tuesday, September 14, 2004 9:14 PM
Subject: Re: [gutvol-d] Top 100
Miranda van de Heijning wrote:
Out of curiosity, which period of time does the list cover? And does it update automatically or will this list be outdated after a while?
Since Sep 03. It updates every night.
And how many times was the number one book actually downloaded? And number 2? And number 3? Ah well, you get the picture. I am actually very curious after number 100, the Anatomy of Melancholy's exact figures, it being one of DP's pet projects.
I have added the numbers.
Per Michael Hart's suggestion, I would also love to have more figures per category, perhaps also per language etc. Not sure if they are difficult to generate, but I love stats and there couldn't be enough of those on gutenberg.net in my view.
We'll see ...
-- Marcello Perathoner webmaster@gutenberg.net
participants (4)
- Andrew Sly
- Marcello Perathoner
- Michael Hart
- Miranda van de Heijning