Fw: [gweekly] 15,000th Project Gutenberg eBook Released

Perhaps I don't understand Moore's Law properly, but aren't we actually well behind on its schedule? If we published our 10,000th book at the end of 2003, doubling in 1.5 years means 20,000 books in mid-2005. At the current rate we are not likely to reach that figure--anywhere between 16k and 17k seems more realistic.

It's not something to be ashamed of, as we are still extending the archive at a very respectable rate, so can we not just be honest about it and admit that while we predicted 20,000 based on Moore's Law, we are no longer growing that quickly? I've always understood Moore's Law to be just a prediction, not a target, so we haven't failed anything. :-)

Other than that, congratulations on reaching 15,000! What was posted as eBook 15000?

To: "Project Gutenberg Weekly Newsletter" <gweekly@lists.pglaf.org>
Sent: Saturday, January 08, 2005 8:00 PM
Subject: [gweekly] 15,000th Project Gutenberg eBook Released
Congratulations to the Project Gutenberg Volunteers!!!
In the last hour Project Gutenberg released their 15,000th eBook.
More details will be available in Wednesday's email Newsletters.
This far exceeds the Moore's Law projection from 10 eBooks in 1990, which would not predict 15,000 until around August 2006 -- a growth rate every pundit has continually said was impossible:
Projected Growth Rate
Total   Date        Doubled   Years
   10   Dec, 1990       0       0
   20   Jun, 1992       1       1.5
   40   Dec, 1993       2       3
   80   Jun, 1995       3       4.5
  160   Dec, 1996       4       6
  320   Jun, 1998       5       7.5
  640   Dec, 1999       6       9
 1280   Jun, 2001       7      10.5
 2560   Dec, 2002       8      12
 5120   Jun, 2004       9      13.5
10240   Dec, 2005      10      15
15000   Aug, 2006      10.5    15+    <<< Predicted Date for ~15,000
20480   Jun, 2007      11      16.5
Our many thanks to all the thousands of Gutenberg volunteers!!!
Michael S. Hart
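The schedule above is easy to regenerate. Here is a minimal Python sketch, assuming only the table's own inputs (10 eBooks in Dec 1990, one doubling every 18 months); the output formatting is illustrative:

    import math
    from datetime import date

    start = date(1990, 12, 1)     # 10 eBooks in Dec 1990
    base = 10

    for n in range(12):           # 0..11 doublings, as in the table
        months = n * 18           # 1.5 years per doubling
        year = start.year + (start.month - 1 + months) // 12
        month = (start.month - 1 + months) % 12 + 1
        print(f"{base * 2**n:>6}  {year}-{month:02d}  {n:>2} doublings  {n * 1.5:g} years")

    # Doublings needed to reach 15,000 from 10:
    print(math.log2(15000 / base))   # ~10.55, i.e. between the Dec 2005 and Jun 2007 rows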

Miranda van de Heijning wrote:
Perhaps I don't understand Moore's Law properly, but aren't we actually well behind on its schedule? If we published our 10,000th book at the end of 2003, doubling in 1.5 years means 20,000 books in mid-2005. At the current rate we are not likely to reach that figure--anywhere between 16k and 17k seems more realistic.
It's not something to be ashamed of, as we are still extending the archive at a very respectable rate, so can we not just be honest about it and admit that while we predicted 20,000 based on Moore's Law, we are no longer growing that quickly? I've always understood Moore's Law to be just a prediction, not a target, so we haven't failed anything. :-)
Other than that, congratulations on reaching 15,000! What was posted as eBook 15000?
Recently, we haven't matched Moore's Law growth rates, not that there's ANYTHING AT ALL wrong with that - Project Gutenberg is still growing at its fastest rate ever. However, if you look over the entire history of PG, you'll see that on average, we're quite ahead of Moore's Law. If our production grows at less than the Moore's Law rate for much longer, then our average growth rate may slip below that predicted by Moore.

It's a nice statistic while it lasts, but I think everybody at PG knows what it's all about: producing e-books not just quickly but WELL. At some point in the future, PG may produce fewer books than, say, the Million Book Project. However, if you compare PG's double/triple-proofread texts against the Million Book Project's unproofed OCR from quickly-done scans, I think people will see where the true value lies.

Cheers,
Holden

On Sun, 9 Jan 2005, Miranda van de Heijning wrote:
Perhaps I don't understand Moore's Law properly, but aren't we actually well behind on its schedule? If we published our 10,000th book at the end of 2003, doubling in 1.5 years means 20,000 books in mid-2005. At the current rate we are not likely to reach that figure--anywhere between 16k and 17k seems more realistic.
When you've been watching a growth curve for a long time, the changes don't seem as drastic as when you watch for a short time.

Any growth rate prediction has to have a starting point. We have always used 1990, though we could restart with other years, and, yes, given that there are ups and downs in any real growth curve, you could always pick either the highs or lows [as they do in government statistics] and then skew the results massively, as they change exponentially. Since we grew so rapidly in some periods, sometimes in excess of TWICE the Moore's Law predictions, you could always start with those years as a baseline to demonstrate that we are no longer growing at TWICE the Moore's Law rate. However, 1990 is the first year we actually made totally consistent additions every month to the collection, so it's the best start point.

When you map things out over very long periods, all the bumps in the road seem to flatten out. . . . I can send you a graph from 1990 to 2005 if you like, so you can see that unless you use a much larger graph than I can include in this format, it looks very smooth, even though it switches from a resolution of years to months at the top.

mh

Michael Hart wrote:
However, 1990 is the first year we actually made totally consistent additions every month to the collection, so it's the best start point.
So you pick an arbitrary starting point to make the math come out right. Why not choose 1971 as the starting point and accept where the math gets you: we are *behind* Moore's Law.

--
Marcello Perathoner
webmaster@gutenberg.org

On Sun, 9 Jan 2005, Marcello Perathoner wrote:
Michael Hart wrote:
However, 1990 is the first year we actually made totally consistent additions every month to the collection, so it's the best start point.
So you pick an arbitrary starting point to make the math come out right.
Why not choose 1971 as starting point and accept where the math gets you: we are *behind* Moore's Law.
Hardly arbitrary, even as you yourself quoted above.

1990 was the first year of monthly production, a regular Newsletter, and most of the other things associated with Project Gutenberg. Growth in the 1970's was pretty much on a once-a-year basis, as there were severe limitations on our space allocations. The 1980's were pretty much devoted to Shakespeare and The Bible. Thus 1990 represents the best place to begin.

If you think this is recent reasoning, I quote below from one of our old index files from the period:

***

The Bible and Shakespeare represented the entire effort for the 1980's, and the Bible alone is about 1,000 times larger than our first file, the U.S. Declaration of Independence, and so is the Complete Shakespeare. [That Shakespeare was never released due to changes in the copyright law]

Dec 1979  Abraham Lincoln's First Inaugural Address          [linc1xxx.xxx]  9
Dec 1978  Abraham Lincoln's Second Inaugural Address         [linc2xxx.xxx]  8
Dec 1977  The Mayflower Compact                              [mayflxxx.xxx]  7
Dec 1976  Give Me Liberty Or Give Me Death, Patrick Henry    [liberxxx.xxx]  6
Dec 1975  The United States' Constitution                    [constxxx.xxx]  5
Nov 1973  Gettysburg Address, Abraham Lincoln                [gettyxxx.xxx]  4
Nov 1973  John F. Kennedy's Inaugural Address                [jfkxxxxx.xxx]  3
Dec 1972  The United States' Bill of Rights                  [billxxxx.xxx]  2
Dec 1971  Declaration of Independence                        [whenxxxx.xxx]  1

Hi Miranda,

On 12:53:14 Miranda van de Heijning wrote:
Other than that, congratulations on reaching 15,000! What was posted as eBook 15000?
As near as I can make out, we are not there yet. 14639 is the highest number I can see and it was posted just now: 9-jan-2005.

============================================================
Gardner Buchanan <gbuchana@rogers.com>          Ottawa, ON
FreeBSD: Where you want to go.  Today.

Gardner Buchanan wrote:
On 12:53:14 Miranda van de Heijning wrote:
Other than that, congratulations on reaching 15,000! What was posted as eBook 15000?
As near as I can make out, we are not there yet. 14639 is the highest number I can see and it was posted just now: 9-jan-2005.
I suspect Michael Hart has included the PG of Australia collection (400 on Wednesday) in his grand total.

-Michael

On Sun, 9 Jan 2005, Michael Dyck wrote:
Gardner Buchanan wrote:
On 12:53:14 Miranda van de Heijning wrote:
Other than that, congratulations on reaching 15,000! What was posted as eBook 15000?
As near as I can make out, we are not there yet. 14639 is the highest number I can see and it was posted just now: 9-jan-2005.
I suspect Michael Hart has included the PG of Australia collection (400 on Wednesday) in his grand total.
We always have, with PGAU's permission. Hopefully we will soon be adding some from PGEU, PG Canada, etc.

Michael

On Sun, 9 Jan 2005, Gardner Buchanan wrote:
Hi Miranda,
On 12:53:14 Miranda van de Heijning wrote:
Other than that, congratulations on reaching 15,000! What was posted as eBook 15000?
As near as I can make out, we are not there yet. 14639 is the highest number I can see and it was posted just now: 9-jan-2005.
There were several candidates being discussed a few months ago for the actual file to be labeled as /15000, but it's been pretty quiet about which one was actually chosen. . . . ;-)

Michael Hart wrote:
This far exceeds the Moore's Law projection from 10 eBooks in 1990, which would not predict 15,000 until around August 2006 -- a growth rate every pundit has continually said was impossible:
Projected Growth Rate
Total   Date        Doubled   Years
   10   Dec, 1990       0       0
   20   Jun, 1992       1       1.5
   40   Dec, 1993       2       3
   80   Jun, 1995       3       4.5
  160   Dec, 1996       4       6
  320   Jun, 1998       5       7.5
  640   Dec, 1999       6       9
 1280   Jun, 2001       7      10.5
 2560   Dec, 2002       8      12
 5120   Jun, 2004       9      13.5
10240   Dec, 2005      10      15
15000   Aug, 2006      10.5    15+    <<< Predicted Date for ~15,000
20480   Jun, 2007      11      16.5
Bzzzzt, wrong. But thank you for playing!

You tried to show that the number of books in the collection obeys Moore's Law. Moore's Law fits the data to a 2 ^ t exponential curve with a doubling period of 1.5 years.

In that case we have: you started in 1971 and we had reached 10,000 books by the end of 2003. That's roughly 33 years for 10,000 books.

With 33 years and 10,000 books we get:

    x * 2 ^ (33 / 1.5) = 10000

and we solve:

    x = 0.002384

A year after book 10,000 we should have gotten to:

    0.002384 * 2 ^ (34 / 1.5) = 15873

which we have failed to do.

We should get to #20,000 a year and a half after #10,000. That would be May 2005.

So much for Moore's Law, which, by the way, doesn't work well in computer science either, but is for some strange reason one of the most-cited "Laws".

I'll attach a plot of the function:

    0.002384 * 2 ^ ((x - 1971) / 1.5)

starting at x = 2000 and ending at x = 2008. That is, if the attachment comes through. Otherwise use Gnuplot to plot it yourself.

--
Marcello Perathoner
webmaster@gutenberg.org
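Marcello's arithmetic is easy to check; a minimal Python sketch of the same fit (the constants are from his post, the variable names are mine):

    import math

    # Fit x * 2^(t/1.5) through 10,000 books at t = 33 years (1971 to end of 2003):
    x = 10000 / 2 ** (33 / 1.5)
    print(x)                                    # ~0.002384

    # One year later the fitted curve predicts:
    print(x * 2 ** (34 / 1.5))                  # ~15873

    # Year at which the fitted curve reaches 20,000 books:
    print(1971 + 1.5 * math.log2(20000 / x))    # ~2005.5, i.e. mid-2005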

As I said, you can pick the high or low points and make it appear as if the growth rate was either much greater or much less than the Moore's Law prediction. However, I didn't believe anyone would be silly enough to DO it, and expect any credence. . . .

Michael

On Sun, 9 Jan 2005, Marcello Perathoner wrote:
Michael Hart wrote:
This far exceeds the Moore's Law projection from 10 eBooks in 1990, which would not predict 15,000 until around August 2006 -- a growth rate every pundit has continually said was impossible:
Projected Growth Rate
Total   Date        Doubled   Years
   10   Dec, 1990       0       0
   20   Jun, 1992       1       1.5
   40   Dec, 1993       2       3
   80   Jun, 1995       3       4.5
  160   Dec, 1996       4       6
  320   Jun, 1998       5       7.5
  640   Dec, 1999       6       9
 1280   Jun, 2001       7      10.5
 2560   Dec, 2002       8      12
 5120   Jun, 2004       9      13.5
10240   Dec, 2005      10      15
15000   Aug, 2006      10.5    15+    <<< Predicted Date for ~15,000
20480   Jun, 2007      11      16.5
Bzzzzt, wrong. But thank you for playing!
You tried to show that the number of books in the collection obeys Moore's Law. Moore's Law fits the data to a 2 ^ t exponential curve with a doubling period of 1.5 years.
In that case we have: you started in 1971 and we had reached 10,000 books by the end of 2003. That's roughly 33 years for 10,000 books.
With 33 years and 10,000 books we get:
x * 2 ^ (33 / 1.5) = 10000
and we solve:
x = 0.002384
A year after book 10,000 we should have gotten to:
0.002384 * 2 ^ (34 / 1.5) = 15873
which we have failed to do.
We should get to #20,000 a year and a half after #10,000. That would be May 2005.
So much for Moore's Law, which, by the way, doesn't work well in computer science either, but is for some strange reason one of the most-cited "Laws".
I'll attach a plot of the function:
0.002384 * 2 ^ ((x - 1971) / 1.5)
starting at x = 2000 and ending at x = 2008. That is, if the attachment comes through. Otherwise use Gnuplot to plot it yourself.
-- Marcello Perathoner webmaster@gutenberg.org

Michael Hart wrote:
As I said, you can pick the high or low points and make it appear as if the growth rate was either much greater or much less than the Moore's Law prediction.
However, I didn't believe anyone would be silly enough to DO it, and expect any credence. . . .
YOU did it.

--
Marcello Perathoner
webmaster@gutenberg.org

Miranda van de Heijning wrote:
Perhaps I don't understand Moore's Law properly, but aren't we actually well behind on its schedule?
It depends on what you use as your reference point. E.g.:

  10000 in Oct 2003 predicts ~17,000 now, and we're slightly behind;
   1000 in Oct 1997 predicts ~30,000 now, and we're way behind;
    100 in Dec 1993 predicts ~17,000 now, and we're slightly behind;
     10 in Dec 1990 predicts  ~7,000 now, and we're way ahead.

The earliest reference I can find to Moore's Law in relation to PG's growth rate is in the Nov 27, 2002 weekly newsletter:

http://www.gutenberg.net/newsletter/archive/PGWeekly_2002_11_27.txt

in which Michael Hart uses '100 in Dec 1993' as the reference point. PG's total stayed remarkably close to that model (maybe only 1 or 2% above it) for all of 2002 and 2003, but started falling away from it in early 2004.

In the last 6 months, the PG total has increased by about 14%, from 13,155 on 2004-07-07 to 15,000 on 2005-01-08, which puts it close to a 'doubling every 2.6 years' curve.

-Michael
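The same reference-point arithmetic as a minimal Python sketch; the book counts and dates come from the post above, while treating "now" as early January 2005 is my approximation:

    import math

    NOW = 2005.02                       # early Jan 2005 as a fractional year
    refs = [(10000, 2003.79),           # Oct 2003
            (1000, 1997.79),            # Oct 1997
            (100, 1993.96),             # Dec 1993
            (10, 1990.96)]              # Dec 1990

    for books, year in refs:
        projected = books * 2 ** ((NOW - year) / 1.5)
        print(f"{books:>6} then projects ~{projected:,.0f} now")

    # Implied doubling time from 13,155 (2004-07-07) to 15,000 (2005-01-08):
    span = 185 / 365                    # ~half a year
    print(span * math.log(2) / math.log(15000 / 13155))   # ~2.6-2.7 years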

From the last newsletter, I think this is the most telling stat:
338 Average Per Month in 2004
355 Average Per Month in 2003
203 Average Per Month in 2002
103 Average Per Month in 2001

4049 New eBooks in 2004
4164 New eBooks in 2003
2441 New eBooks in 2002
1240 New eBooks in 2001

We've done FEWER books in 2004 than in 2003... At that rate, I don't see how we could be keeping up with Moore's Law. We did pretty good from 2001 to 2003, but we've started to plateau, if not slide back a bit. I'm sure we will see much more growth, but at a steady Moore's Law curve, I'm not so sure.

I assert that the 2004 numbers are lower for a number of reasons (not necessarily in order of importance):

1. We artificially divide one printed work into multiple eBooks much less often than we used to.

2. We are much less likely to post an HTML (or .lit, or .doc) copy of a work under a separate eBook number.

3. We are taking more time to make sure that every book we post is of very high quality.

All of these factors make it harder to keep up with Moore's "Law". I think that we should celebrate what we have done and what we are doing and not fret that we aren't "doubling our output every eighteen months."

John Hagerson

-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Tim Meekins
Sent: Sunday, January 09, 2005 4:27 PM
To: Project Gutenberg Volunteer Discussion
Subject: Re: [gutvol-d] Fw: [gweekly] 15,000th Project Gutenberg eBook Released
From the last newsletter, I think this is the most telling stat:
338 Average Per Month in 2004
355 Average Per Month in 2003
203 Average Per Month in 2002
103 Average Per Month in 2001

4049 New eBooks in 2004
4164 New eBooks in 2003
2441 New eBooks in 2002
1240 New eBooks in 2001

We've done FEWER books in 2004 than in 2003... At that rate, I don't see how we could be keeping up with Moore's Law. We did pretty good from 2001 to 2003, but we've started to plateau, if not slide back a bit. I'm sure we will see much more growth, but at a steady Moore's Law curve, I'm not so sure.

Folks -

I agree with Tim M that the fact that our output is stagnant is very telling; it's much more important than whether we are complying with Moore's Law, Megan's Law or Murphy's Law.

While one important issue is the post-proofing bottleneck in DP (which is being given attention), as important but more fundamental is whether the PG project/organization/effort is positioned for growth. In my view, chugging along at 5-10K per year is very nice, but will be increasingly marginalized by other efforts (whether Google-wise or otherwise).

It also raises a difficult issue: If we are going to do a significant chunk of the public domain in a reasonably short period, it probably doesn't matter in what order we do the books. If we are only going to get to 50K over the next 5 years or to 100K over the next 10 years, we should probably give some thought to where we should put our efforts (e.g. what is the relative value of "The Yale Shakespeare" on top of the several versions of each play we already have?)

Steve Harris
pg@steveharris.net
-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Tim Meekins
Sent: Sunday, January 09, 2005 2:27 PM
To: Project Gutenberg Volunteer Discussion
Subject: Re: [gutvol-d] Fw: [gweekly] 15,000th Project Gutenberg eBook Released
From the last newsletter, I think this is the most telling stat:
338 Average Per Month in 2004
355 Average Per Month in 2003
203 Average Per Month in 2002
103 Average Per Month in 2001

4049 New eBooks in 2004
4164 New eBooks in 2003
2441 New eBooks in 2002
1240 New eBooks in 2001
We've done FEWER books in 2004 than in 2003... At that rate, I don't see how we could be keeping up with Moore's Law. We did pretty good from 2001 to 2003, but we've started to plateau, if not slide back a bit. I'm sure we will see much more growth, but at a steady Moore's Law curve, I'm not so sure.

On Sun, Jan 09, 2005 at 02:55:28PM -0800, steve harris wrote:
While one important issue is the post-proofing bottleneck in DP (which is being given attention), as important but more fundamental is whether
Can you give details about this bottleneck issue?
the PG project/organization/effort is positioned for growth.
It also raises a difficult issue: If we are going to do a significant chunk of the public domain in a reasonably short period, it probably doesn't matter in what order we do the books.
You are touching here on the problem of the (lack of an) editorial policy at PG / PGDP.

I tried to "centralize" the list of French books being worked on (or finished) at PG, PGDP, and ebooksgratuits.com:

http://www.eleves.ens.fr/home/blondeel/PGDP/catalog/

A friend of mine studying literature "promised" to give me "the list of all French books ever, sorted by descending importance" when she could ask her professor --- or anything "close" to that, because of course such a list is impossible to make (even without an "importance" rating).

For Halloween, some French-language PGDP PMs just grep'ed the string "fantome" (French for "ghost") in Gallica[*] (a big public website with many scans of books, and those guys agreed for PGDP to use their images).

[*] http://gallica.bnf.fr/

The ebooksgratuits.com people sometimes work on all the books of a given author. All this makes for a not very coherent, consistent editorial policy. I guess literature people could easily criticize the PG French catalog (some very obscure books, and some blatant misses). Of course the obvious answer would be "stop whining and do it yourself then!" but those people just don't work this way (think "psychology"). They're not hackers, they don't have this culture of "let's get involved, roll up our sleeves and change the world", but still they could be useful to PG.

When I proof pages at PGDP, I usually work on the oldest book sitting around. It's both a feeling of "duty" and it makes me discover things I wouldn't have otherwise. So the time I spend on obscure books is not spent on more "important" ones.

On the other hand, it would be difficult to set up an official editorial board: of course it should not be too bureaucratic and complicated, and of course it should not have a monopoly on the books proposed to PGDP (PMs would still be free to kick in books they just like, keeping in mind they would delay the more "important" books; we work with limited resources, so we should define priorities).

But above all we are missing the competent people: I guess a bunch of university professors specialized in pre-20th-century literature, history, philosophy etc. would do, but how many of those know PG? (If you don't like scholars because they tend to be non-pragmatic and argue about pointless details, replace that with: essay writers, journalists, whoever is important in the "culture" of the language considered.)

Has PG achieved any kind of fruitful collaboration with scholars? I could use my ink-jet printer and phone diary and send a mailing to random French literature professors I would find, but that would not look very credible to them (no matter how important and nice PG is anyway). References would help to make a bootstrap; then these people could relay the information between themselves and to their students (all master's theses dealing with old books could give PG their stuff, etc.; students could work on some books at PGDP, etc.). Plus having some "official title" (like "PG French editorial board") in something that looks important, and more and more important with time ("PG"), could maybe give them the incentive to help us a little.

Whether production is levelling off or not, I increasingly find one conclusion inexorable: A few thousand of us, in a few years, have produced 15,000 books. The public domain contains, say, 10 million volumes. To digitize it using our current methods in a reasonable amount of time will require a million-person volunteer force.

I'm not sure how to recruit a million proofreaders, but if anyone has some good ideas for finding the next 10,000, we should listen.

-- RS
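The back-of-the-envelope behind that "million-person volunteer force," as a minimal Python sketch; the volunteer count is an assumed round number, the other figures are from the post:

    volunteers = 3_000            # "a few thousand of us" (assumed value)
    books_so_far = 15_000         # produced "in a few years"
    public_domain = 10_000_000    # "say, 10 million volumes"

    scale = public_domain / books_so_far    # ~667 times the work done so far
    print(round(scale))
    print(round(scale * volunteers))        # ~2 million volunteers needed at the
                                            # same per-person rate and timescale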

On Jan. 10, Sebastien Blondeel wrote:
On Sun, Jan 09, 2005 at 02:55:28PM -0800, steve harris wrote:
While one important issue is the post-proofing bottleneck in DP (which is being given attention), as important but more fundamental is whether
Can you give details about this bottleneck issue?
There are about 2,200 books at DP that have been proofread but are in various stages of post-proofing processing. You can see the specifics at DP's Stats Central.
On the other hand, it would be difficult to set up an official editorial board: of course it should not be too bureaucratic and complicated, and of course it should not have a monopoly on the books proposed to PGDP (PMs would still be free to kick in books they just like, keeping in mind they would delay the more "important" books; we work with limited resources, so we should define priorities).
I don't support the need for an 'official editorial board', certainly not a group to exclude one work or another. At the same time, I think it would help if there was a group/process that gathered a list of works we would encourage people to work on. I did my own for the past two years at www.steveharris.net/PGList.htm .
But above all we are missing the competent people: I guess a bunch of university professors specialized in pre-20th-century literature, history, philosophy etc. would do, but how many of those know PG? (If you don't like scholars because they tend to be non-pragmatic and argue about pointless details, replace that with: essay writers, journalists, whoever is important in the "culture" of the language considered.)
I think it would be a great set of projects if someone wanted to contact the MLA or American Historical Association or another group and work with them on generating a list of key works in each area. It's the sort of contact that could lead to greater use of the PG collection as well.

More broadly, PG has focused on the copyright-production-posting segments. A more robust view, extending to both text collection and distribution/use of the materials, would be a good way to be more effective in our core functions as well as to extend the scope and usefulness of our product. I also think it would be useful if PG had enough management that such efforts could be endorsed and facilitated, not just left to people working on their own. To me, the open-source coding groups, like the Apache foundation or Mozilla, are useful non-coercive organizational models.

Thx, smh

With regard to making a DP editorial board: The proofreaders are not a machine. They will not proof whatever is put in front of them with the same willingness or vigor.

I attribute much of the increase in productivity at DP between summer 2003 and the present to the transition from a system that presented projects on a more-or-less first-in, first-out basis to one that tried to ensure that at least some "easy" English material is present at all times. More recently, this system has been broadened so that (in English and French anyway, our two most popular languages), several genre-based queues attempt to ensure that at least one or two projects in that genre are available for proofing at any time. (This also provides an incentive for content providers to provide material that the proofers enjoy proofing more. Such material releases faster, and most human beings, on some level, are suckers for instant gratification.)

If there existed a generally-agreed-upon canon of books that was in some sense more important to get into PG faster than other books, the best non-coercive way I can think of to encourage work on these books is to give them a queue of their own at DP. If proofers enjoyed working on them, it would become a fast queue, and this would encourage content providers to scan books off the list.

However, I don't really see the idiosyncrasies of what is and is not in PG as a problem. Every library smaller than the major national libraries exhibits the idiosyncrasies of its acquisitions department. Every book in PG has this much to recommend it: someone thought it worthwhile enough to go through the effort of putting it there.

-- RS
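The genre-queue release rule described above might look like the following; a hypothetical Python sketch with illustrative names and thresholds, not DP's actual code:

    from collections import deque

    def release_projects(queues, live_counts, min_live=2):
        """Release queued projects until each genre has at least
        min_live projects open for proofing."""
        released = []
        for genre, queue in queues.items():
            while live_counts.get(genre, 0) < min_live and queue:
                released.append(queue.popleft())    # oldest queued project first
                live_counts[genre] = live_counts.get(genre, 0) + 1
        return released

    # Example: English fiction has no open projects, French has one.
    queues = {"english-fiction": deque(["book A", "book B"]), "french": deque(["livre C"])}
    live = {"english-fiction": 0, "french": 1}
    print(release_projects(queues, live))   # ['book A', 'book B', 'livre C']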

One of the content providers within Distributed Proofreaders has found an MLA (I think) list of the top ten books for each of the years just prior to 1923. He has taken it upon himself (he has access to the Library of Congress in DC) to scan and provide each of these books in fiction and non-fiction. This is one person seeing an opportunity and moving on it. No "editorial board" told him that this is what he was "supposed" to do. He just did it. And we all benefit.

John Hagerson

2003 saw the posting of a lot of audio books and also several versions of the Bible in multiple sections. 2004 did not have any of those. In terms of actual new ebooks, 2004 is well ahead of 2003.

JulietS

Tim Meekins wrote:
From the last newsletter, I think this is the most telling stat:
338 Average Per Month in 2004
355 Average Per Month in 2003
203 Average Per Month in 2002
103 Average Per Month in 2001

4049 New eBooks in 2004
4164 New eBooks in 2003
2441 New eBooks in 2002
1240 New eBooks in 2001
We've done FEWER books in 2004 than in 2003... At that rate, I don't see how we could be keeping up with Moore's Law. We did pretty good from 2001 to 2003, but we've started to plateau, if not slide back a bit. I'm sure we will see much more growth, but at a steady Moore's Law curve, I'm not so sure.

On Sun, Jan 09, 2005 at 06:50:39PM -0500, Juliet Sutherland wrote:
2003 saw the posting of a lot of audio books and also several versions of the Bible in multiple sections. 2004 did not have any of those. In terms of actual new ebooks, 2004 is well ahead of 2003.
Alternative Stat: "Real e-Books"
--------------------------------

Does the figure of "actual new ebooks" exist? If not, why not consider creating it? Wikipedia too has both an "official" and a "real" (> 200 chars) article count:

http://en.wikipedia.org/wikistats/EN/TablesWikipediaEN.htm

Request For New Stat: Human Resources and Work
----------------------------------------------

Moore's Law has little chance of working here: there is no real technological progress going on (PGDP, gutcheck etc. help, but they are far from the kind of technological progress computers have seen since the 1950's). It all boils down to human resources, and those only expand exponentially in pyramid schemes. Granted, we still have some room for that...

This is the reason why I believe an interesting figure would be the number of volunteers (PGDP active people would be a nice stat) and the amount of work they do (PGDP proofed pages would be a nice stat, not impeded by the PP bottleneck mentioned previously in this thread: if in 2005 PGDP volunteers proof 100 million pages but PGDP PP is still dripping out slowly, the official PG stats won't show what is really going on).

Suggestion For Improvements: Work on PG's and PGDP's Home Pages
---------------------------------------------------------------

PG decided to go public at the 10,000th e-book. I would like it to be more successful, more famous, and to have more volunteers. The ebooksgratuits.com webmaster has a very active group of people doing e-books in Word (I'm working on a filter to help them transform that into PG-acceptable formats, such as TXT or XHTML). He thinks a BIG reason why PG and PGDP are not more successful is the fact that the websites are not clear, not sexy, etc. You can have a look at his site or ask him for details to know what he means.

I mention that here because I am not sure this issue is known among the volunteers (I'm new enough around here; maybe this has already been addressed).

Sebastien Blondeel wrote:
Suggestion For Improvements: Work on PG's and PGDP's Home Pages
---------------------------------------------------------------

... The ebooksgratuits.com webmaster has a very active group of people doing e-books in Word (I'm working on a filter to help them transform that into PG-acceptable formats, such as TXT or XHTML). He thinks a BIG reason why PG and PGDP are not more successful is the fact that the websites are not clear, not sexy, etc.
gutenberg.org ranks 11,132nd in the Alexa stats and ebooksgratuits.com ranks 690,225th. gutenberg.org reaches 79 out of a million web users; ebooksgratuits.com reaches 0.75 out of a million web users. See:

http://www.alexa.com/data/details/traffic_details?&range=3m&size=large&y=t&url=gutenberg.org
http://www.alexa.com/data/details/traffic_details?&range=3m&size=large&y=t&url=ebooksgratuits.com

People should at least try to get the facts before opening their BIG mouths.
You can have a look at his site or ask him for details to know what he means.
I think he'd better take a look at our site.

--
Marcello Perathoner
webmaster@gutenberg.org

Moore's Law actually has nothing to do with ebooks: "Moore's law is an empirical observation stating, in effect, that at our rate of technological development and advances in the semiconductor industry, the complexity of integrated circuits doubles every 18 months." (From Wikipedia.)

Michael Hart likes to compare PG growth with Moore's Law, but nobody should place too much importance on this. The important thing is that PG continues to grow, and that it continues to develop.

Steve

Miranda van de Heijning wrote:
Perhaps I don't understand Moore's Law properly, but aren't we actually well behind on its schedule? If we published our 10,000th book at the end of 2003, doubling in 1.5 years means 20,000 books in mid-2005. At the current rate we are not likely to reach that figure--anywhere between 16k and 17k seems more realistic.
--
Stephen Thomas, Senior Systems Analyst, University of Adelaide Library
University of Adelaide, SA 5005, Australia
Phone: +61 8 830 35190   Fax: +61 8 830 34369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/

On Mon, 10 Jan 2005, Steve Thomas wrote:
Moore's Law actually has nothing to do with ebooks: "Moore's law is an empirical observation stating, in effect, that at our rate of technological development and advances in the semiconductor industry, the complexity of integrated circuits doubles every 18 months." (From Wikipedia.)
Michael Hart likes to compare PG growth with Moore's Law, but nobody should place too much importance on this. The important thing is that PG continues to grow, and that it continues to develop.
When doing public relations, it is important to use references the general public is already familiar with. Moore's Law will be recognized by millions as the most popular growth curve, and it is something Project Gutenberg has used all along. We also thus get continued recognition from those who know us. Whether Moore's Law is technically correct, etc., is not every person's cup of tea; it's a useful reference that people know.

mh
participants (13)

- Gardner Buchanan
- Holden McGroin
- John Hagerson
- Juliet Sutherland
- Marcello Perathoner
- Michael Dyck
- Michael Hart
- Miranda van de Heijning
- Robert Shimmin
- Sebastien Blondeel
- steve harris
- Steve Thomas
- Tim Meekins