Questions on GP and other ebooks

It looks like a lively, lightly-moderated list here. As a long-term participant on Usenet as well as many email lists, this is nothing new to me and I feel perfectly welcome. :)
My first question involves the book "On The Sensations of Tone" by Hermann Helmholtz. I recently saw mention of it online and decided that since its copyright has long expired I might find it on the GP site. I don't see it, nor did I find the text elsewhere online, but further investigation (reading much of the PG FAQ) led me to this page with a list of books in progress:
http://www.dprice48.freeserve.co.uk/GutIP.html
The relevant portion of that webpage is:
Helmholtz, Hermann Ludwig Ferdinand von (31aug1821-8sep1894)
The Mystery of Creation - Copyright cleared 23 Nov 1997
On the Sensations of Tone as a Physiological Basis for the Theory of Music - Copyright cleared 17 Sep 2003
How do I find the "real" status of this book, whose copyright was cleared seven months ago but is not yet online? If it's not actively being converted to text by someone else, I'd like to do it (as soon as I get my own physical copy). Apparently, I should email Mr. Price at the address indicated on that webpage to see what he might know, but I'd also like to know if I'm missing something in relation to this.
Second question: Why are there two copies of Thomas Paine's "Common Sense" on the GP site?
And a third: I just came across yet another copyrighted book online. All such books I've seen on the web are apparently put online legally by the author or with the author's permission. Is there a list of these books, perhaps as a part of an index of "all books online"? And yes, I know this is fairly tangential to the purpose of the GP, as most such authors generally choose to retain full copyright and have their books available ONLY on their sites, so they can have some control, such as using the site to advertise/sell physical copies of the book.

On Sat, Apr 16, 2005 at 11:33:13PM -0400, Ben Bradley wrote:
It looks like a lively, lightly-moderated list here. As a long-term participant on Usenet as well as many email lists, this is nothing new to me and I feel perfectly welcome. :)
As a denizen of Usenet, you may feel even more at home in a while. :-)
My first question involves the book "On The Sensations of Tone" by Hermann Helmholtz. I recently saw mention of it online and decided that since its copyright has long expired I might find it on the GP site. I don't see it, nor did I find the text elsewhere online, but further investigation (reading much of the PG FAQ) led me to this page with a list of books in progress:
http://www.dprice48.freeserve.co.uk/GutIP.html
The relevant portion of that webpage is:
Helmholtz, Hermann Ludwig Ferdinand von (31aug1821-8sep1894)
The Mystery of Creation - Copyright cleared 23 Nov 1997
On the Sensations of Tone as a Physiological Basis for the Theory of Music - Copyright cleared 17 Sep 2003
How do I find the "real" status of this book, whose copyright was cleared seven months ago but is not yet online? If it's not actively being converted to text by someone else, I'd like to do it (as soon as I get my own physical copy). Apparently, I should email Mr. Price at the address indicated on that webpage to see what he might know, but I'd also like to know if I'm missing something in relation to this.
Nope. Not missing anything. Somebody's got it. They may or may not be doing anything with it. It's not queued up at DP. But seven months ain't nuthin' much.
Second question: Why are there two copies of Thomas Paine's "Common Sense" on the GP site?
Same reason there are three Grimms, three Odysseys, three Iliads in English and one in French, seven or eight of Hamlet, two Valley of Fears, two Literary Tastes, and yadda-yadda. FAQ V.32 and R.36. Basically, if it comes from a different paper edition, we post it separately, and give it a new number. And BTW, it's always "PG", not "GP".
And a third: I just came across yet another copyrighted book online. All such books I've seen on the web are apparently put online legally by the author or with the author's permission. Is there a list of these books, perhaps as a part of an index of "all books online"?
Not that I'm aware of. You can quickly get the list of PG-posted copyrighted works (not necessarily titles) from GUTINDEX.ALL, but people put all kinds of stuff online and call it copyrighted but available for private use, and they don't necessarily register it with us or anyone else. jim

Ben Bradley wrote:
And a third: I just came across yet another copyrighted book online. All such books I've seen on the web are apparently put online legally by the author or with the author's permission. Is there a list of these books, perhaps as a part of an index of "all books online"?
The Online Books Page (http://onlinebooks.library.upenn.edu/) lists both public domain and copyrighted books online. I'm sure it doesn't list ALL books, but it will help you find a good many. Curtis.

On Sat, 16 Apr 2005 23:33:13 -0400, you wrote:
http://www.dprice48.freeserve.co.uk/GutIP.html
The relevant portion of that webpage is:
Helmholtz, Hermann Ludwig Ferdinand von (31aug1821-8sep1894)
The Mystery of Creation - Copyright cleared 23 Nov 1997
On the Sensations of Tone as a Physiological Basis for the Theory of Music - Copyright cleared 17 Sep 2003
How do I find the "real" status of this book, whose copyright was cleared seven months ago but is not yet online? If it's not actively being converted to text by someone else, I'd like to do it (as soon as I get my own physical copy). Apparently, I should email Mr. Price at the address indicated on that webpage to see what he might know, but I'd also like to know if I'm missing something in relation to this.
Er, "cleared 17 Sep 2003" makes the clearance 19 months old, not 7. Would the book likely contain musical notation? If so, that challenge might account for its sitting in someone's to-do pile for so long. I'd advise you to go ahead and do it yourself since the clearance is so stale. If you haven't already, you might want to wander over to Distributed Proofreaders (www.pgdp.net)--an inquiry about the Helmholtz book on DP's Content Providers forum might prove useful. -- Janet

Janet Kegg wrote:
Er, "cleared 17 Sep 2003" makes the clearance 19 months old, not 7. Would the book likely contain musical notation? If so, that challenge might account for its sitting in someone's to-do pile for so long.
I'd advise you to go ahead and do it yourself since the clearance is so stale. If you haven't already, you might want to wander over to Distributed Proofreaders (www.pgdp.net)--an inquiry about the Helmholtz book on DP's Content Providers forum might prove useful.
-- Janet
Also, an e-mail to David will usually result in him double-checking the status with the original person (he has access to that information). They may have disappeared from the face of the earth or just cleared it and then never got around to it... If it is an especially hard work, they may still be working on it. David can help you find out the current situation. Josh

On 4/17/05, Joshua Hutchinson <joshua@hutchinson.net> wrote:
Also, an e-mail to David will usually result in him double-checking the status with the original person (he has access to that information). They may have disappeared from the face of the earth or just cleared it and then never got around to it... If it is an especially hard work, they may still be working on it. David can help you find out the current situation.
It's a good idea in any case. There are books that are not particularly hard but have got stuck in post-proofing at DP, or elsewhere, and there's no reason to redo the work that's already been done on them.

On Monday 18 April 2005 12:32 am, David Starner wrote:
It's a good idea in any case. There are books that are not particularly hard but have got stuck in post-proofing at DP, or elsewhere, and there's no reason to redo the work that's already been done on them.
Except that many of those "stuck" books are waiting on missing pages/images, etc. You may find that instead of redoing a book all on your own, you can be the person that provides that one last missing piece to allow the existing but incomplete work to be finished.

On Mon, 18 Apr 2005, D Garcia wrote:
On Monday 18 April 2005 12:32 am, David Starner wrote:
It's a good idea in any case. There are books that are not particularly hard but have got stuck in post-proofing at DP, or elsewhere, and there's no reason to redo the work that's already been done on them.
Except that many of those "stuck" books are waiting on missing pages/images, etc.
Any reason not to post them with a comment that these pages are missing? Readers would thus be encouraged to help find the missing pages. Michael

On Tue, 19 Apr 2005, Michael Hart wrote:
On Mon, 18 Apr 2005, D Garcia wrote:
On Monday 18 April 2005 12:32 am, David Starner wrote:
It's a good idea in any case. There are books that are not particularly hard but have got stuck in post-proofing at DP, or elsewhere, and there's no reason to redo the work that's already been done on them.
Except that many of those "stuck" books are waiting on missing pages/images, etc.
Any reason not to post them with a comment that these pages are missing?
Readers would thus be encouraged to help find the missing pages.
Or, another possibility, given that many people don't look too closely, would be that someone has a copy of the book, sees that it is already in PG, and then moves on to something else... Andrew

Except that many of those "stuck" books are waiting on missing pages/images, etc.
Any reason not to post them with a comment that these pages are missing?
Readers would thus be encouraged to help find the missing pages.
I think it would be a good idea to ask the audience of PG to help look for missing pages. This would certainly increase the chance of recovering them, and of our posting complete books. However, I am strongly opposed to posting incomplete books. If the public cannot be sure whether the books they download are complete or not, they will move on to a place where quality can be guaranteed. I think in these kinds of issues quality should prevail over quantity. Kind regards, Frank

I think it would be a good idea to ask the audience of PG to help look for missing pages. This would certainly increase the chance of recovering them, and of our posting complete books.
However, I am strongly opposed to posting incomplete books. If the public cannot be sure whether the books they download are complete or not, they will move on to a place where quality can be guaranteed. I think in these kinds of issues quality should prevail over quantity.
PG already issues books with missing pages, e.g. #11866. However it is stated at the beginning that certain specified pages are missing, so the reader knows what to expect. If the currently best available copy of a text, which may be several hundred years old, is missing a few pages, well that is unfortunate; but surely it is better to give people the chance to read the 99% that is available. Our great museums do not say, this pot has a few chips in it so we will not exhibit it. However, possibly there could be a list of PG works that require pages, so that there would be a higher chance of someone eventually contributing the missing pages. Philip.

On 4/19/05, Phil Hitchcock <phil@hitchcock99.freeserve.co.uk> wrote:
PG already issues books with missing pages, e.g. #11866. However it is stated at the beginning that certain specified pages are missing, so the reader knows what to expect. If the currently best available copy of a text, which may be several hundred years old, is missing a few pages, well that is unfortunate; but surely it is better to give people the chance to read the 99% that is available. Our great museums do not say, this pot has a few chips in it so we will not exhibit it.
The main reason to avoid incomplete projects at DP is a lack of resources, both of skilled people and technical resources. In fact they go together; the Post Processing backlog at DP is causing a chronic shortage of disk space. If a project has to sit on the server for 6 additional months waiting for 2 pages, that is not good. Also, by posting an incomplete work, you add to the already heavy PP workload. I've got an incomplete project sitting around waiting on two pages, but someone has already volunteered to take pictures of the missing pages from the special collection at a nearby university. The existing system is fairly slow, but it does work in many cases. Now if something is extremely rare, and all known copies have the same defect, by all means post it IMO. But otherwise I suggest holding out for a complete work. R C

On Tue, Apr 19, 2005 at 05:12:43PM -0400, Robert Cicconetti wrote:
On 4/19/05, Phil Hitchcock <phil@hitchcock99.freeserve.co.uk> wrote:
PG already issues books with missing pages, e.g. #11866. However it is stated at the beginning that certain specified pages are missing, so the reader knows what to expect. If the currently best available copy of a text, which may be several hundred years old, is missing a few pages, well that is unfortunate; but surely it is better to give people the chance to read the 99% that is available. Our great museums do not say, this pot has a few chips in it so we will not exhibit it.
The main reason to avoid incomplete projects at DP is a lack of resources, both of skilled people and technical resources. In fact they go together; the Post Processing backlog at DP is causing a chronic shortage of disk space. If a project has to sit on the server ...
I'm posting here, in case discussion has stalled or this message didn't get to the right person previously: We're perpetually ready to acquire additional hardware for DP. I can also offer lots of off-site networked storage for backups, "holding" items, etc., etc. There have been numerous short discussions about this, but it sounds like most DP folks are busy doing other things, and haven't had cycles to work on expanding infrastructure. So, in case this helps, I want to reiterate that funding for DP's hardware/network/backups/storage infrastructure is available.
for 6 additional months waiting for 2 pages, that is not good. Also, by posting an incomplete work, you add to the already heavy PP workload.
Just a quick note that for posted eBooks such errata/additions can go to the errata list (errata AT pglaf.org). They don't need to go back to the PPer (though in some cases they might need to). The errata team is also overworked, of course... If we do a lot of this, and it involves starting with OCR & proofreading, then I agree it's non-trivial no matter who gets the page scans. But if we can get the scan/page donor to supply proofread text, it's much easier. -- Greg

I think the person responsible for server management at DP is the much overworked Pauline/Pourlean, so I am forwarding the following to her per this reply.
On 20 Apr 2005, at 22:17, Greg Newby wrote:
I'm posting here, in case discussion has stalled or this message didn't get to the right person previously: We're perpetually ready to acquire additional hardware for DP.
I can also offer lots of off-site networked storage for backups, "holding" items, etc., etc. There have been numerous short discussions about this, but it sounds like most DP folks are busy doing other things, and haven't had cycles to work on expanding infrastructure. So, in case this helps, I want to reiterate that funding for DP's hardware/network/backups/storage infrastructure is available.
-- branko collin collin@xs4all.nl

Branko Collin wrote:
I think the person responsible for server management at DP is the much overworked Pauline/Pourlean, so I am forwarding the following to her per this reply.
Thanks. I haven't been keeping up with mailing lists at all recently.
On 20 Apr 2005, at 22:17, Greg Newby wrote:
I'm posting here, in case discussion has stalled or this message didn't get to the right person previously: We're perpetually ready to acquire additional hardware for DP.
I can also offer lots of off-site networked storage for backups, "holding" items, etc., etc. There have been numerous short discussions about this, but it sounds like most DP folks are busy doing other things, and haven't had cycles to work on expanding infrastructure. So, in case this helps, I want to reiterate that funding for DP's hardware/network/backups/storage infrastructure is available.
I suspect my last email to Greg went west, so I'll resend privately.
As to the issue of extra disk space... I've said a few times on the DP Forums that after developers the thing which DP lacks most is PPers, i.e. the people who take the proofed text & turn it into ebooks. Adding extra disk space will solve the problem in the medium term, but in the end the PP mountain will just grow higher, until we can match the number of posted projects to the number proofed.
I am really hoping that the upcoming site upgrade will help with this problem as the extra formatting rounds & open smooth reading pool will hopefully make life much easier for PPers.
If you're not a developer, at the moment the best thing you can do for DP is to PP or PPV, so we can get projects posted to PG & into the archive (i.e. off the production server).
Here's a more detailed post on how people can help - the numbers have changed since November & we did some recoding of how images are handled to recover some disk space - but essentially the same issue remains: http://www.pgdp.net/phpBB2/viewtopic.php?p=96304#96304
Thanks, P
-- Help digitise public domain books: Distributed Proofreaders: http://www.pgdp.net "Preserving history one page at a time." Set free dead-tree books: http://bookcrossing.com/referral/servalan

Pauline wrote:
Here's a more detailed post on how people can help - the numbers have changed since November & we did some recoding of how images are handled to recover some disk space - but essentially the same issue remains: http://www.pgdp.net/phpBB2/viewtopic.php?p=96304#96304
Is there a list of missing pages that you don't have to log in to see? We could put up a link from the PG web site. Ideally the list should be printable and contain exact edition data plus the last paragraph of the preceding and the first para of the next page. -- Marcello Perathoner webmaster@gutenberg.org

Michael Hart writes:
On Mon, 18 Apr 2005, D Garcia wrote:
On Monday 18 April 2005 12:32 am, David Starner wrote:
It's a good idea in any case. There are books that are not particularly hard but have got stuck in post-proofing at DP, or elsewhere, and there's no reason to redo the work that's already been done on them.
Except that many of those "stuck" books are waiting on missing pages/images, etc.
Any reason not to post them with a comment that these pages are missing?
Readers would thus be encouraged to help find the missing pages.
It's Distributed Proofreaders' policy not to post to PG when there are missing pages. They have a forum listing projects that are missing pages so that DP volunteers can seek out the missing pages. Personally, I would not buy books from a publisher with a reputation for knowingly publishing books with pages missing, nor would I want to download from PG if it had a reputation for knowingly publishing etexts that are missing pages.

Bruce Albrecht <bruce@zuhause.org> writes:
Personally, I would not buy books from a publisher with a reputation for knowingly publishing books with pages missing, nor would I want to download from PG if it had a reputation for knowingly publishing etexts that are missing pages.
I bought incomplete books - they were cheap and I was mostly interested in the photographs. As long as incomplete books are properly described and listed, it would be useful to offer them for download.
-- http://www.gnu.franken.de/ke/
Key fingerprint = F138 B28F B7ED E0AC 1AB4 AA7F C90A 35C3 E9D0 5D1C

On 4/19/05, Michael Hart <hart@pglaf.org> wrote:
Any reason not to post them with a comment that these pages are missing?
Readers would thus be encouraged to help find the missing pages.
If you look at book 13921, you'll notice that it's missing pages 98 and 99 ("[Seiten 98 und 99 fehlen!]", embedded in the middle of the text). How many readers have jumped forward to offer the missing pages? If it had been kept at DP, we could have found the pages and added them. But once it's on the shelf, nobody worries about it anymore. I've found as a general rule, once a book is posted, the odds of anything getting done on it drop vastly. It gets moved to the completed pile, and new books take its place.

On Tue, 19 Apr 2005, David Starner wrote:
On 4/19/05, Michael Hart <hart@pglaf.org> wrote:
Any reason not to post them with a comment that these pages are missing?
Readers would thus be encouraged to help find the missing pages.
If you look at book 13921, you'll notice that it's missing pages 98 and 99 ("[Seiten 98 und 99 fehlen!]", embedded in the middle of the text). How many readers have jumped forward to offer the missing pages? If it had been kept at DP, we could have found the pages and added them. But once it's on the shelf, nobody worries about it anymore.
I've found as a general rule, once a book is posted, the odds of anything getting done on it drop vastly. It gets moved to the completed pile, and new books take its place.
Then I suggest we keep some kind of notice for our own people that the book is incomplete, rather than simply ignoring books once they reach the public. It's not as if there is some "Digital Divide" that prevents us from trying to improve our eBooks from both directions. Why use this kind of reasoning to keep these books from seeing the light of day? Michael

If we are going to post incomplete books, the notice should not be just for 'our own people' but should be very clearly stated to the users of PG. Anything less is deceit, in my mind. Such books should go in a section for incomplete projects, and the end-user's help should be specifically solicited, creating a partnership with him, rather than putting him off by trying to pass off a 'broken' project as one in good condition.
Melissa
On 4/20/05, Michael Hart <hart@pglaf.org> wrote:
Then I suggest we keep some kind of notice for our own people that the book is incomplete, rather than simply ignoring books once they reach the public.
It's not as if there is some "Digital Divide" that prevents us from trying to improve our eBooks from both directions.
Why use this kind of reasoning to keep these books from seeing the light of day?
Michael
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d

On Tue, Apr 19, 2005 at 04:42:56PM -0500, David Starner wrote:
On 4/19/05, Michael Hart <hart@pglaf.org> wrote:
Any reason not to post them with a comment that these pages are missing?
Readers would thus be encouraged to help find the missing pages.
If you look at book 13921, you'll notice that it's missing pages 98 and 99 ("[Seiten 98 und 99 fehlen!]", embedded in the middle of the text). How many readers have jumped forward to offer the missing pages? If it had been kept at DP, we could have found the pages and added them. But once it's on the shelf, nobody worries about it anymore.
I've found as a general rule, once a book is posted, the odds of anything getting done on it drop vastly. It gets moved to the completed pile, and new books take its place.
I feel like I might be stepping on a hornet's nest, so please try to be gentle with me:
My questions are two:
1. What is the approximate success rate & timetable for getting missing pages for books in DP? (I.e., how many books are stalled for missing pages, and how many have had their pages found/restored, and how long after proofreading was complete did this happen?)
2. I'm aware there are a sizeable number of books at DP that have completed proofreading, yet are not yet uploaded to the PG servers. What proportion is awaiting missing pages, versus other types of delays?
I want to offer two things, also:
a) We can run requests in the newsletters for particular items. These go out to > 6K subscribers, and we might get some positive responses. I think Branko Collin was looking to provide some regular DP content to Michael Hart for the newsletter - or, just email stuff to Michael or me.
b) ditto for the gutenberg.org Web page: a "wanted" area (with lots of changing content -- drawn from a list of titles missing pages) would probably get a lotta clicks.
-- Greg

On 4/21/05, Greg Newby <gbnewby@pglaf.org> wrote:
I feel like I might be stepping on a hornet's nest, so please try to be gentle with me:
My questions are two:
1. What is the approximate success rate & timetable for getting missing pages for books in DP? (I.e., how many books are stalled for missing pages, and how many have had their pages found/restored, and how long after proofreading was complete did this happen?)
Nobody really keeps track. There are a hundred lines of changelog in the forums on the missing pages wiki that could probably provide some information, but not every book goes through there. The second post in that thread is a list of volunteers and the libraries that they have access to; you can PM those people directly, or request the book yourself through ILL, or find a copy on eBay, or... WorldCat and the library list are a very effective combination.
2. I'm aware there are a sizeable number of books at DP that have completed proofreading, yet are not yet uploaded to the PG servers. What proportion is awaiting missing pages, versus other types of delays?
There are ~25 books listed on the missing pages wiki; several have been claimed by someone. There are 636 books waiting for a post processor to claim them, 1600 claimed for post processing, 126 waiting for verification, and 175 being verified. Note that current policy is that incomplete books should not be uploaded to DP. R C

It is much better if the incomplete projects remain at DP: when pages are found, DP updates both the text and the images, so that when these are made available, they will be complete too. However, a page at PG with a list of current requests might be very useful: a page with one line per book, linking to a description of the problem. When a problem is fixed, the line would be moved to a different position for a while, with thanks. Of course the page could be useful for non-DP projects too, and may contain requests for books in PG that need maintenance. Carlo

On Thu, 21 Apr 2005, Carlo Traverso wrote:
It is much better if the incomplete projects remain at DP: when pages are found, DP updates both the text and the images, so that when these are made available, they will be complete too.
However, a page at PG with a list of current requests might be very useful: a page with one line per book, linking to a description of the problem. When a problem is fixed, the line would be moved to a different position for a while, with thanks.
Of course the page could be useful for non-DP too; and may contain requests for books in PG that need maintenance.
Is there any reason these projects cannot be kept at DP as suggested and also still shared with the world? Michael

On Thursday 21 April 2005 02:47 pm, Michael Hart wrote:
On Thu, 21 Apr 2005, Carlo Traverso wrote:
It is much better if the incomplete projects remain at DP: when pages are found, DP updates both the text and the images, so that when these are made available, they will be complete too.
Is there any reason these projects cannot be kept at DP as suggested and also still shared with the world?
Michael
Yes, we like to be thorough. :)

At 06:23 PM 4/21/2005 -0400, you wrote:
On Thursday 21 April 2005 02:47 pm, Michael Hart wrote:
On Thu, 21 Apr 2005, Carlo Traverso wrote:
It is much better if the incomplete projects remain at DP: when pages are found, DP updates both the text and the images, so that when these are made available, they will be complete too.
Is there any reason these projects cannot be kept at DP as suggested and also still shared with the world?
Hi. This is only an idea, so if it's not practical, my apologies and please disregard.
Why not start a subproject either within PG or DP that could still post the books as long as it is clearly understood that they are not official PG books and have x pages missing? Maybe the PGCC could have such a collection. This way people could still see the books in a transitional state of completeness while they would not become a part of the PG archive.
To take this a step further, they wouldn't be assigned PG ebook numbers and maybe the PGWW people won't even have to be involved, since the books are in a transitional state anyway. I guess it would be similar to the second round of proofreading in DP: the book still has errors, missing pages, etc. but is available for all to see. The wanted requests could still be posted to the main PG site and newsletter in hopes that volunteers will find such missing pages more quickly.
Since there would still need to be a way to keep track of these substandard books, give them the DP project numbers or no numbers at all. They would still stay within DP, they would just be released earlier with notes that x pages are missing, x more proofing needs to be done, etc. Again, maybe PGCC would be best for this so it's not directly associated with the PG archive.

"Michael" == Michael Hart <hart@pglaf.org> writes:
Michael> On Thu, 21 Apr 2005, Carlo Traverso wrote:
>> It is much better if the incomplete projects remain at DP: when pages are found, DP updates both the text and the images, so that when these are made available, they will be complete too.
>> However, a page at PG with a list of current requests might be very useful: a page with one line per book, linking to a description of the problem. When a problem is fixed, the line would be moved to a different position for a while, with thanks.
>> Of course the page could be useful for non-DP too; and may contain requests for books in PG that need maintenance.
Michael> Is there any reason these projects cannot be kept at DP as suggested and also still shared with the world?
In the forthcoming code release DP will have the so-called "Smooth reading pool", in which books that have passed (most of) the post-processing steps are made available for download for a final reading, identifying the further corrections needed. IIRC, download will be available to non-registered users too (re-upload for registered users only).
While availability is meant for a short period only, it can be used for projects with missing pages for as long as it is needed (until upload at PG).
Carlo

On 21 Apr 2005, at 11:47, Michael Hart wrote:
On Thu, 21 Apr 2005, Carlo Traverso wrote:
It is much better if the incomplete projects remain at DP: when pages are found, DP updates both the text and the images, so that when these are made available, they will be complete too.
Is there any reason these projects cannot be kept at DP as suggested and also still shared with the world?
No such reason, but I will come to that later.
There are philosophical differences between PG and DP that hardly ever come to light except in instances such as now.
One is that DP doesn't care how long it takes before a public domain book is presented to the public. This is part of its very make-up; we distribute the work in bits that are as small as possible, and there are very few stakeholders who have a large interest in what finally will happen to the book. If neither the scanner nor the post-processor cares very much _when_ the book will be released, there is a chance that a text will be sat upon until it's ready, not until it's time.
The other difference is that nitpickers are drawn to DP the way moths are drawn to a flame. A lot of volunteers at DP care more about the quality of the works we put out than the quantity. We don't want to produce as many books as possible for as long a time as possible (part of PG's main philosophy), we want to produce good books.
Obviously I am exaggerating the differences a great deal; I make it sound like PG does not care about quality, and obviously that is not true. I also make it sound as if books sit forever at DP, while proofreading monks chip away at the tiniest of imperfections, which is also not true. But the differences that there are may account for why books are apparently sitting longer at DP than PG would like.
I can see several solutions for this:
- Spring cleaning; the powers that be at DP regularly organize proofreading / post-processing / mentoring / whatever marathons, whenever they feel something needs extra attention. If there are truly books that have been sitting at DP for too long, we can try and organize something like that to flush out the forgotten projects.
- Assign quality levels; currently, a PG text is a PG text is a PG text no matter how much effort and attention has gone into it. This means there is a variety in quality that is currently not accounted for. (As a consequence, our bad texts are dragging down our reputation, causing PG's goal of reaching as many people as possible to miss the mark. Some people won't read our books because of their reputation--see my recent discussion with David Rothman at the Teleread blog.)
I can see several disadvantages and several advantages to this proposal.
The disadvantages:
1. PG has never liked putting out "editions". I am not sure why. Quality levels are like "editions".
2. Someone has to build it before we can use it. Things can go wrong while we use it. Readers might not understand what each level means.
3. On the PG side, someone has to check (whitewash) a book at every level, not just once. Corrections may have to be performed on multiple versions, if we choose to retain versions at older levels.
The advantages:
1. We can publish books at several stages of their restoration. Currently the following stages would make sense to me:
a. After scanning (and perhaps OCR-ing)
b. After proofreading/post-processing
c. After extended mark-up/proofreading phases (what would be smoothreading at DP)
2. We can keep the process more transparent.
3. Users can choose between quality levels: have an unchecked, incomplete book now, or wait for the improved version.
-- branko collin collin@xs4all.nl

1. What is the approximate success rate & timetable for getting missing pages for books in DP? (I.e., how many books are stalled for missing pages, and how many have had their pages found/restored, and how long after proofreading was complete did this happen?)
My experience as a DP project manager is that if I simply put a page request up for grabs, I might get a bite, and I might not; I feel the half-life of these "passive" requests might be measured in months. If I am more proactive, and look through library catalogs to identify a library that claims to have the book I need a page from, and then ask a fellow DP-er who might have access to that library, I get better results. My experience is that if I ask two or three people, at least one of them will be willing and able to make the scan at their convenience, and they usually find it convenient to do so within a week or three. So, if a person is willing to put the legwork into locating the book, they will probably get results. Not quickly, but not glacially slowly, either.
2. I'm aware there are a sizeable number of books at DP that have completed proofreading, yet are not yet uploaded to the PG servers. What proportion is awaiting missing pages, versus other types of delays?
Who knows? Other issues that hold books up are a small fraction of illegible text, needing to locate a classicist or speaker of a foreign language, and the project being Just Plain Hard (like doing html for a project with over 300 images). -- RS

2. I'm aware there are a sizeable number of books at DP that have completed proofreading, yet are not yet uploaded to the PG servers. What proportion is awaiting missing pages, versus other types of delays?
Who knows? Other issues that hold books up are a small fraction of illegible text, needing to locate a classicist or speaker of a foreign language, and the project being Just Plain Hard (like doing html for a project with over 300 images).
Plus the fact that we are all volunteers on PG, and most of those prefer to do the proofreading, not the Post Processing. I am going as fast as I can and I'm sure that goes for the other PPers. :D
JHowse

On Thu, 21 Apr 2005, Robert Shimmin wrote:
1. What is the approximate success rate & timetable for getting missing pages for books in DP? (I.e., how many books are stalled for missing pages, and how many have had their pages found/restored, and how long after proofreading was complete did this happen?)
My experience as a DP project manager is that if I simply put a page request up for grabs, I might get a bite, and I might not; I feel the half-life of these "passive" requests might be measured in months.
If I am more proactive, and look through library catalogs to identify a library that claims to have the book I need a page from, and then ask a fellow DP-er who might have access to that library, I get better results. My experience is that if I ask two or three people, at least one of them will be willing and able to make the scan at their convenience, and they usually find it convenient to do so within a week or three.
So, if a person is willing to put the legwork into locating the book, they will probably get results. Not quickly, but not glacially slowly, either.
If you would send me such requests for inclusion in the Newsletter, that might help. When we put requests for such materials in the Newsletter, we usually get a response within a week about half the time. This goes up to about 3/4 if we leave the request in for a month.
Michael

On Thursday 21 April 2005 02:26 pm, Michael Hart eventually had this to say about missing page requests:
If you would send me such requests for inclusion in the Newsletter, that might help.
Michael
Now that is an excellent suggestion!

On Sunday 17 April 2005 08:16 am, Janet Kegg wrote:
Er, "cleared 17 Sep 2003" makes the clearance 19 months old, not 7. Would the book likely contain musical notation? If so, that challenge might account for its sitting in someone's to-do pile for so long.
I'd advise you to go ahead and do it yourself since the clearance is so stale. If you haven't already, you might want to wander over to Distributed Proofreaders (www.pgdp.net)--an inquiry about the Helmholtz book on DP's Content Providers forum might prove useful.
I wouldn't call that clearance "stale" exactly ... Juliet Sutherland, for example, has a huge number of clearances from 2003-ish that will eventually be scanned and processed. The 1997 one, though, I would definitely proceed on. Your point about those being more difficult or delayed due to musical content is certainly valid, though. Cheers!
participants (23)
- Andrew Sly
- Ben Bradley
- Branko Collin
- Bruce Albrecht
- Carlo Traverso
- Curtis A. Weyant
- D Garcia
- David Starner
- Frank van Drogen
- Greg Newby
- Janet Kegg
- JHowse
- Jim Tinsley
- Joshua Hutchinson
- Karl Eichwalder
- Marcello Perathoner
- Melissa Er-Raqabi
- Michael Hart
- Pauline
- Phil Hitchcock
- Robert Cicconetti
- Robert Shimmin
- Tony Baechler