re: !@!re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries

norm said:
My success with google pd books is about 30%.
and
The US is doing very well in providing a large number of useless images online.
see, now _that_ is the shame. _that_ is what the complainers should be complaining about. bad scans do _nobody_ any good.
***
jon ingram said:
I've scanned almost a thousand books for Distributed Proofreaders, and the Internet Archive would be a great place to permanently store the images. Every time I've asked them on their website, however, they either haven't replied, or have said that letting outside people contribute material is something that they're planning on setting up, but with no firm date.
see, this is bad too. this needs to be fixed. when you people are willing to do this work, something like _diskspace_ needs to become a solved problem, not a recurring nightmare.

so, who can solve this problem for you guys? what could i do to help you guys get it solved?

amazon just announced a new storage system. the rates seemed pretty low to me, but i'd guess we're looking for so much space that it'd add up. especially since they charge you for pushing it in. we need some concrete figures to discern pricing, could you give us a ballpark number on that, jon?

another alternative would be to store it distributedly. we could chop it up into a thousand pieces and have a network of two thousand people storing it at home. michael keeps telling us how cheap terabyte disks are. maybe we can recreate fidonet with terabytes and d.s.l.

but face facts, if we've got a complete scan-set, it has to be saved. it has to. and saved without the waste of even a second thought about it.

-bowerbird

jon ingram said:
I've scanned almost a thousand books for Distributed Proofreaders, and the Internet Archive would be a great place to permanently store the images. Every time I've asked them on their website, however, they either haven't replied, or have said that letting outside people contribute material is something that they're planning on setting up, but with no firm date.
Woah there, cowboy.

I've been waiting for DP to provide raw page scans for *years*. This is something I discussed with Charles & Juliet years ago. The whitewashers are ready. iBiblio is ready. We have other servers if growth is too fast. Yes, that includes the Internet Archive, where we have several usernames...plus our official backup mirror.

I've also been pressing to get preprints from DP...scans before the postprocessing is done, to release "to the wild" before they're quite ready. (Last count there are over 800 of these.) There's even a new preprints section (though this might not be the way we'd do DP preprints) at http://preprints.readingroo.ms

If you could help to move things forward on either scans or preprints, I'd be very grateful! (Ditto for anyone else reading.)

-- Greg

On Wed, 24 May 2006, Greg Newby wrote:
If you could help to move things forward on either scans or preprints, I'd be very grateful! (Ditto for anyone else reading.)
I don't have everything on DP, but I have personal copies of everything I've ever scanned. What format do you want them in and where do you want them uploaded to? There are a number of other people who would do this also, even if it's not an official DP thing. -- Greg Weeks http://durendal.org:8080/greg/

By way of forking the discussion, on Thursday 25 May 2006 at 01:10 am, Greg Newby responded to Jon Ingram with:
Woah there, cowboy.
I've been waiting for DP to provide raw page scans for *years*. This is something I discussed with Charles & Juliet years ago. The whitewashers are ready. iBiblio is ready.
And the volunteer is ready. I volunteered nearly two months ago to take up this task and am simply waiting on various action items from a few people. Charles always intended to have the scans from DP available to the general public whenever possible.
I've also been pressing to get preprints from DP...scans before the postprocessing is done, to release "to the wild" before they're quite ready. (Last count there are over 800 of these.)
It's an interesting idea, but initially I'd like to focus on getting the existing projects in order. :)
If you could help to move things forward on either scans or preprints, I'd be very grateful! (Ditto for anyone else reading.) -- Greg
-- David

On Thu, May 25, 2006 at 06:18:33PM -0400, D Garcia wrote:
By way of forking the discussion, on Thursday 25 May 2006 at 01:10 am, Greg Newby responded to Jon Ingram with:
Woah there, cowboy.
I've been waiting for DP to provide raw page scans for *years*. This is something I discussed with Charles & Juliet years ago. The whitewashers are ready. iBiblio is ready.
And the volunteer is ready. I volunteered nearly two months ago to take up this task and am simply waiting on various action items from a few people. Charles always intended to have the scans from DP available to the general public whenever possible.
Responding to Joshua's point about the desired format, as well as Greg W's inquiry.

There were several messages and some proposals about the details of how to handle page scans. Stuff like whether individual pages should each have their own file, and what format... I will forward a message from Jim Tinsley about that in a moment, from July 2004. There was subsequent discussion. I don't think we quite got closure, but will ask the WWs if they remember anything specific.

My suggestion is to do a few dozen of these, and work out the workflow as we go. If you can upload a .zip or .tar or somesuch to the pglaf server via FTP (not via http://upload.pglaf.org), then email me, I'll push them to the archive. Let me know if you don't have the (non-anonymous) upload/outgoing password for pglaf.org.

Ideally, zipped with the eBook #, and with everything in a page-images, xxxxx-page-images/ subdir:

12345/12345-page-images/

that will allow our automated "push" script to put it in the right place. If things seem to work OK, I'll set things up so I won't need to intervene.

I think it's fine to experiment with different ways of doing the images -- that will help us to know what's workable for our readers, and useful for other purposes. Rather than rehashing all of the questions, options and issues, I'd just as soon see some stuff get posted so we can invite folks to try it. (I'm not trying to quell discussion, just trying to avoid the discussion getting in the way of the work.)

Thanks for stepping up and trying this! We do want to make images part of the regular workflow, but because the whitewashers tend to download the eBooks to their home/office systems for final processing, we'll probably want to have the page scans flow somewhat separately from everything else.

Whoopee, this is great!! Yippee-ei-ayyyyyyyy!!

-- Greg
I've also been pressing to get preprints from DP...scans before the postprocessing is done, to release "to the wild" before they're quite ready. (Last count there are over 800 of these.)
It's an interesting idea, but initially I'd like to focus on getting the existing projects in order. :)
If you could help to move things forward on either scans or preprints, I'd be very grateful! (Ditto for anyone else reading.) -- Greg
-- David
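
The packaging layout Greg describes above (everything under 12345/12345-page-images/ inside an archive for that eBook number) could be scripted roughly as follows. This is only a sketch under stated assumptions: the function name `package_page_images`, the `<ebook#>.zip` archive name, and the choice to store the images without recompression are illustrative and do not come from the thread. The FTP upload step is left out, since the thread gives no host or login details.

```python
import zipfile
from pathlib import Path


def package_page_images(ebook_number: int, image_dir: Path, out_dir: Path) -> Path:
    """Zip every file in image_dir under <ebook#>/<ebook#>-page-images/.

    The archive name (<ebook#>.zip) is an assumption; the thread only says
    the upload should be "zipped with the eBook #".
    """
    prefix = f"{ebook_number}/{ebook_number}-page-images"
    archive_path = out_dir / f"{ebook_number}.zip"

    # List the images before creating the archive, so the archive itself is
    # never picked up if out_dir happens to be the same as image_dir.
    images = sorted(p for p in image_dir.iterdir() if p.is_file())

    # ZIP_STORED: scans are usually already compressed (assumption), so the
    # files are stored as-is rather than deflated again.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_STORED) as archive:
        for image in images:
            archive.write(image, arcname=f"{prefix}/{image.name}")
    return archive_path


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(package_page_images(12345, Path("scans/12345"), Path(".")))
```

Keeping the eBook number in both the top-level directory and the subdirectory name matches the example path in the message, which is what the automated "push" script is said to rely on.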

On 5/26/06, Greg Newby <gbnewby@pglaf.org> wrote:
My suggestion is to do a few dozen of these, and work out the workflow as we go. If you can upload a .zip or .tar or somesuch to the pglaf server via FTP (not via http://upload.pglaf.org), then email me, I'll push them to the archive. Let me know if you don't have the (non-anonymous) upload/outgoing password for pglaf.org.
Ideally, zipped with the eBook #, and with everything in a page-images, xxxxx-page-images/ subdir:
12345/12345-page-images/
I'll be happy to upload the page images for several books that I've put through DP onto PG, if you'll give me the upload details. I'll start with some fairly simple 300DPI black-and-white scans, as those are the most common format we've used. Some people have scanned texts at very high resolutions -- 600DPI 24bit colour -- but I don't have the disk space, the internet upload speed, or the patience! -- Jon Ingram
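
To put a rough number on the gap between the two scan formats Jon mentions, the uncompressed size of a page can be worked out from resolution and bit depth. This is a back-of-the-envelope sketch: the 6 x 9 inch page and the 300-page book are assumptions made purely for illustration, and real files are compressed, so actual sizes will be smaller and will vary by format.

```python
def raw_page_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size of one scanned page, in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Assumed values, not taken from the thread.
PAGE_WIDTH_IN, PAGE_HEIGHT_IN = 6.0, 9.0
PAGES_PER_BOOK = 300

# The two formats named in the message.
bw_300 = raw_page_bytes(PAGE_WIDTH_IN, PAGE_HEIGHT_IN, dpi=300, bits_per_pixel=1)
colour_600 = raw_page_bytes(PAGE_WIDTH_IN, PAGE_HEIGHT_IN, dpi=600, bits_per_pixel=24)

if __name__ == "__main__":
    for label, per_page in (("300DPI b/w", bw_300), ("600DPI 24bit colour", colour_600)):
        per_book_mb = per_page * PAGES_PER_BOOK / 1_000_000
        print(f"{label}: {per_page:,} bytes/page, about {per_book_mb:,.0f} MB/book raw")
    print(f"ratio: {colour_600 // bw_300}x")
```

With these assumed dimensions the colour scans come out 96 times larger before compression: four times the pixels and 24 times the bits per pixel.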

On 5/26/06, Greg Newby <gbnewby@pglaf.org> wrote:
My suggestion is to do a few dozen of these, and work out the workflow as we go. If you can upload a .zip or .tar or somesuch to the pglaf server via FTP (not via http://upload.pglaf.org), then email me, I'll push them to the archive. Let me know if you don't have the (non-anonymous) upload/outgoing password for pglaf.org.
Ideally, zipped with the eBook #, and with everything in a page-images, xxxxx-page-images/ subdir:
Assuming we stick with a) Posted, b) Text-only, and c) Non-harvested books, I have several I can contribute (but nowhere near as many as Jon or Juliet). I knew I burned this stuff to DVD for a reason... As with Jon, I'll need login info for the FTP site. R C

On Friday 26 May 2006 02:07 am, Greg Newby wrote:
On Thu, May 25, 2006 at 06:18:33PM -0400, D Garcia wrote:
And the volunteer is ready. I volunteered nearly two months ago to take up this task and am simply waiting on various action items from a few people. Charles always intended to have the scans from DP available to the general public whenever possible.
Whoopee, this is great!! Yippee-ei-ayyyyyyyy!! -- Greg
Er, before you guys get all insane about it (yeah, too late, I know), allow me to clarify that I was speaking only of Charles "OLS" and not of loading page image scans to PG. All DP content providers should be aware that as long as their page images are in the "archive" then DP can coordinate a mass push of ALL eligible page image content to PG. This is contingent on getting the archived DP files in order, a task which I have yet to ascertain the size of. Nice to see that there's interest, though :) -- David

On 5/26/06, D Garcia <donovan@abs.net> wrote:
On Friday 26 May 2006 02:07 am, Greg Newby wrote:
On Thu, May 25, 2006 at 06:18:33PM -0400, D Garcia wrote:
And the volunteer is ready. I volunteered nearly two months ago to take up this task and am simply waiting on various action items from a few people. Charles always intended to have the scans from DP available to the general public whenever possible.
Whoopee, this is great!! Yippee-ei-ayyyyyyyy!! -- Greg
Er, before you guys get all insane about it (yeah, too late, I know), allow me to clarify that I was speaking only of Charles "OLS" and not of loading page image scans to PG.
All DP content providers should be aware that as long as their page images are in the "archive" then DP can coordinate a mass push of ALL eligible page image content to PG. This is contingent on getting the archived DP files in order, a task which I have yet to ascertain the size of.
Right. I figured this was more or less a test run. For one thing, the DP page #s rarely correspond to the physical page #s, and each book will require manual intervention to determine page numbers, short of some kind of re-ocr and automatic page # extraction from the headers. Personally, I'm betting on manual intervention. :) R C
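
The manual intervention described above could be kept small by recording only the points where the printed numbering starts or jumps, and filling in the rest. The following is a hypothetical sketch, not an existing DP or PG tool: the function name `build_page_map` and the "runs" input format are assumptions introduced for illustration.

```python
from typing import Optional


def build_page_map(
    total_images: int,
    runs: list[tuple[int, Optional[int]]],
) -> dict[int, Optional[int]]:
    """Map DP image numbers (1-based) to printed page numbers.

    Each run is (first_image_number, first_printed_page), where the printed
    page is None for unnumbered pages such as front matter or plates. A run
    extends until the next run begins, or to the last image. Images before
    the first run are left out of the map.
    """
    ordered = sorted(runs, key=lambda run: run[0])
    page_map: dict[int, Optional[int]] = {}
    for pos, (start, first_page) in enumerate(ordered):
        end = ordered[pos + 1][0] if pos + 1 < len(ordered) else total_images + 1
        for image in range(start, end):
            page_map[image] = None if first_page is None else first_page + (image - start)
    return page_map


if __name__ == "__main__":
    # Made-up example: 4 unnumbered images, then pages 1-3, then a jump to page 5.
    for image, page in build_page_map(10, [(1, None), (5, 1), (8, 5)]).items():
        print(image, page)
```

A person would still have to find the run boundaries by looking at the scans; the sketch only removes the need to type out every page number.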

On Fri, May 26, 2006 at 05:30:54PM -0400, D Garcia wrote:
On Friday 26 May 2006 02:07 am, Greg Newby wrote:
On Thu, May 25, 2006 at 06:18:33PM -0400, D Garcia wrote:
And the volunteer is ready. I volunteered nearly two months ago to take up this task and am simply waiting on various action items from a few people. Charles always intended to have the scans from DP available to the general public whenever possible.
Whoopee, this is great!! Yippee-ei-ayyyyyyyy!! -- Greg
Er, before you guys get all insane about it (yeah, too late, I know), allow me to clarify that I was speaking only of Charles "OLS" and not of loading page image scans to PG.
Aha....well, I guess my offer of uploading stuff to the gutenberg.org server for already-completed titles is just for those who don't work through DP (still a substantial portion of new titles). Perhaps also for those DPers who want to try working in advance of the larger push you described. I probably have a record of OLS somewhere in my archives, but hadn't looked recently. Is this public at all? It didn't quite work for me (I got a bunch of blank pages for the title I tried), but it would be quite valuable. I presume that since you didn't provide a URL, I shouldn't either. When are items added to this?
All DP content providers should be aware that as long as their page images are in the "archive" then DP can coordinate a mass push of ALL eligible page image content to PG. This is contingent on getting the archived DP files in order, a task which I have yet to ascertain the size of.
Yes, that might take a little more coordination to do it en masse. -- Greg
participants (6)
- Bowerbird@aol.com
- D Garcia
- Greg Newby
- Greg Weeks
- Jon Ingram
- Robert Cicconetti