Re: DP output is technically obsolete

kevin said:
On the Open Library System, I note that high resolution gray-scale scans (at least for the one project I checked) are not archived, though the black and white scans are
it's my understanding that d.p. has kept all scans, but it's reasonable they wouldn't mount the high-res ones; no sense letting the general public burn your bandwidth.

this, of course, is the problem with high-res files in general.

they're nice to have, for purposes of "preservation", but you can't really make them "accessible" in a practical way until computer resources become free across-the-board, so -- in a practical sense -- they don't really do any good.

it's not just bandwidth, either. storage problems quickly ensue when each page of a book eats multiple megabytes. and computers need lotsa power to crunch through them.

and sure, we can all see the day coming when all of these resources _will_ be available to us. but how soon is that? are you willing to bet on it? and don't forget that you are a lucky first-worlder. how soon until _everyone_ on the whole planet has unlimited computing resources? really? are you willing to bet on it? and if the third-worlders can't have what you lucky people have, how long do you think they will sit on the sidelines without a full-out revolution?

we need to think in real-world terms, and be _practical_...
I also note that there is no 'bulk' download function to get a zip of all the files associated with a text.
yeah, that would be nice. will d.p. offer that? who knows?

in the meantime, you can learn the address of an image by right-clicking it and choosing the appropriate menu-item.

for instance, here's the u.r.l. i recovered for one page:

http://pgdp01.us.archive.org/1/pgdp02-archive/texts/documents/43e52c83dd501/...

subsequent scans have the same u.r.l., except "002.png", "003.png", etc., so it's very easy to scrape them en masse. (if anyone needs a scraper-program, just backchannel me.)

-bowerbird
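
A minimal sketch of the kind of scraper being described here, assuming wget is installed. The base URL and the 300-page count are placeholders (the full path above is truncated); substitute the directory URL recovered by right-clicking a page image and the book's actual page count.

    # BASE is a stand-in -- use the directory URL recovered by right-clicking a scan
    BASE="http://example.org/scans/project"
    for n in $(seq -w 1 300); do         # seq -w zero-pads the counter: 001, 002, ... 300
        wget -nc -w 1 "$BASE/$n.png"     # -nc skips pages already fetched; -w 1 waits 1s between requests
    done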

(if anyone needs a scraper-program, just backchannel me.)

http://www.gnu.org/software/wget/
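
wget on its own can stand in for a custom scraper. A hedged example, assuming the server publishes a browsable index of the scan directory (which it may not); the URL is again a placeholder:

    wget -r -np -nd -A .png -w 1 "http://example.org/scans/project/"
    # -r       recurse into links found on the index page
    # -np      never ascend to the parent directory
    # -nd      save files flat, without recreating the remote directory tree
    # -A .png  keep only PNG files
    # -w 1     wait one second between requests

If no index page is exposed, the numbered loop sketched above does the same job with plain wget.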

This was a special case: the high-resolution scans are actually needed to read/decipher some of the text, but Greg popped up and pointed out that he uploaded the super-duper high-res scans to the Internet Archive. That answers the mail on this and satisfies my desire that all those scans of hard-to-find issues of that work continue to be available.

As to a screen scraper, wget, or simply clicking through and downloading each image at OLS: this fails the 'same barrier to access' test (and I admit it is my standard, not a requirement or something someone else promised to adhere to) when compared to a PG text. For the scanned pages to be as available as the PG text, the images need to be offered in a single download 'click' that the hypothetical generic internet user can understand and make use of: 'One Click, One Book'.

Just as a bookstore doesn't make you visit 16 different locations in the store to purchase one book, PG doesn't require you to visit multiple pages to download a book, and Amazon doesn't require you to visit multiple pages (other than order confirmation) to purchase one. In each of these examples, Person A can give Person B a link or a location description, and Person B can go to that location and get the book in the preferred format (paper in hand, paper in the mail, etext of various types, etc.).

Thanks,
Kevin
participants (3)
- Bowerbird@aol.com
- Kevin Pulliam
- V. L. Simpson