
Quoting Ken Arromdee <arromdee@rahul.net>:
On Sun, 1 Jun 2014, James Adcock wrote:
Yes, you can find them. (Even otherwise good archive.org page image books are typically in .jp2 format and cannot be read without a conversion that takes minutes even on a fast machine.)
On my (by now rather slow) machine, it typically takes 15 minutes or more to convert all the .jp2 files to something ScanTailor can handle, and then another 15 minutes to half an hour to get all the settings within ScanTailor to something nice. You could then create a CBR (Comic Book RAR) file out of the results, which despite its name also works pretty well for normal books if you only want to read the page images. I typically process them further into B&W scans suitable for PGDP proofreading.
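As a rough illustration of the packaging step, here is a minimal Python sketch that bundles a folder of processed page images into a comic book archive. It writes CBZ, the ZIP-based sibling of CBR, rather than CBR itself, because the Python standard library can write ZIP but not RAR. The function name `build_cbz` and the list of accepted suffixes are assumptions made for this example; adjust the suffixes to whatever your reader accepts.

```python
import zipfile
from pathlib import Path

# Assumed set of page-image suffixes; not taken from any particular reader.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_cbz(image_dir, cbz_path):
    """Pack the page images found in image_dir into cbz_path.

    Returns the page file names in the order they were written.
    """
    pages = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    # The images are already compressed, so store them as-is.
    with zipfile.ZipFile(cbz_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for page in pages:
            archive.write(page, arcname=page.name)
    return [page.name for page in pages]
```

Pages are written in file-name order, so zero-padded names such as 0001.png, 0002.png keep the book in sequence.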
You gave the solution yourself: have a way to download books with images of various sizes.
I have been looking at this. Basically, the PG site infrastructure is not up to this. What it would require is a way to dynamically decide what size of image is appropriate, then generate, on the fly, images of that size from the highest resolution available, and then generate the HTML, ePub, or other formats of the text in question using those images. Doing this will require considerable coding effort (and might put a big strain on the server).
Then, most importantly, we need to find a way to submit those high-resolution source images to PG, which will probably open up a can of worms. The texts I prepare sometimes include hundreds of illustrations. I currently keep them within the PG limits of about 100k per image, but still sometimes generate uploads of over 50 megs. If I were to shift to uploading the high-res images, those uploads would grow to one or more gigabytes.
In my personal archives I keep the high-resolution cleaned versions of all illustrations. For all the books I've submitted to PG, that adds up to about 250-300 gigabytes of images. Resubmitting them all would be a helluva job (and that is just a little over 1 percent of the complete PG collection).
Personally, I believe we can gradually relax the rules on the size of images, but until we have the infrastructure available to serve out lower-resolution versions, I wouldn't do this in a radical way. We still have to serve people reading epubs on lightweight devices, and people with limited access to high-bandwidth connections (most of Africa and large parts of Asia). In that context, the current limitations still make a lot of sense.
I would love to be able to submit the high-res images with my submissions already, so as to have them stored in the archive (separate from the 'page-images', which are often medium-res B&W images unsuitable for illustrations).
Jeroen.
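To make the "decide what size is appropriate" step above a bit more concrete, here is a minimal Python sketch of the sizing arithmetic only. The device profiles and their pixel limits are hypothetical values invented for this example, not PG policy, and nothing here reflects existing PG code. The actual resampling would be left to an image library (for instance Pillow's `Image.thumbnail`, assuming Pillow is available on the server).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Size:
    width: int
    height: int


# Hypothetical device profiles: the largest image box, in pixels,
# that each class of reader is assumed to want.
PROFILES = {
    "ereader": Size(600, 800),
    "tablet": Size(1536, 2048),
}


def fit_within(source: Size, box: Size) -> Size:
    """Scale source to fit inside box, keeping the aspect ratio.

    Images that already fit are returned unchanged (no upscaling).
    """
    scale = min(box.width / source.width, box.height / source.height, 1.0)
    return Size(
        max(1, round(source.width * scale)),
        max(1, round(source.height * scale)),
    )


def derivative_size(source: Size, profile: str) -> Size:
    """Pick the target size for one illustration and one device profile."""
    return fit_within(source, PROFILES[profile])
```

For example, `derivative_size(Size(3000, 4000), "ereader")` gives a 600 by 800 target, while an illustration that is already smaller than the profile's box is left at its original size.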