Images + a bit about translations

I recently got some old PD books for use on my tablet. The books included some Jules Verne, some H. R. Haggard, and some Wizard of Oz books. My procedure for getting a book is this:

1) Go to archive.org.
2) Check which copies of the book have illustrations. If there are significant illustrations, download the book from there in .jp2 format and run it through a script to convert to .jpg (a rough sketch of this conversion appears below), then put the zip of .jpg files onto my tablet to read.
3) Only if the above step fails do I get the book from PG.

If the book is by Jules Verne, there's an additional step: check the list at http://jv.gilead.org.il/evans/VerneTrans%28biblio%29.html to see which version is a decent-quality translation. It often isn't (and even the relatively good ones have some problems). (Given an October post here about Dracula, perhaps I should research versions of the book even when the book was originally in English.)

This ties in to a number of posts on this list about bad translations, as well as to posts I made about images, which got lost from the archives (as did signups during that time period--I had to sign up to this list again).

Basically, illustrations in PG books are completely inadequate. What's worse is that the rules strongly suggest that an uploader use an image size and resolution that may have been sensible 10 years ago but is ludicrously small for a high-resolution modern tablet, and doesn't show all the detail in the image. As a start I would suggest fixing the rules, but you'd also need some way to automatically generate a small-images version from a large-images version (for those people who actually need small images, such as over a slow phone connection at an airport, or with an older reader with limited capabilities). Big images can be used to make small images, but you can't go the other way around. (What is the practical limit on epub image size?)

Previous messages about needing better bibliographical information apply to images too, especially since images are very often changed in later editions of the same book.
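Step 2 above mentions a conversion script. A minimal sketch of that step in Python, assuming Pillow built with OpenJPEG (JPEG 2000) support; the directory names and quality setting are illustrative:

    # Sketch: convert an archive.org .jp2 page dump to a zip of .jpg files
    # for tablet reading. Requires Pillow with JPEG 2000 (OpenJPEG) support.
    import zipfile
    from pathlib import Path
    from PIL import Image

    SRC = Path("book_jp2")      # folder of downloaded .jp2 page scans (assumed name)
    DST = Path("book_jpg")
    DST.mkdir(exist_ok=True)

    for jp2 in sorted(SRC.glob("*.jp2")):
        page = Image.open(jp2).convert("RGB")            # decode the JPEG 2000 page
        page.save(DST / (jp2.stem + ".jpg"), "JPEG", quality=85)

    # Pack the converted pages into one zip to copy onto the tablet.
    # ZIP_STORED (no recompression) because JPEG data is already compressed.
    with zipfile.ZipFile("book.zip", "w", zipfile.ZIP_STORED) as z:
        for jpg in sorted(DST.glob("*.jpg")):
            z.write(jpg, jpg.name)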

Aoson M33, which is an Android tablet with iPad size and resolution (9.7", 2048x1536).
OK, because as you suggest the "page image" approach to reading old books, say from IA or Google, kind of works if one has a tablet or other device which is similar in size and shape to the original book. Whereas the epub or mobi approach is intended to be (and somewhat succeeds in being) a more-or-less device-size-independent approach. Not that DP/PG don't often "get it wrong" in terms of actually making epub or mobi device-size independent. I read old books both ways, and find advantages and disadvantages to both approaches. But in general give me a properly implemented epub or mobi book over a "page image" book any day.

On Thu, 29 May 2014, James Adcock wrote:
OK, because as you suggest the "page image" approach to reading old books, say from IA or Google, kind of works if one has a tablet or other device which is similar in size and shape to the original book.
I think you misunderstood me. The point isn't that the "page image" approach is good. The point is that PG's treatment of images is so bad that even the "page image" approach is better. Scanned page images *shouldn't* be the best way to read books (unless they contain a lot of layout). If PG (and e-reader programs, for that matter) handled images decently they would not be. But they are.

Scanned page images *shouldn't* be the best way to read books (unless they contain a lot of layout). If PG (and e-reader programs, for that matter) handled images decently they would not be. But they are.
One can find plenty of awful page image books too -- page images containing images which require a great deal of digital image manipulation to get them to a "readable" [or viewable] state.

PG offers images in three sizes right now:

1) zero sized -- no images at all.
2) what the volunteer HTML submitter chose based on their best understanding of what would be best for the PG community -- or perhaps based on what they ate for breakfast that morning. [I know I am reluctant to submit books over 10 meg in size -- due to downloading speed/cost issues for e-book readers on slow cellphone connections.]
3) what the PG html-to-epub/mobi converter software does to that html file.

One could imagine a downloading system which resizes images "on the fly" based on the target machine. Amazon already does that -- not sure their system has been very successful. But this is starting to sound like the forever-ongoing "big machine / small machine" arguments....

On Sun, 1 Jun 2014, James Adcock wrote:
One can find plenty of awful page image books too -- page images containing images which require a great deal of digital image manipulation to get them to a "readable" [or viewable] state.
Yes, you can find them. (Even otherwise good archive.org page image books are typically in .jp2 format and cannot be read without a conversion that takes minutes even on a fast machine.) And awful as they are, they're still better. At least you can do *something* to be able to read them with good images, even if it is a lot of work. Doing excess work to get something is pretty awful--but it still beats not being able to get it at all.
[I know I am reluctant to submit books over 10 meg in size -- due to downloading speed/cost issues for e-book readers on slow cellphone connections.]
You gave the solution yourself: have a way to download books with images of various sizes. If you submit the book using large images, that can be converted to a book with small images. If you submit a book with small images, that *cannot* be converted to a book with large images, because by only submitting it using small images, you've thrown away information that cannot be recreated. Even if only a portion of the audience is helped by large images, the fact that image size conversion is one-way makes it very suboptimal to make them permanently small. Furthermore, although you argue that small images are a necessary tradeoff, that matches up with PG rules only by coincidence. PG merely says that the submitter should use small images, not as part of a tradeoff, but because the rules were written long ago and nobody changed them.

Quoting Ken Arromdee <arromdee@rahul.net>:
On Sun, 1 Jun 2014, James Adcock wrote:
Yes, you can find them. (Even otherwise good archive.org page image books are typically in .jp2 format and cannot be read without a conversion that takes minutes even on a fast machine.)
On my (by now rather slow) machine, it typically takes 15 minutes or more to convert all the .jp2 files to something ScanTailor can handle. Then another 15 minutes to half an hour to get all the settings within ScanTailor to something nice. Then you could create a CBR file (Comic Book RAR) out of the results, which despite its name also works pretty well for normal books if you want to read images only. I typically process them further to B&W scans suitable for PGDP proofreading.
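For the comic-book-archive route, the zip variant (CBZ) is easy to script without the non-free rar tool; a minimal sketch in Python, assuming the processed pages sit in a ScanTailor output folder (folder name and .tif extension are assumptions):

    # Sketch: pack processed page images into a .cbz (zip) comic archive.
    # Most comic readers sort pages by filename, so zero-pad the numbers.
    import zipfile
    from pathlib import Path

    pages = sorted(Path("scantailor_out").glob("*.tif"))   # or *.png / *.jpg
    with zipfile.ZipFile("book.cbz", "w", zipfile.ZIP_STORED) as cbz:
        for i, page in enumerate(pages, start=1):
            cbz.write(page, f"page_{i:04d}{page.suffix}")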
You gave the solution yourself: have a way to download books with images of various sizes.
I have been looking at this. Basically, the PG site infrastructure is not up to this. What it would require is a way to dynamically decide what size of image is appropriate, and then generate, on the fly, from the highest resolution available, images that meet that size, and then generate the HTML, ePub, or other formats of the text in question, using those images. Doing this will require considerable coding effort (and might put a big strain on the server). Then, most importantly, we need to find a way to submit those high-resolution source images to PG, which will probably open up a can of worms.

When I prepare texts, those sometimes include hundreds of illustrations. I currently keep them within the PG limit of about 100k per image, but still sometimes generate uploads of over 50 megs. If I were to shift to uploading the high-res images, those uploads would grow to one or more gigabytes. In my personal archives I keep the high-resolution cleaned versions of all illustrations. For all books I've submitted to PG, that adds up to about 250-300 gigabytes of images. Resubmitting them all would be a helluva job (and that is just a little over 1 percent of the complete PG collection).

For myself, I believe we can gradually relax the rules on the size of images, but until we have the infrastructure available to serve out lower-resolution versions, I wouldn't do this in a radical way. We still have to serve people reading epubs on lightweight devices, with limited access to high-bandwidth connections (most of Africa and large parts of Asia). In that context, the current limitations still make a lot of sense. I would love to be able to submit the high-res images with my submissions already, so as to have them stored in the archive (separate from the 'page images', which are often medium-res B&W images unsuitable for illustrations).

Jeroen.
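As an illustration of the kind of infrastructure being described, here is a minimal sketch in Python of an on-the-fly resizer: it serves a derivative scaled from a high-resolution master at request time. The endpoint shape, directory layout, and size names are assumptions, not an existing PG interface, and a real deployment would want a cache in front of it:

    # Sketch: serve resized derivatives generated on the fly from hi-res masters.
    import io
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from pathlib import Path
    from urllib.parse import parse_qs, urlparse

    from PIL import Image

    MASTERS = Path("masters")                              # hi-res source images (assumed)
    SIZES = {"small": 720, "medium": 1280, "large": 2048}  # longest edge in pixels

    class ResizeHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            url = urlparse(self.path)
            name = Path(url.path).name                     # e.g. /plate-01.jpg
            size = parse_qs(url.query).get("size", ["small"])[0]
            src = MASTERS / name
            if not src.is_file() or size not in SIZES:
                self.send_error(404)
                return
            img = Image.open(src)
            img.thumbnail((SIZES[size], SIZES[size]))      # downscale, keep aspect ratio
            buf = io.BytesIO()
            img.convert("RGB").save(buf, "JPEG", quality=85)
            self.send_response(200)
            self.send_header("Content-Type", "image/jpeg")
            self.send_header("Content-Length", str(buf.tell()))
            self.end_headers()
            self.wfile.write(buf.getvalue())

    if __name__ == "__main__":
        # A request like /plate-01.jpg?size=large returns the 2048 px derivative.
        HTTPServer(("", 8000), ResizeHandler).serve_forever()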

On Mon, 2 Jun 2014, jeroen@bohol.ph wrote:
I have been looking at this. Basically, the PG site infrastructure is not up to this. What it would require is a way to dynamically decide what size of image is appropriate, and then generate, on the fly, from the highest ...
Just let the user pick small images, large images, or no images (maybe "medium" also). No need to autodetect. And there's no need to convert on the fly. Store one version of each size. If the book has any significant number of images, the low res version will be a fraction of the space used by the high res version, so you may as well store the low res version if you're going to be storing the high res one.
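Rough arithmetic behind the "fraction of the space" point: at a fixed JPEG quality, file size scales roughly with pixel count, so a small derivative adds little on top of the large master it came from (the edge lengths and the scaling assumption are illustrative):

    # Sketch: estimate the storage cost of a small tier relative to a large tier.
    large_edge, small_edge = 2048, 720        # longest-edge sizes, illustrative
    ratio = (small_edge / large_edge) ** 2    # JPEG size ~ pixel count at fixed quality
    print(f"small tier ~{ratio:.0%} of the large tier's storage")   # ~12%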

If you submit a book with small images, that *cannot* be converted to a book with large images, ...
On the contrary, if I submit a book with small images, it *can* be converted to a book with large images, because I have already done most of the hard work, which is finding and fixing almost all of the scannos. I think the idea that a book submitted to PG is "once and forever" and cannot be improved upon by future volunteers, indeed is not even allowed to be improved upon by future volunteers [!], perhaps using better future markup languages, or better scanners, or better sources found in some library or somewhere on the internet -- well, this "once and forever, no improvements possible" model is very sad and pathetic.

What could be useful, and worth thinking about, is a way of caching large images for possible future use or consideration, perhaps as part of the submission process. Or even making the large images available on PG as backing files available to the public, somewhat analogous to the IA "All Files" directory.

But in the current submission process, the design of the "one size fits all" HTML/EPUB/MOBI triplet is by its very nature a requirement that the submitter compromise. If, for example, HTML and EPUB were allowed to be submitted separately, then perhaps some more intelligent volunteer-implementer tradeoffs could be made, with the HTML targeting larger machines and the EPUB targeting smaller machines.

Please note that in my current work process "the larger images" never actually exist, since I am working on images with an understanding of the goal I am trying to achieve, which is small, compact images that still, hopefully, somewhat maintain the artistic integrity of the original work the way it was originally intended to be read. And in turn the current image storage formats we have available to us do not even provide good options for doing this, certainly not for woodcuts -- somewhat analogous to the problem that HTML does not provide good options for doing poetry -- or even reasonable dropcaps. [Or cover pages, or indexes, or ....]

On Mon, 2 Jun 2014, James Adcock wrote:
If you submit a book with small images, that *cannot* be converted to a book with large images, ...
On the contrary, if I submit a book with small images, it *can* be converted to a book with large images, because I have already done most of the hard work, which is finding and fixing almost all of the scannos.
"Converting" the submission using extra materials that you have on your hard drive at home but which are not part of the submission is not what I mean by converting it. You can't take just the submission with low resolution, do something to it, and end up with a version with high resolution. Converting to low resolution is a lossy process. You can't convert it back--that's what lossy means.

"Converting" the submission using extra materials that you have on your hard drive at home but which are not part of the submission is not what I mean by converting it.
Again, you assume that when I submit a book I have these yummy high-rez files just sitting there which somehow I am keeping you from having access to. In general, I do not have such high-rez files. In some cases, but not in all cases, I could take the extra time and effort to scan, process, clean, correct, compress, etc. a separate set of high-rez image files for submission. I probably would be willing to do so, when and if possible, if PG were to set up an intelligent way of actually using those files.

Again, I would probably suggest that PG could make the submission of HTML separate from the submission of EPUB files; the HTML could be high-rez for people who want to read HTML on big machines, and the EPUB could then be appropriately sized down in all regards, including images, to make small machines happy -- and modern MOBI would flow more or less directly from the EPUB submissions.

If the HTML were submitted separately from the EPUB, rather than the current PG auto-gin approach, then it might be worth the extra effort, because one could submit two high-quality, not-compromised efforts, rather than having to submit one by-definition compromised effort. [Or at least less-compromised efforts, because again any flavor of HTML, including EPUB, is less than ideal for doing "real" books.]

Hi, I've done a lot of the Dutch translations of Jules Verne novels, mostly from my own copies. They typically include all the original illustrations from the original French series, and these I've scanned at 600 DPI. For PG, I've only included JPEGs at 720 pixels along the longest edge, to stay within spec (some older ones at 512 pixels). I am still planning to upload the high-res scans somewhere.

Jeroen.
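A minimal sketch in Python of that downscaling step: scale the 600 DPI scan to 720 px along the longest edge, then walk the JPEG quality down until the file fits the roughly 100k-per-image guideline mentioned earlier in the thread (file names and the exact cap are assumptions):

    # Sketch: fit one illustration into the current PG-style size limits.
    from pathlib import Path
    from PIL import Image

    def to_pg_spec(src: Path, dst: Path, max_edge: int = 720, max_bytes: int = 100_000) -> None:
        img = Image.open(src).convert("RGB")
        img.thumbnail((max_edge, max_edge))        # downscale, keep aspect ratio
        for quality in range(85, 40, -5):          # trade JPEG quality for file size
            img.save(dst, "JPEG", quality=quality)
            if dst.stat().st_size <= max_bytes:
                return                             # fits within the cap
        # If we get here, even quality 45 is over the cap; keep the last attempt.

    to_pg_spec(Path("plate-600dpi.png"), Path("plate-720px.jpg"))   # hypothetical file names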

On 05/28/2014 06:42 AM, Ken Arromdee wrote:
Basically, illustrations in PG books are completely inadequate. What's worse is that the rules strongly suggest that an uploader use an image size and resolution that may have been sensible 10 years ago but is ludicrously small for a high-resolution modern tablet, and doesn't show all the detail in the image.
The most detail you can get is the level of detail the printer left on the paper about 100+ years ago, minus the paper degradation. A "sensible" resolution does not depend on the rendering device and so could be established once and for all. We could require producers to provide images in a "sensible" resolution and rescale them on the server for user consumption, but not with the present hardware budget and not without changing the WWers' workflow.

Regards

--
Marcello Perathoner
webmaster@gutenberg.org
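A back-of-the-envelope version of the point that a "sensible" resolution is set by the printed original rather than by the reading device; the halftone screen frequency, oversampling factor, and plate size below are purely illustrative:

    # Sketch: derive a device-independent scan resolution from the printed plate.
    lines_per_inch = 150                # halftone screen of the printed plate (illustrative)
    samples_per_line = 2                # oversampling factor so the screen is resolved
    plate_w_in, plate_h_in = 4.0, 6.0   # physical plate size in inches (example)

    dpi = lines_per_inch * samples_per_line                   # 300 dpi capture ceiling
    pixels = (round(plate_w_in * dpi), round(plate_h_in * dpi))
    print(dpi, pixels)                  # 300 (1200, 1800) -- finer scans add paper grain, not detail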
participants (6)
- arromdee@rahul.net
- James Adcock
- Jeroen Hellingman
- jeroen@bohol.ph
- Ken Arromdee
- Marcello Perathoner