Re: [for the graphics wizards] Cleaning up original Burton "Kama Sutra" page scans -- need advice/help

From: Jon Noring <jon@noring.name>
for free access. [For those interested, the book is the 1885 second printing of the second edition of Sir Richard F. Burton's "Kama Sutra of Vatsyayana".]
Please do *not* make them bitonal (black and white only)!!!!!!! You could also save time and do only the color manipulation (e.g., "gamma"). Could you make all the original scans available to me? I could host the scans at our site. I also would like to process the images myself before you ruin them. Use of "gamma" most likely is not the way to improve the images. I will pick up the two example images and experiment with them. It will take a few days before I mail again with results. I have earlier done this kind of automatic level balancing, in which I additionally flattened the curved pages (dark near the binding).
version for direct reading. For those who will probably ask, the raw page scans have already been uploaded to Distributed Proofreaders for conversion to structured digital text.]
Are those original 8-bit images? How can I download them from DP? What are the direct links to the pages? Juhana -- http://music.columbia.edu/mailman/listinfo/linux-graphics-dev for developers of open source graphics software

Juhana wrote:
for free access. [For those interested, the book is the 1885 second printing of the second edition of Sir Richard F. Burton's "Kama Sutra of Vatsyayana".]
Please do *not* make them bitonal (black and white only)!!!!!!! You could also save time and do only the color manipulation (e.g., "gamma").
I do not intend to throw away the original page scan images, or the partially cleaned up greyscale "intermediary" images (which have been deskewed, fully cropped, and have been normalized onto a white background canvas.)
Could you make all the original scans available to me? I could host the scans at our site. I also would like to process the images myself before you ruin them.
Again, I'm not throwing away anything. <smile/> But I do understand your view that original page scans should be: 1) done at archival quality and 2) made available to the world.
Use of "gamma" most likely is not the way to improve the images. I will pick up the two example images and experiment with them. It will take a few days before I mail again with results. I have earlier done this kind of automatic level balancing, in which I additionally flattened the curved pages (dark near the binding).
I agree about the "gamma". I've been spending a lot of time experimenting with the feedback provided by quite a few people on how to further process these images. A goal of mine is to produce a portable yet nicely readable DjVu version of the scans, and in past experiments, going from 600 dpi greyscale to 600 dpi bitonal and then putting that into DjVu has worked out pretty well. The DjVu readers have their own "built-in" anti-aliasing to improve readability. But I'll experiment some more on the 1-bit (bitonal) versus 8-bit approach for DjVu. Regarding "curved" pages, I did mention before that the book was "chopped", so each page is now separate and was scanned with minimum distortion.
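For readers following along, here is a rough sketch of what a greyscale-to-bitonal step can look like. The message does not say which thresholding method or tool was used, so Otsu's method is swapped in here purely as an illustration; the function names otsu_threshold and to_bitonal are hypothetical, and a page is represented as a flat list of 8-bit grey values.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0-255) that maximises the between-class
    variance when pixels <= t are treated as ink and the rest as paper.
    Illustrative only: the thread does not name the method actually used."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels at or below the candidate threshold
    sum_dark = 0    # sum of grey values of those pixels
    for t in range(256):
        w_dark += hist[t]
        if w_dark == 0:
            continue            # no dark pixels yet, nothing to split
        w_light = total - w_dark
        if w_light == 0:
            break               # everything is in the dark class
        sum_dark += t * hist[t]
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def to_bitonal(pixels, threshold):
    """Map 8-bit grey values to 1-bit values: 0 = ink, 1 = paper."""
    return [0 if p <= threshold else 1 for p in pixels]
```

A single global threshold like this only behaves well when the page background is even, which is one reason the earlier normalization onto a white canvas matters before any bitonal conversion.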
Are those original 8-bit images?
The originals were scanned at 600 dpi optical and greyscale. Some have mentioned I should have scanned at full 24-bit color, since sometimes certain color channels (such as red and green) have lower noise and thus improve image processing/restoration. My experiments yesterday confirmed that the red channel was a little better, but various post-clean-up experiments showed that, at least for my copy of the Kama Sutra, using the red channel instead of greyscale did not visibly improve the final results. Of course, there may be books where color channel separation could give remarkable improvements in post-processing, so, except for increased space requirements, it may be preferable to scan at 24-bit, even for black and white documents.
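As a hedged illustration of the channel comparison described above, the sketch below pulls the red channel out of 24-bit pixels, builds a conventional weighted greyscale, and compares how noisy each looks on a patch of blank paper. The helper names are hypothetical, the greyscale weights are the common ITU-R BT.601 luma weights (an assumption, since scanner software may convert differently), and standard deviation over a blank patch is only one rough way to estimate noise.

```python
from statistics import pstdev


def red_channel(rgb_pixels):
    """Keep only the red component of each (r, g, b) pixel."""
    return [r for r, _g, _b in rgb_pixels]


def luma_grey(rgb_pixels):
    """Weighted greyscale using ITU-R BT.601 luma weights (assumed here;
    a scanner's own greyscale mode may use a different conversion)."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in rgb_pixels]


def noise_estimate(samples):
    """Population standard deviation of a patch that should be uniform
    (for example, a blank margin). Lower means less visible noise."""
    return pstdev(samples)


def quieter_channel(rgb_patch):
    """Return 'red' or 'grey' depending on which has the lower noise
    estimate over the given blank-paper patch."""
    red_noise = noise_estimate(red_channel(rgb_patch))
    grey_noise = noise_estimate(luma_grey(rgb_patch))
    return "red" if red_noise < grey_noise else "grey"
```

Running quieter_channel on a few blank-margin patches from a 24-bit test scan would give a quick indication of whether channel separation is worth the extra storage for a given book.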
How can I download them from DP? What are the direct links to the pages?
I'm not sure if I can give out the ftp user:pass at 'dpscans', since it was given to me in confidence. If you are interested, email Jon Niehof <jon_niehof@yahoo.com> for possible access since they were dumped into his folder, 'jnik' (he's on travel, so has limited access the next week or two.) The raw scans total about 1.22 gigs, while the partially cleaned-up scans (still 600 dpi greyscale) take up about 680 megs. Jon Noring

Jon Noring wrote:
I'm not sure if I can give out the ftp user:pass at 'dpscans', since it was given to me in confidence. If you are interested, email Jon Niehof <jon_niehof@yahoo.com> for possible access since they were dumped into his folder, 'jnik' (he's on travel, so has limited access the next week or two.)
Please don't give out the password. The FTP area is to support the loading of image & text files for DP projects only.
The raw scans total about 1.22 gigs, while the partially cleaned-up scans (still 600 dpi greyscale) take up about 680 megs.
I can make the files available on the DP test server via HTTP if someone lets me know what is required. It would be appreciated if such requests are sent to the DP site admins in future. At the moment we don't have the disk space to support files which aren't being loaded into DP projects. Thanks, P -- Help digitise public domain books: Distributed Proofreaders: http://www.pgdp.net "Preserving history one page at a time." Set free dead-tree books: http://bookcrossing.com/referral/servalan

Pauline wrote:
Jon Noring wrote:
I'm not sure if I can give out the ftp user:pass at 'dpscans', since it was given to me in confidence. If you are interested, email Jon Niehof <jon_niehof@yahoo.com> for possible access since they were dumped into his folder, 'jnik' (he's on travel, so has limited access the next week or two.)
Please don't give out the password. The FTP area is to support the loading of image & text files for DP projects only.
Well, I didn't. <smile/> I surmised the user:pass was not intended for public use since a complete search of the pgdp.net site did not bring it up. It just said that the ftp space was for uploading projects, as you noted, and that those wishing to submit stuff need to be given the user:pass from the project leader.
The raw scans total about 1.22 gigs, while the partially cleaned-up scans (still 600 dpi greyscale) take up about 680 megs.
I can make the files available on the DP test server via HTTP if someone lets me know what is required. It would be appreciated if such requests are sent to the DP site admins in future. At the moment we don't have the disk space to support files which aren't being loaded into DP projects.
As is clear from my various posts on this topic, I'm in the process of producing a "high-quality" and portable cleaned-up version in DjVu for general direct reading by the public (a "reader", and not just fodder for OCR). Also, the final cleaned-up bitonal images (used to produce the DjVu version) will be made available, possibly along with the raw greyscale scans. As Bowerbird requested, they will be uploaded to archive.org, and possibly made available via an FTP site associated with one of my domains (and administered by James Linden). Patience. It's just one book out of several million public domain books. But it certainly is one of the more interesting public domain books, and deserves some effort to clean up the scans for direct use. Jon Noring
Participants (3): Jon Noring, Juhana Sadeharju, Pauline