
On 02/15/2012 05:37 PM, Carlo Traverso wrote:
"Jana" == Jana Srna<jana.srna@gmail.com> writes:
Jana> Secondly, I have noticed an inconsistency in the page images
Jana> for different books. For the books that got them added back
Jana> when we (some DP volunteers) uploaded them in bulk, there is
Jana> only the xxxxx-page-images directory, but no zip file.
Jana> Here's an example: http://www.gutenberg.org/files/21989/
Jana>
Jana> For the books that Al has posted for me a few days ago
Jana> (thanks, Al!), there is both that directory, and a zip of
Jana> it. Here's an example:
Jana> http://www.gutenberg.org/files/38402/
Jana>
Jana> Why the difference? Personally, I find the zip file more
Jana> valuable than the directory, so would hope that those could
Jana> easily be created for the books that don't have them. I do,
Jana> however, understand that that would take up more disk space,
Jana> so might not be possible.
The individual files are more valuable if one wants to check a possible error, since one does not need to download the full zip file just to look at one page.
You can get the individual files even if you post the zip only. The web server does all the unpacking for you. In fact, for every file you get from gutenberg.org, you get the bits out of the zip file and not out of the uncompressed file that is stored alongside the zip file. (The server would otherwise have to compress those bits for every request, while inside the zip file the compressed bits are up for grabs.) All you need is an index of images to post alongside the zip file, like we do for audio files, and you save half the disk space.

-- 
Marcello Perathoner
webmaster@gutenberg.org
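
To illustrate the point about getting a single page out of a zip without unpacking the rest, here is a minimal sketch using Python's standard `zipfile` module. It is not the gutenberg.org server's actual code. The archive is built in memory, and the member names (`demo-page-images/p0001.png` and so on) and their contents are hypothetical placeholders.

```python
import io
import zipfile


def build_demo_zip():
    """Create a small in-memory zip standing in for a page-images archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for n in range(1, 4):
            # Hypothetical member names and placeholder bytes, not real scans.
            zf.writestr(f"demo-page-images/p{n:04d}.png", b"fake image bytes " * n)
    buf.seek(0)
    return buf


def list_members(archive):
    """Return the member names: a simple 'index of images' for the archive."""
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()


def read_member(archive, name):
    """Decompress and return one member without unpacking the others."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(name)


if __name__ == "__main__":
    archive = build_demo_zip()
    for name in list_members(archive):
        print(name)
    page = read_member(archive, "demo-page-images/p0002.png")
    print(len(page), "bytes read for one page")
```

The zip format keeps a central directory at the end of the file, so a reader can locate and decompress one member on its own. That is why an index plus the zip can stand in for the unpacked directory.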