
David Starner wrote:
> The max block size for bzip2 is 900 kB, so sticking more than 900 kB of files together is pointless. Moreover, the bzip2 manual says "Larger block sizes give rapidly diminishing marginal returns. Most of the compression comes from the first two or three hundred k of block size[...]", so even sticking more than 200 kB or 300 kB of files together may be pointless.
Not at all. Compression is poor for the first part of *every* block, so the goal is to minimize the number of blocks. Say you have 10 files of 1.0 MB each. Compressing them separately, you get for each file:

  1 full block (good compression)
  1 nearly empty block (bad compression)

for a total of:

  10 full blocks (good compression)
  10 nearly empty blocks (bad compression)

If you stick the files together before compressing, you have one file of 10 MB and:

  11 full blocks (good compression)
  1 nearly empty block (bad compression)

So there is still a difference for files larger than 900 kB. Of course, the real gain is with the small files.
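The block counting above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a fixed block size of 900,000 bytes (bzip2 -9) and treating 1 MB as 1,000,000 bytes; it ignores bzip2's initial run-length pass, which means a real block can hold somewhat more input than this. The helper name `block_counts` is hypothetical.

```python
# Assumed block size for bzip2 -9 (900,000 bytes). The real mapping from
# input bytes to blocks differs slightly because of bzip2's run-length pre-pass.
BLOCK_SIZE = 900_000
MB = 1_000_000


def block_counts(sizes):
    """Return (full_blocks, partial_blocks) if each size is compressed on its own."""
    full = partial = 0
    for size in sizes:
        whole, rest = divmod(size, BLOCK_SIZE)
        full += whole
        partial += 1 if rest else 0  # a leftover tail becomes one partly filled block
    return full, partial


ten_files = [MB] * 10
print("separate:    ", block_counts(ten_files))        # (10, 10)
print("concatenated:", block_counts([sum(ten_files)]))  # (11, 1)
```

Each 1.0 MB file leaves a 100,000-byte tail, so compressing separately produces ten poorly compressed tails, while the concatenated file leaves only one.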
> I'd really think it more productive to measure the differences, rather than just assume that sticking even the small files together will make a significant difference.
Right. Do that.

-- 
Marcello Perathoner
webmaster@gutenberg.org
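A minimal sketch of the measurement discussed above, using Python's standard `bz2` module. It assumes plain byte concatenation (a real archive format such as tar would add its own headers), and the function names `compare` and `read_files` are hypothetical. Note that compressing separately also pays a small per-stream header and trailer cost, which is included in the totals.

```python
import bz2
import sys


def compare(blobs, level=9):
    """Return (separate, together): total bz2 output size when each blob is
    compressed on its own versus when all blobs are concatenated first."""
    separate = sum(len(bz2.compress(b, compresslevel=level)) for b in blobs)
    together = len(bz2.compress(b"".join(blobs), compresslevel=level))
    return separate, together


def read_files(paths):
    """Read each file fully into memory as bytes."""
    blobs = []
    for path in paths:
        with open(path, "rb") as f:
            blobs.append(f.read())
    return blobs


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: compare.py FILE [FILE ...]")
    sep, tog = compare(read_files(sys.argv[1:]))
    print(f"separate: {sep} bytes, concatenated: {tog} bytes, "
          f"saving: {sep - tog} bytes")
```

Running it on a directory of small texts and again on a set of files over 900 kB would show how much of the gain comes from the small files.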