
Does anyone have a list of the EBook numbers which were done by Distributed Proofers? The reason I ask is because it might be kind of nice to feed that list to the new ISO creator to compile a DPDVD. Just a thought. I know it's been talked about, but I haven't heard if it has been done yet.

Sincerely
Aaron Cannon

--
E-mail: cannona@fireantproductions.com
Skype: cannona
MSN Messenger: cannona@hotmail.com (Do not send e-mail to the hotmail address.)

Aaron Cannon wrote:
Does anyone have a list of the EBook numbers which were done by Distributed Proofers? The reason I ask is because it might be kind of nice to feed that list to the new ISO creator to compile a DPDVD. Just a thought. I know it's been talked about, but I haven't heard if it has been done yet.
http://www.pgdp.org/~mikeyc21/books_processed_by_dp.txt

Fondest regards,
Michael

Well, the complete collection of zip files from DP total 5GB. So, it would be too big for a dvd-5, but might fit on a dvd-9. It should be possible to fit the first 5000 books on a dvd-5, however.

Sincerely
Aaron Cannon

On Tue, Sep 13, 2005 at 07:10:50PM -0500, Aaron Cannon wrote:
Well, the complete collection of zip files from DP total 5GB. So, it would be too big for a dvd-5, but might fit on a dvd-9.
Suggestion: see how much better other compression algorithms do (e.g. bzip2). The binaries can be included on the DVD for various systems; they are small in comparison.

Similarly, for XML books (let's assume in the end all books will be XML'ed), just consider the source XML version (bzipped); XML-to-other-format tools can be included on the DVD too.
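Sebastien's suggestion is easy to test in miniature. As a rough sketch (the data here is synthetic, and real PG texts will give different numbers), this compares zlib (the algorithm inside zip) against bzip2 on data whose redundancy spans more than zlib's 32 KB window:

```python
import bz2
import random
import zlib

# Build 500 KB of data with long-range redundancy: a 100 KB
# pseudo-random block repeated five times. zlib's 32 KB window
# cannot reach back to the earlier copies; bzip2's 900 KB block can.
random.seed(0)
block = bytes(random.getrandbits(8) for _ in range(100_000))
data = block * 5

zlib_size = len(zlib.compress(data, 9))
bz2_size = len(bz2.compress(data, 9))

print(f"original: {len(data)} bytes")
print(f"zlib -9 : {zlib_size} bytes")
print(f"bzip2 -9: {bz2_size} bytes")
```

On this kind of input, bzip2 comes out well ahead; on short, locally redundant files the two are much closer, which is why actually measuring on the real corpus matters.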

On Wed, 14 Sep 2005, Sebastien Blondeel wrote:
On Tue, Sep 13, 2005 at 07:10:50PM -0500, Aaron Cannon wrote:
Well, the complete collection of zip files from DP total 5GB. So, it would be too big for a dvd-5, but might fit on a dvd-9.
Suggestion: see how much better other compression algorithms do (e.g. bzip2). The binaries can be included on the DVD for various systems; they are small in comparison.
I have some supercomputer time reserved for testing various compressions, so if anyone wants to send me any DVDs or their images, I can try to see how much we can pack onto a single $1 DVD.

Michael

Michael Hart wrote:
I have some supercomputer time reserved for testing various compressions, so if anyone wants to send me any DVDs or their images, I can try to see how much we can pack onto a single $1 DVD.
That's nice but it won't help much. The biggest savings we can get are by:

1. using bzip2 instead of zip
2. compressing more than one file at a time

Ad 1. bzip2 is quite well known on Linux systems. I don't know how well Windoze supports it. It may be a question of $$$ for Windoze users to get a decompressor.

Ad 2. Compressing more than one file at a time makes for smaller archives, because a compressor always starts with a low compression rate and builds itself up along the way. The first KBs of a file have the worst compression rate.

Also, the compression rate will drop drastically every time the characteristics of the uncompressed data change. If you tar a whole lot of text and image files together, you get a better compression rate if you put all txt files first and all image files last, instead of interleaving them.

Arguably we can get the best compression rate if we tar all files of each file type together (all TXT files, all HTML files, etc.) and bzip2 the tar files. But these huge files will be irksome to unpack. Alternatively we could pack the files in chunks of 100 books, carefully ordering the files in the tar so all TXT files come first, then all HTML files, etc.

If we want to avoid sensible tools and formats for the sake of Windoze users, we can get almost the same results by first zipping a lot of files with compression off and then zipping the zip file with compression on.

--
Marcello Perathoner
webmaster@gutenberg.org
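Marcello's point 2 can be sketched in a few lines. This is a toy model, not the real DP corpus: 200 hypothetical small text files with similar content, compressed one by one versus as a single concatenated stream (as a tar.bz2 would do, tar headers omitted):

```python
import bz2

# Hypothetical stand-ins for 200 small e-text files with similar content.
files = [(f"Chapter {i}. ").encode() + b"It was a dark and stormy night. " * 50
         for i in range(200)]

# Compress each file on its own, as individual .bz2/.zip files would be.
separate = sum(len(bz2.compress(f)) for f in files)

# Compress everything as one stream, the way a tar.bz2 would.
together = len(bz2.compress(b"".join(files)))

print(f"separately: {separate} bytes")
print(f"together  : {together} bytes")
```

The single stream wins by a wide margin here, both because each separate stream pays fixed header overhead and because the compressor can exploit redundancy shared between files.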

Marcello,

Perhaps compressing the entire disk and using an on-the-fly decompression system (like Knoppix uses) would help.

-brandon

Marcello Perathoner wrote:
Michael Hart wrote:
I have some supercomputer time reserved for testing various compressions, so if anyone wants to send me any DVDs or their images, I can try to see how much we can pack onto a single $1 DVD.
That's nice but it won't help much.
The biggest savings we can get is by:
1. using bzip2 instead of zip
2. compressing more than one file at a time
Ad 1.
bzip2 is quite well known on linux systems. I don't know how well Windoze supports that. It may be a question of $$$ to Windoze users to get a decompressor.
Ad 2.
Compressing more than one file at a time makes for a smaller archives because a compressor always starts with a low compression rate and builds itself up along the way. The first KBs of a file have the worst compression rate.
Also, the compression rate will drop drastically every time the characteristic of the uncompressed data changes. If you tar a whole lot of text and image files together, you get a better compression rate if you put all txt files first and all image files last instead of interleaving them.
Arguably we can get the best compression rate if we tar all files of each file type together (all TXT files, all HTML files etc.) and bzip2 the tar files. But these huge files will be irksome to unpack.
Alternatively we could pack the files in chunks of 100 books, carefully ordering the files in the tar, so all TXT files come first, then all HTML files etc.
If we want to avoid sensible tools and formats for the sake of Windoze users we can get almost the same results by first zipping a lot of files with compression off and then zipping the zip file with compression on.
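Marcello's last trick (zip the files with compression off, then zip the container with compression on) can be sketched with Python's zipfile module. The file names and contents below are made up for illustration:

```python
import io
import zipfile

# Hypothetical small files sharing boilerplate, like PG headers.
files = {f"book{i:03}.txt": ("*** START OF THE PROJECT GUTENBERG EBOOK ***\n"
                             + "Some text. " * 100).encode()
         for i in range(50)}

def zip_bytes(members, method):
    """Build an in-memory zip of the given members with one compression method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", method) as z:
        for name, data in members.items():
            z.writestr(name, data)
    return buf.getvalue()

# Plan A: one zip, every member deflated independently.
deflated = zip_bytes(files, zipfile.ZIP_DEFLATED)

# Plan B: store members uncompressed, then deflate the whole container,
# so the compressor can exploit redundancy *across* files.
stored = zip_bytes(files, zipfile.ZIP_STORED)
nested = zip_bytes({"inner.zip": stored}, zipfile.ZIP_DEFLATED)

print(f"deflated members     : {len(deflated)} bytes")
print(f"nested store+deflate : {len(nested)} bytes")
```

The nested container is smaller because per-member compression cannot see the boilerplate repeated from file to file; the reader pays for it with a two-step unpack.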

On 9/14/05, Marcello Perathoner <marcello@perathoner.de> wrote:
bzip2 is quite well known on linux systems. I don't know how well Windoze supports that. It may be a question of $$$ to Windoze users to get a decompressor.
bzip2 is available for Windows. I think the question is $$$ for a graphical decompressor.
Ad 2.
Compressing more than one file at a time makes for a smaller archives because a compressor always starts with a low compression rate and builds itself up along the way. The first KBs of a file have the worst compression rate.
The max block size for bzip2 is 900 KB, so sticking more than 900 KB of files together is pointless. What's more, the bzip2 manual says "Larger block sizes give rapidly diminishing marginal returns. Most of the compression comes from the first two or three hundred k of block size[...]", so even sticking more than 200 KB or 300 KB of files together may be pointless. I'd really think it more productive to measure the differences, rather than just assume that sticking even the small files together will make a significant difference.

David Starner wrote:
The max block size for bzip2 is 900 KB, so sticking more than 900 KB of files together is pointless. What's more, the bzip2 manual says "Larger block sizes give rapidly diminishing marginal returns. Most of the compression comes from the first two or three hundred k of block size[...]", so even sticking more than 200 KB or 300 KB of files together may be pointless.
Not at all. The compression is low for the first part of *every* block. So the goal is to minimize the number of blocks.

Say you have 10 files of 1.0 MB each. Compressing them separately, you'll have 1 full block (good compression) and 1 nearly empty block (bad compression) for each file, totalling 10 full blocks (good compression) and 10 nearly empty blocks (bad compression). If you stick the files together before compressing, you'll have one file of 10 MB, giving 11 full blocks (good compression) and 1 nearly empty block (bad compression).

So there is still a difference for files > 900 KB. Of course the real gain is in the small files.
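Marcello's block arithmetic checks out as a back-of-the-envelope calculation (900 KB maximum block size per the bzip2 manual; the ten 1.0 MB files are his hypothetical example):

```python
import math

BLOCK = 900      # bzip2 maximum block size, in KB
FILE_KB = 1000   # ten hypothetical files of 1.0 MB each
N_FILES = 10

# Compressed separately: each file needs ceil(1000/900) = 2 blocks,
# one full and one nearly empty, so 10 of the 20 blocks are nearly empty.
blocks_separate = N_FILES * math.ceil(FILE_KB / BLOCK)

# Concatenated first: one 10 MB stream, ceil(10000/900) = 12 blocks,
# of which only the last is partial.
blocks_together = math.ceil(N_FILES * FILE_KB / BLOCK)

print(f"separate: {blocks_separate} blocks ({N_FILES} nearly empty)")
print(f"together: {blocks_together} blocks (1 partial)")
```

So concatenation cuts the nearly-empty blocks from ten to one even when every file already exceeds the block size; for files far below 900 KB the saving is proportionally much larger.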
I'd really think it more productive to measure the differences, rather than just assume that sticking even the small files together will make a significant difference.
Right. Do that.

"Marcello" == Marcello Perathoner <marcello@perathoner.de> writes:
Marcello> bzip2 is quite well known on linux systems. I don't know
Marcello> how well Windoze supports that. It may be a question of
Marcello> $$$ to Windoze users to get a decompressor.

bzip2 is freely available for windows:

http://gnuwin32.sourceforge.net/packages/bzip2.htm

Carlo Traverso

At 03:58 PM 9/14/2005, you wrote:
bzip2 is freely available for windows:
That's correct. However, I think the better question to ask is whether it is worth the added level of complexity for most users. For most of the people on this list, it is almost certainly no problem at all. However, for many others, it can be.

Sincerely
Aaron Cannon

On Wed, 14 Sep 2005, Carlo Traverso wrote:
"Marcello" == Marcello Perathoner <marcello@perathoner.de> writes:

Marcello> bzip2 is quite well known on linux systems. I don't know
Marcello> how well Windoze supports that. It may be a question of
Marcello> $$$ to Windoze users to get a decompressor.
bzip2 is freely available for windows:
Also, there is a free Windows program called 7-Zip which supports multiple compression formats, including bzip2.

Andrew

That's a possibility, but I don't think it really solves the problem. We would just be forcing an arbitrary number of ebooks onto a DVD. Even with a better compression algorithm, we would soon run into the same problem of not enough space. I think it makes more sense to compile a DVD containing the first 5000 books, and another containing the next 5000 or less.

Other thoughts or opinions?

Sincerely
Aaron Cannon

At 04:27 AM 9/14/2005, you wrote:
On Tue, Sep 13, 2005 at 07:10:50PM -0500, Aaron Cannon wrote:
Well, the complete collection of zip files from DP total 5GB. So, it would be too big for a dvd-5, but might fit on a dvd-9.
Suggestion: see how much better other compression algorithms do (e.g. bzip2). The binaries can be included on the DVD for various systems; they are small in comparison.
Similarly, for XML books (let's assume in the end all books will be XML'ed), just consider the source XML version (bzipped); XML-to-other-format tools can be included on the DVD too.

_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d

On Wed, 14 Sep 2005, Aaron Cannon wrote:
That's a possibility, but I don't think it really solves the problem. We would just be forcing an arbitrary number of ebooks onto a dvd. Even with a better compression algorithm, we would soon run into the same problem of not enough space. I think it makes more sense to compile a DVD containing the first 5000 books, and another containing the next 5000 or less.
Other thoughts or opinions?
Sincerely Aaron Cannon
There are already DVDs out there containing from ~10,000 to ~40,000 eBooks, and this is only using single-sided, single-layered $1 blank DVDs. The newer DVD burners will put twice as much on a single DVD, dual-layered, but the media is still going to be expensive until/unless they catch on.

However, several new DVD formats are on the way that should make all this a moot point, as the entire Project Gutenberg collection should fit on a single one of the DVDs [other than the Human Genome, etc.].

Personally, I am happy for anyone to put together any eBook collections on DVD and get them out to people any way they can. Again my HUGE thanks!!!

Michael

At 12:32 PM 9/15/2005, you wrote:
There are already DVDs out there containing from ~10,000 to ~40,000 eBooks, and this is only using single-sided single-layered $1 blank DVDs.
That is true. However, the reason we can only get 5000 of the DP books onto a DVD is that many of them include pictures, and there are multiple versions in HTML, PDF, etc. So, yes, if we only took plain text, we could probably squeeze all of the DP stuff onto a couple of CDs, maybe even one.

This is actually a good introduction to another topic which I have been mulling over. I'll start a new thread for that, though.
The newer DVD burners will put twice as much on a single DVD, dual-layered, but the media is still going to be expensive until/unless they catch on.
However, several new DVD formats are on the way that should make all this a moot point, as the entire Project Gutenberg collection should fit on a single one of the DVDs [other than the Human Genome, etc.].
True, or perhaps flash technology will eventually catch up in capacity and price.

Sincerely
Aaron Cannon

On Tue, Sep 13, 2005 at 12:00:43PM -0500, Aaron Cannon wrote:
Does anyone have a list of the EBook numbers which were done by Distributed Proofers?
A thing I did can be used to answer that question. I try to maintain a list of all French-language books from various sources (and I am presently in the process of integrating a much bigger one). We just had the case of a book published on another source in June while being worked on in PGDP! Of course my programs can be adapted for other (or all) languages.

I just got my project accepted under the name "pgdp" on savannah.gnu.org. I guess I can publish the code I use on the CVS there and write documentation for the various tools I am developing around PG and PGDP.

Not-sexy website: http://www.eleves.ens.fr/home/blondeel/PGDP/
and for this topic, specifically: http://www.eleves.ens.fr/home/blondeel/PGDP/catalog/
participants (9)
- Aaron Cannon
- Andrew Sly
- Brandon Galbraith
- Carlo Traverso
- David Starner
- Marcello Perathoner
- Michael Ciesielski
- Michael Hart
- Sebastien Blondeel