Re: [gutvol-d] Initial thoughts on a PG/DP scan repository

16 Jul 2005

      Jon Niehof wrote:
...
--- Joshua Hutchinson <joshua@hutchinson.net> wrote:
...
The problem with DjVu is that there are no Windows-based
encoders.
It would probably be possible to make the DjVuLibre encoders
work under Cygwin.
There is a binary package for Cygwin at:

   http://djvulibre.djvuzone.org/

So we are not letting Window$ users down.
...
However:
"...the best encoders (as of today) are owned by LizardTech Inc
and kept proprietary. The smarts in the encoder can make a big
difference in terms of file size and image quality...certain
types of document compressed with LizardTech's commercial
compressors or with the on-line conversion services (such as
Any2DjVu) will end up smaller (and in some cases higher-quality)
than the ones compressed with the DjVuLibre encoders."
But the open source decoder can always decode what the commercial 
encoder generates.

A DjVu file is really a multi-layered image in which each layer can be
compressed using a different method. For our purposes (mostly b/w text
scans) the JB2 compressor would be used. This is what "man djvu" says
about the open source JB2 compressor:

cjb2(1)

A  DjVuBitonal  command line encoder. This soft-pattern-matching
compressor produces DjVuBitonal images from PBM images.  It  can
encode  images without loss, or introduce small changes in order
to improve the compression ratio.  The lossless encoding mode is
competitive with that of the Lizardtech commercial encoders.
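
To make that concrete, here is a minimal Python sketch of how one page
might be fed to cjb2. It assumes DjVuLibre is installed and on the PATH,
and that cjb2 accepts the -dpi and -lossy options as its man page
describes (lossless is the default). The function names and file names
are made up for illustration.

```python
import subprocess
from typing import List


def cjb2_command(pbm_path: str, djvu_path: str,
                 dpi: int = 600, lossy: bool = False) -> List[str]:
    """Build the argument list for one cjb2 run (PBM in, DjVu out)."""
    cmd = ["cjb2", "-dpi", str(dpi)]
    if lossy:
        # Assumption: -lossy allows small changes to improve compression.
        cmd.append("-lossy")
    cmd += [pbm_path, djvu_path]
    return cmd


def encode_page(pbm_path: str, djvu_path: str, lossy: bool = False) -> None:
    """Run cjb2 on one page; raises if the encoder exits with an error."""
    subprocess.run(cjb2_command(pbm_path, djvu_path, lossy=lossy), check=True)
```

Calling encode_page("page0001.pbm", "page0001.djvu") would then produce
a lossless bitonal page, and passing lossy=True trades small changes for
a smaller file.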
...
The format is also patent-encumbered. The terms are pretty
liberal (although locked in to the GPL) and I wouldn't expect a
Unisys-style bait-and-switch, but it really, really rubs me the
wrong way.
It's the only current format that gives the user basic comfort.

  - You just need to download one file.
  - You don't need to decompress it.
  - You can view it inside your browser just like you would a pdf.
  - It compresses better than anything else at present.

I don't want to unload hundreds of image files on the user. Can you
imagine reading Ulysses from 1,000 TIFF files? Is there even any picture
gallery software that can handle that gracefully?
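
For comparison, getting from per-page DjVu files to the single
downloadable file is one more step. A rough Python sketch, assuming
DjVuLibre's djvm tool is installed and that "djvm -c" creates a bundled
multi-page document from the listed pages; the function name and file
names are hypothetical.

```python
from typing import List, Sequence


def djvm_bundle_command(book_path: str, page_paths: Sequence[str]) -> List[str]:
    """Build the argument list that bundles page files into one document."""
    if not page_paths:
        raise ValueError("need at least one page to bundle")
    # Pages end up in the book in the order given here.
    return ["djvm", "-c", book_path, *page_paths]
```

The resulting list can be handed to subprocess.run(..., check=True), for
example with page names generated as "page%04d.djvu" so that they stay
in reading order.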

The only viable alternative would be PDF. Are you sure PDF is not
patent-encumbered? Adobe never released any GPLed tools to produce PDFs.

Sadly, current legislation encourages "information highway robbers" to
waylay citizens by springing patents on them.
...
And finally, why encourage 600dpi scans and then muck 'em up
with lossy compression?
Even the lossy compression will retain enough detail, as you can see if
you download my example file. It's not about preserving the Mona Lisa or
the last complete copy of the Gutenberg Bible, but about scans of
printed books. We don't care if the 3rd letter of the 25th line on page
234 has a tiny smear which the compression will lose. If you can still
read and OCR the text, it's good enough for us.

-- 
Marcello Perathoner
webmaster@gutenberg.org