
PREFACE

PG allows page images to be posted alongside an existing ebook. The only format currently accepted is TIFF files collected in a zip archive. This format is cumbersome for the reader, wastes storage space and does not allow live deep linking to the page images.

SCOPE

This RFC specifies an alternative format that is less cumbersome for the reader, uses less storage space and allows live deep linking to the page images.

This new format will not replace the old format. Page images for any book can be posted in whichever of the two formats the poster chooses.

The format specified here permits online viewing of the page images (with a DjVu browser plugin). Unlike with the old format, it is not necessary to download the file and unpack it, although it is still possible to do so.

The new format also allows an HTML document to link to an arbitrary page image, so that clicking a link opens the right page image in the DjVu browser plugin.

The new format compresses text pages 2 to 3 times better than the old format, with no loss of readability or OCR-ability.

The new format has GPLed compressors and decompressors. Even if the format should become encumbered by patents or other licensing issues, PG will be able to convert it easily and automatically into a legally unencumbered format.

FORMAT

The page images for a book will be posted into the main ebook directory as one multi-page DjVu file, named #####.djvu (replace ##### with the ebook number).

The multi-page DjVu file contains a collection of single-page DjVu files. Each single-page DjVu file contains either one single-sided page of the book (the front cover, back cover and spine also count as single-sided pages) or one illustration scanned at a different resolution or color depth.

Numbering / Naming of page files

A book usually contains two page number sequences, a roman one followed by an arabic one. The cover pages are treated as yet another sequence.
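Front-matter pages carry roman numbers, but filenames give the page number in arabic numerals. A minimal Python sketch of the conversion, assuming well-formed numerals; the function `roman_to_int` and the table `ROMAN_VALUES` are hypothetical illustrations, not part of any PG or DjVuLibre tool:

```python
# Value of each roman numeral symbol.
ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a roman page number such as 'xiv' to an integer.

    Case-insensitive. A symbol smaller than the one that follows it is
    subtracted ('iv' -> 4); otherwise it is added. The input is assumed
    to be a well-formed numeral; an unknown symbol raises KeyError.
    """
    values = [ROMAN_VALUES[ch] for ch in numeral.lower()]
    total = 0
    for i, value in enumerate(values):
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value  # subtractive pair, e.g. the 'i' in 'iv'
        else:
            total += value
    return total


if __name__ == "__main__":
    # Roman page v (the contents page in the naming example) becomes 5.
    print(roman_to_int("v"), f"f{roman_to_int('v'):04d}.djvu")  # 5 f0005.djvu
```

The sketch does not reject malformed numerals such as "iiii"; for page numbers copied from a printed book that is usually acceptable.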
A filename for a single-page DjVu file MUST follow this pattern:

  <prefix><page number>.djvu

The prefix for the cover pages is "c".
The prefix for the roman pages is "f".
The prefix for the arabic pages is "p".

If there are more page number sequences in the book, they MUST be handled in a similar fashion, each using an arbitrary unused letter.

The <page number> is the true page number as printed on the physical page (or inferred from the previous or next pages), expressed in arabic numerals and left-padded with zeroes to 4 digits.

For blank pages there should be no file, and the page number should be skipped. Optionally, an image saying "This page is blank in the original." may be inserted instead.

Missing pages MUST be replaced by an image saying "This page is missing."

A filename for a single-page DjVu file containing an illustration scanned at a different resolution or color depth MUST follow this pattern:

  <prefix><page number>-<image position on the page>.djvu

The <image position on the page> is "1" for the first image, "2" for the second, and so on.

If present, the covers and the spine MUST be named as follows:

  front cover outside      c0001.djvu
  front cover inside       c0002.djvu
  back cover inside        c0003.djvu
  back cover outside       c0004.djvu
  spine                    c0005.djvu

Example of file naming:

  front cover              c0001.djvu
  back cover               c0004.djvu
  spine                    c0005.djvu
  i    title page          f0001.djvu
  ii   title verso         f0002.djvu
  iii  dedication          f0003.djvu
  iv   (blank, no file)
  v    contents            f0005.djvu
  page 1                   p0001.djvu
  page 2                   p0002.djvu
  first image on page 2    p0002-1.djvu
  second image on page 2   p0002-2.djvu
  page 3                   p0003.djvu
  page 4 (blank, no file)
  page 5                   p0005.djvu
  ...                      ...
  page 9999                p9999.djvu

Compression

To produce the single-page DjVu files, use the most appropriate compressor: lossy JB2 for bitonal text, IW44 for continuous-tone images.

Assemblage

All single-page DjVu files MUST be assembled into one multi-page DjVu file. Only the multi-page DjVu file will be posted.
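The naming rules above can be sketched in Python as a small filename builder, together with a check of the order in which the resulting files sort for assembly. The helpers `page_filename` and `assembly_order` and the prefix constants are hypothetical illustrations, not part of DjVuLibre or any PG tool. Python's `sorted` compares strings by code point, which for these ASCII names matches the byte order a shell glob uses in the C locale.

```python
from typing import List, Optional

# Prefixes for the standard page number sequences, as listed in this RFC.
COVER, ROMAN, ARABIC = "c", "f", "p"


def page_filename(prefix: str, page: int, image: Optional[int] = None) -> str:
    """Build a single-page DjVu filename such as 'p0002.djvu' or 'p0002-1.djvu'.

    `page` is the page number in arabic numerals; `image` is the 1-based
    position of a separately scanned illustration on that page, if any.
    """
    if len(prefix) != 1 or not prefix.isalpha():
        raise ValueError("prefix must be a single letter")
    if not 1 <= page <= 9999:
        raise ValueError("page number must fit in 4 digits")
    name = f"{prefix}{page:04d}"  # left-pad with zeroes to 4 digits
    if image is not None:
        if image < 1:
            raise ValueError("image positions start at 1")
        name += f"-{image}"
    return name + ".djvu"


def assembly_order(filenames: List[str]) -> List[str]:
    """Sort filenames by code point, mirroring a C-locale shell glob."""
    return sorted(filenames)


if __name__ == "__main__":
    names = [
        page_filename(ARABIC, 2, image=1),
        page_filename(ARABIC, 1),
        page_filename(ROMAN, 5),
        page_filename(COVER, 1),
        page_filename(ARABIC, 2),
    ]
    print(assembly_order(names))
    # ['c0001.djvu', 'f0005.djvu', 'p0001.djvu', 'p0002-1.djvu', 'p0002.djvu']
```

The cover files sort first, then the roman pages, then the arabic pages, because "c" < "f" < "p". Note that "-" (0x2D) sorts before "." (0x2E), so an illustration file such as p0002-1.djvu lands just before p0002.djvu rather than after it; the page files themselves stay in book order.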
All "roman" and "arabic" pages MUST appear in the multi-page file in the same order as in the book. All cover pages and the spine should appear at the front.

The naming scheme was chosen so that running

  djvm -c 12345.djvu *djvu

in a directory containing all single-page DjVu files assembles the multi-page DjVu file in the correct sequence. Run it before 12345.djvu exists in that directory; otherwise the glob *djvu matches the output file too.

APPENDIX

Open Source compressors and browser plugins for Linux and Windows are available here:

  http://djvulibre.djvuzone.org/

Windows users also have to get and install Cygwin from here:

  http://www.cygwin.com/

--
Marcello Perathoner
webmaster@gutenberg.org