
Marcello wrote:
Jon Noring wrote:
It is common in a DP book scan job to scan the pages at one resolution sufficient for text, then return and redo all the illustrations at a higher resolution. (There can be multiple illustrations per page, and an illustration can be embedded within text.)
So the page scan filename system has to include this possibility.
Did you actually *read* my RFC before commenting on it? I ask, because if you had read it, you would have noticed this section:
Oops, I apologize for not making it clear: I was focusing not on your particular RFC (and its purpose), but on page scan file naming in general (not embedded within DjVu or whatever). I changed the Subject: header line on most of my messages to reflect this, but not on all of them. I have a book, I am scanning it, and I want to give each separate image an appropriate filename so that I (and others) can keep everything straight. I also want the filename to be machine-processable, so that important page-related info can be read by software at a future time. I am not thinking of any particular application of the page scans, such as DjVu. Your system for image naming within DjVu is interesting, and there can certainly be a mapping between a system like the one I propose and the one to be used strictly within DjVu.
0003857-00035-28
Where '0003857' is the decimal identifier for the source book which was scanned -- 7 digits gives us 10,000,000 books (hexadecimal would be slightly more compact but not as human friendly) -- '00035' is the sequential position of the page as it appears in the source book (counting every page, including unnumbered and blank ones, independent of any printed page numbering scheme), and '28' is the page number (or 'string') the publisher/author actually printed on the page to identify it.
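To make the "machine-processable" part concrete, here is a minimal Python sketch of splitting and rebuilding a name such as 0003857-00035-28. The names PageScanName, parse_page_scan_name and format_page_scan_name are hypothetical, and the sketch assumes the printed label is simply whatever follows the second hyphen (possibly empty for a page with nothing printed on it), which the description above does not actually specify.

```python
import re
from typing import NamedTuple, Optional


class PageScanName(NamedTuple):
    book_id: int        # decimal source book identifier (7 digits in the name)
    sequence: int       # sequential position of the page in the book (5 digits)
    printed_label: str  # page number or string printed on the page


# Assumption: the label is everything after the second hyphen and may be empty.
_PATTERN = re.compile(r"(\d{7})-(\d{5})-(.*)")


def parse_page_scan_name(name: str) -> Optional[PageScanName]:
    """Return the three fields of a page scan name, or None if it does not match."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        return None
    return PageScanName(int(match.group(1)), int(match.group(2)), match.group(3))


def format_page_scan_name(book_id: int, sequence: int, printed_label: str) -> str:
    """Build a name with zero-padded book id and sequence number."""
    return f"{book_id:07d}-{sequence:05d}-{printed_label}"
```

For example, parse_page_scan_name("0003857-00035-28") yields book_id 3857, sequence 35 and printed_label "28", and format_page_scan_name(3857, 35, "28") gives back the original string.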
This is more complicated and less robust than my proposal.
Well, we are sort of comparing apples and oranges. Sorry for not making that clear.
1. You don't need the ebook number because the ebook number will be in the filename of the multi-page djvu.
Of course, once the images are embedded within a DjVu, the source book ID need not be, and probably should not be, part of the page image naming. So you are right here.
2. You don't want the ebook number because at the time of scanning, the ebook number is unknown.
True. However, in my different situation, when a scan set is submitted to some repository, along with the metadata associated with the scan set, the repository may prepend the source book ID that they assign to each filename. Now, if they produce a single DjVu file from the scan set, then they can transform/remap the filenames to something appropriate for that specific purpose.
3. You don't want the sequence number in the filename because it increases the probability that links to the page image break. If you have to insert a page all subsequent files will have to be renamed, and all links to them will break. In my proposal no link will break if you insert or remove pages (except a link to the removed page).
Within DjVu, certainly! You bring up an interesting point, though: if someone scans a book and misses a page, then the sequential page scan numbering (not the same as the publisher page numbering) gets messed up. So once it is discovered that a page is missing and it is scanned, the sequential integer numbering has to be fixed to "insert" that page. I am thinking, though, that any book scanning project will go through some kind of quality control checking, as well as generating a metadata/catalog record. During this process the scan filenames will be finalized. If the page scans are later incorporated into a DjVu, then the filenames before embedding can be mapped into your proposed system.
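To illustrate the renumbering issue, here is a rough Python sketch of what "inserting" a missed page means for names in the BOOK-SEQUENCE-LABEL form discussed above. The function insert_page is hypothetical; it assumes the list is already in sequence order with sequence numbers running 1..n, and it only computes the new names and the old-to-new mapping without touching any files.

```python
from typing import Dict, List, Tuple


def insert_page(
    names: List[str], book_id: int, position: int, printed_label: str
) -> Tuple[List[str], Dict[str, str]]:
    """Insert a newly scanned page at a 1-based sequence position.

    Returns the updated list of names and a mapping old name -> new name
    for every existing scan whose sequence number had to shift by one.
    """
    if not 1 <= position <= len(names) + 1:
        raise ValueError("position out of range")

    renames: Dict[str, str] = {}
    for old in names[position - 1:]:
        # Split into at most three parts so hyphens inside the label survive.
        book, seq, label = old.split("-", 2)
        renames[old] = f"{book}-{int(seq) + 1:05d}-{label}"

    new_name = f"{book_id:07d}-{position:05d}-{printed_label}"
    shifted = [renames[old] for old in names[position - 1:]]
    return names[:position - 1] + [new_name] + shifted, renames
```

Every entry in the returned mapping is a filename that changes, which is exactly the set of links that would break if the old names had already been published. When actually renaming files, the mapping would have to be applied from the highest sequence number downward so that no rename overwrites a file that has not yet been moved.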
4. Who wants to know about "unnumbered blank pages"? You are not going to cite a blank page, are you?
Again, I am not thinking specifically of deeplinking into DjVu and trying to maintain stable links, but rather of the file naming scheme for a bunch of page scans. With respect to understanding how a source book is laid out, it is a good idea to know where the blank pages were -- this also aids in knowing whether the book scan set is complete. (There are reasons why many official documents add the statement "this page intentionally left blank" on blank pages. Also, it would not surprise me if, in rare cases, a page which should have been printed turned out to be unprinted. But this is a different problem.) For deeplinking into a finalized DjVu file, the blank pages can be left out. You are right -- it is unlikely one will ever encounter a reference in one book to a blank page in another book, except maybe as some elaborate joke.
Jon