
On Saturday 23 July 2005 05:02 pm, Marcello Perathoner wrote:
Unless you care about the actual physical sequence, which you have just ignored.
Why? I have seen sheet feeders jam, and I have seen feeders slurp in two pages at a time, but I have never seen a sheet feeder reorder the pages I fed into it.
Ergo: if I feed the pages in order, starting with page 1 and file 1 and ending with page N and file N, the files are in the correct physical sequence.
Books almost never start with page 1, even in the Arabic-numbered sections. Where's your front matter? Back matter? A book which starts at 1, ends at (1+n) and has no other numbering in it is likely the exception rather than the rule, so it is unrealistic to design a scheme that addresses only this special case.
Manually scan the inserted illustration sheets and name the files according to the noted page number.
Tip-in illustrations generally do not have page numbers.
That is why they get the page number of the preceding "true" page plus a numeric suffix.
Why create this artificial distinction in the first place?
Trying to store anything but the physical sequence of the pages in the filename is an unnecessary complication, and probably short-sighted in the long run.
Why? You don't give any reasons besides personal preference.
I have, but let's review anyway.

1) Store the physical sequence of the pages, front cover to back cover, by simply naming the scan files in ascending order from 1. Advantages: the sort order is guaranteed to be identical across platforms, and no parsing of file name segments is required to determine information about the file.

2) Create a corresponding metadata file, named with a 1:1 correspondence to the scan file, to hold the other information about the scanned image (in this case, the numbering). Advantages: any additional information about the image file is trivially associated, modifiable and extensible. It could be loaded into a database or converted to XML or other formats trivially, making meaningful searching that much easier to implement.

Marcello, in your role at PG, you should realize better than most that storing data in a file name makes it less accessible to programs which could automate much of the common work of maintaining a dataset. It's not impossible to manipulate, but it is cumbersome and has sort order and case issues across platforms.
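
To make the two-step scheme concrete, here is a rough sketch in Python. Everything specific in it is my own assumption for illustration, not something settled in this thread: the JSON sidecar format, the field names, the .tif extension, the zero-padded file names and the helper functions are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_scan_set(directory, printed_labels):
    """Write one sidecar metadata file per scan, named by physical sequence.

    printed_labels holds the printed page label of each physical page in
    scan order; None marks an unnumbered page such as a tipped-in plate.
    """
    directory = Path(directory)
    # Zero-padding is an assumption: it keeps plain lexical sorting
    # consistent with the numeric sequence.
    width = max(4, len(str(len(printed_labels))))
    for seq, label in enumerate(printed_labels, start=1):
        stem = f"{seq:0{width}d}"
        meta = {"sequence": seq, "image": f"{stem}.tif", "printed_page": label}
        (directory / f"{stem}.json").write_text(json.dumps(meta), encoding="utf-8")


def load_scan_set(directory):
    """Return all metadata records ordered by physical sequence."""
    records = [json.loads(p.read_text(encoding="utf-8"))
               for p in Path(directory).glob("*.json")]
    # Sort on the stored integer, so nothing depends on how a given
    # platform happens to order file names.
    return sorted(records, key=lambda r: r["sequence"])


def find_by_printed_page(records, label):
    """Return the image names whose printed page label matches."""
    return [r["image"] for r in records if r["printed_page"] == label]


# Small demonstration: cover, two roman-numbered pages, page 1,
# an unnumbered tip-in, then page 2.
with tempfile.TemporaryDirectory() as tmp:
    write_scan_set(tmp, [None, "i", "ii", "1", None, "2"])
    records = load_scan_set(tmp)

print(find_by_printed_page(records, "1"))
```

The file name carries only the physical position. The printed page label, or the lack of one for a tip-in, lives in the sidecar record, where a program can query or extend it without parsing file names.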