Re: [gutvol-d] Scan file naming -- another comment

24 Jul 2005

      On Saturday 23 July 2005 05:02 pm, Marcello Perathoner wrote:
...
...
Unless you care about the actual physical sequence, which you have just
ignored.
Why? I have seen sheet feeders jam, I have seen feeders slurp in two
pages at a time but I have never seen a sheet feeder reorder the sequence of
the pages I fed into it.
Ergo: if I feed the pages in order starting with page 1 and file 1 and
end up with page N and file N the files are in the correct physical
sequence.
Books almost never start with page 1, even in the arabic numbered sections. 
Where's your front matter? Back matter? A book which starts at 1 and ends at 
(1+n) and has no other numbering in it is likely the exception rather than 
the rule, so it is unrealistic to design a scheme to address only this 
special case.
...
...
...
Manually scan the inserted illustration sheets and name the files
according to the noted page number.
Tip-in illustrations generally do not have page numbers.
That is the reason why they get the page number of the preceding "true"
page plus a number as suffix.
Why create this artificial distinction in the first place?
...
...
Trying to store anything but the physical sequence of the pages in the
filename is an unnecessary complication, and probably short-sighted in
the long run.
Why? You don't give any reasons besides personal preference.
I have, but let's review anyway.

1) Store the physical sequence of the pages, front cover to back cover by 
simply naming the scan files ascending from 1. Advantages: Sort order 
guaranteed identical across platforms, no parsing of file name segments 
required to determine information about the file.
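
A minimal sketch of point 1, in Python. The helper name `sequence_names`, the `.png` extension and the zero-padding are assumptions made for illustration, not part of the proposal as stated; the padding is added only so that a plain string sort gives the same order as the numeric sequence.

```python
def sequence_names(page_count, ext=".png"):
    """Return scan file names for images 1..page_count, front cover to back.

    Names are zero-padded to a common width (an assumption of this sketch)
    so that a byte-wise string sort matches the physical sequence.
    """
    width = max(4, len(str(page_count)))
    return [f"{i:0{width}d}{ext}" for i in range(1, page_count + 1)]


names = sequence_names(12)
# No parsing is needed to recover the order: sorting the names is enough.
assert sorted(names) == names
```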

2) Create a corresponding metadata file for each scan, named to match it 1:1,
to hold the other information about the scanned image (in this case, the
page numbering). Advantages: Any additional information about the image file
is trivially associated, modifiable and extendible. It could be loaded into a
database or converted to XML or other formats trivially, making
implementation of meaningful searching that much easier.
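
Point 2 could look something like the following sketch. JSON is used only as an example format, and the function names and the field names `printed_page` and `kind` are hypothetical; any format that can later be loaded into a database or converted to XML would serve the same purpose.

```python
import json
from pathlib import Path


def write_sidecar(directory, seq, info):
    """Write the metadata for scan number `seq` next to the image file."""
    path = Path(directory) / f"{seq:04d}.json"
    path.write_text(json.dumps(info, indent=2), encoding="utf-8")
    return path


def load_sidecars(directory):
    """Load every sidecar file, keyed by its sequence name (e.g. '0007')."""
    return {
        p.stem: json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(Path(directory).glob("*.json"))
    }


def find_by_printed_page(records, printed):
    """Return the sequence names whose printed page number matches."""
    return [seq for seq, info in records.items()
            if info.get("printed_page") == printed]
```

With this layout, a tipped-in plate is simply the next file in the sequence, with a record such as `{"printed_page": None, "kind": "plate"}`, and front matter can carry `"printed_page": "iv"` without affecting the file name at all.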

Marcello, in your role at PG, you should realize more than many others that 
storing data in a file name makes it less accessible to programs which could 
automate much of the common work of maintaining a dataset. It's not 
impossible to manipulate, but it is cumbersome and has sort order and case 
issues across platforms.
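
To illustrate the sort order and case issues, here is a small sketch. The naming pattern `p<number><suffix>.png` is a hypothetical stand-in for a scheme that stores the page number plus a suffix in the file name; it is not anyone's actual proposal.

```python
import re


def parse_name(name):
    """Split a name like 'p9a.png' into (9, 'a'); pattern is hypothetical."""
    m = re.fullmatch(r"[pP](\d+)([a-z]?)\.png", name)
    if m is None:
        raise ValueError(f"unrecognised file name: {name!r}")
    return int(m.group(1)), m.group(2)


names = ["p9.png", "p10.png", "p9a.png", "P11.png"]

# A plain string sort puts the upper-case name first and p10 before p9.
plain = sorted(names)

# Getting the intended order back requires parsing every name.
parsed = sorted(names, key=parse_name)
```

Here `plain` comes out as `P11.png, p10.png, p9.png, p9a.png`, while `parsed` gives the intended `p9.png, p9a.png, p10.png, P11.png`. Every tool that touches the files has to carry its own copy of that parsing rule.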


D Garcia