Re: [gutvol-d] Scan file naming -- another comment

23 Jul 2005

      On Saturday 23 July 2005 04:50 am, Marcello Perathoner wrote:
...
Jon Noring wrote:
...
But your system requires that each image, when it is first saved,
needs a human being to eyeball the page, determine the publisher
supplied page number (if any; may be implied), and then manually save
the page using the publisher number.
Gee, much like the upcoming DP metadata rounds, which will perform this 
review, and store the info in a database.
...
Not at all. If you have a sheet feeder just extract inserted
illustrations and such out-of-sequence stuff (making a note of the page
number on the back). Then feed the whole pile starting with page arabic
"1" and go drink a coffee. If you are lucky and the feeder doesn't jam
you just need to compare the filename of the last file with the last page
number. If they jibe you are done.
Unless you care about the actual physical sequence, which you have just 
ignored.
...
Manually scan the inserted illustration sheets and name the files
according to the noted page number.
Tip-in illustrations generally do not have page numbers.

This discussion has long since become ridiculous.

Scan the pages in physical order, starting from 001.png (.tif, whatever)

Create a metadata file 001.pag for each image, containing the image file 
name or number and the "extra" information, which in this case is really only 
the printed page number; use "none" if the page is unnumbered. In XML this 
would be trivial, but if you hate XML (which can easily be read as, or loaded 
into, a database) then just write a 001.pag file with (i,1,none) as 
appropriate. If you're really freaked out by unnumbered pages which have a 
'logical' page number (such as the last page of a chapter of fiction, which 
is typically counted in the sequence but has no printed number) then have a 
field for the printed page number and another for the logical page number. 
You'll also frequently see front matter done this way, with printed numbering 
starting at something higher than 'i'.
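
As a rough illustration of the non-XML variant, here is a minimal Python 
sketch. The key=value layout, the field names (image, printed, logical) and 
the helper names write_pag/read_pag are assumptions made up for this example, 
not an agreed format; the only things taken from the above are the NNN.pag 
naming and the "none" marker for unnumbered pages.

```python
import tempfile
from pathlib import Path


def write_pag(directory, seq, printed="none", logical=None, ext="png"):
    """Write NNN.pag for the image with physical sequence number `seq`.

    The key=value layout is a hypothetical format for illustration only.
    """
    stem = f"{seq:03d}"
    lines = [f"image={stem}.{ext}", f"printed={printed}"]
    if logical is not None:
        # Only record a logical number when it differs from "nothing known".
        lines.append(f"logical={logical}")
    path = Path(directory) / f"{stem}.pag"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def read_pag(path):
    """Read a .pag file back into a dict of field name -> value."""
    record = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            record[key.strip()] = value.strip()
    return record


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scan_dir:
        # Unnumbered last page of a chapter: nothing printed, logical page 14.
        pag = write_pag(scan_dir, 27, printed="none", logical="14")
        print(pag.name, read_pag(pag))
```

The file name carries only the physical sequence; everything else lives in 
the record, so adding a field later does not force anyone to rename images.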

After preserving the images themselves, preserving the physical sequence is 
the most important requirement of an image archive. The metadata will always 
require human attention, but once it's done (and it can be partially 
automated) later tools to assemble versions in FORMAT_OF_YOUR_CHOICE can take 
that and do with it what they wish.
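
To make the "later tools" point concrete, here is a small hedged sketch in 
Python of one such tool: it takes per-image records in physical order and 
builds a lookup from printed page number to image file. The record shape 
(image name, printed number) and the function name printed_page_index are 
assumptions for illustration; a real tool would read whatever metadata 
format ends up being used.

```python
def printed_page_index(records):
    """Map printed page number -> image file name.

    `records` is an iterable of (image_name, printed_number) pairs in
    physical scan order. Pages marked "none" are unnumbered and skipped.
    If a printed number repeats, the first physical occurrence wins.
    """
    index = {}
    for image_name, printed in records:
        if printed == "none":
            continue
        index.setdefault(printed, image_name)
    return index


# Hypothetical book: blank page, two front-matter pages, an unnumbered
# tip-in illustration, then the start of the body text.
records = [
    ("001.png", "none"),
    ("002.png", "i"),
    ("003.png", "ii"),
    ("004.png", "none"),
    ("005.png", "1"),
    ("006.png", "2"),
]

index = printed_page_index(records)
print(index["ii"])  # 003.png
print(index["1"])   # 005.png
```

The physical sequence stays intact in the file names, and the printed 
numbering is derived from the metadata only when a tool needs it.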

Trying to store anything but the physical sequence of the pages in the 
filename is an unnecessary complication, and probably short-sighted in the 
long run.


D Garcia