
On Saturday 23 July 2005 04:50 am, Marcello Perathoner wrote:
Jon Noring wrote:
But your system requires that, when each image is first saved, a human being eyeball the page, determine the publisher-supplied page number (if any; it may be implied), and then manually save the page under that number.
Gee, much like the upcoming DP metadata rounds, which will perform this review and store the info in a database.
Not at all. If you have a sheet feeder, just extract inserted illustrations and other out-of-sequence stuff (making a note of the page number on the back). Then feed the whole pile starting with Arabic page "1" and go drink a coffee. If you are lucky and the feeder doesn't jam, you just need to compare the filename of the last file with the last page number. If they jibe, you are done.
Unless you care about the actual physical sequence, which you have just ignored.
Manually scan the inserted illustration sheets and name the files according to the noted page number.
Tip-in illustrations generally do not have page numbers.

This discussion has long since become ridiculous. Scan the pages in physical order, starting from 001.png (.tif, whatever). Create a metadata file 001.pag for each image, containing the image file name or number and the "extra" information, which in this case is really only the printed page number; use "none" if the page is unnumbered. In XML this would be trivial, but if you hate XML (which can easily be read as, or loaded into, a database) then just write a 001.pag file with (i,1,none) as appropriate.

If you're really freaked out by unnumbered pages which have a 'logical' page number (such as the last page of a chapter of fiction, which is typically counted in the sequence but carries no printed number), then have one field for the printed page number and another for the logical page number. You'll also frequently see front matter done this way, with the printed numbers starting at something higher than 'i'.

After preserving the images themselves, preserving the physical sequence is the most important requirement of an image archive. The metadata will always require human attention, but once it's done (and it can be partially automated), later tools that assemble versions in FORMAT_OF_YOUR_CHOICE can take it and do with it what they wish. Trying to store anything but the physical sequence of the pages in the filename is an unnecessary complication, and probably short-sighted in the long run.
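
To make the sidecar idea concrete, here is a rough Python sketch of it. Everything specific in it is my assumption, not an agreed format: the function names write_pag_files and read_pag_file are made up, the images are assumed to be .png, and the one-line record layout (image,printed,logical) is just one way to read the "(i,1,none)" suggestion above. It does not handle labels that themselves contain commas.

```python
from pathlib import Path


def write_pag_files(directory, pages, image_ext=".png"):
    """Write one .pag sidecar per scanned page.

    `pages` is a list of (printed, logical) labels in PHYSICAL scan order.
    Use None for a label that is absent. File names carry only the
    physical sequence number (001, 002, ...), never the printed number.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    written = []
    for seq, (printed, logical) in enumerate(pages, start=1):
        stem = f"{seq:03d}"
        # Hypothetical record layout: (image file, printed number, logical number)
        record = f"({stem}{image_ext},{printed or 'none'},{logical or 'none'})"
        path = directory / f"{stem}.pag"
        path.write_text(record + "\n", encoding="utf-8")
        written.append(path)
    return written


def read_pag_file(path):
    """Parse a sidecar written by write_pag_files back into a dict."""
    text = Path(path).read_text(encoding="utf-8").strip()
    image, printed, logical = text.strip("()").split(",")
    return {
        "image": image,
        "printed": None if printed == "none" else printed,
        "logical": None if logical == "none" else logical,
    }
```

Calling write_pag_files("scans", [("i", "i"), (None, "ii"), ("1", "1"), (None, None)]) would produce 001.pag through 004.pag, where the second page is unnumbered but logically "ii" and the fourth (say, a tip-in) has no number of either kind. A later tool only needs read_pag_file plus the file order to rebuild whatever numbering it wants.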