
Jon Noring wrote:
> Example: DP0000239-00125-106.png
> "DP0000239" is the source book identifier, here a DP identifier. If the scan project is independent of DP, it could be 'PG0014239' to associate the scan set with PG text number 14239.
> "00125" says this is the 125th "side" in the full sequence of sides in the book, starting from the front cover or wherever else is considered the starting point.
> "106" is the string (which can be more complicated like "A2", "5-4", "ix", "ABCD" whatever), which the publisher printed on that page to identify it (that's really what a publisher-supplied page "number" is: a page identifier.)
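For illustration, the three-part name described above can be taken apart as follows. This is only a sketch: the names ScanName and parse_scan_name are hypothetical, and it assumes the book identifier and the side number never contain a hyphen, so that everything after the second hyphen is the publisher's page identifier.

```python
from typing import NamedTuple


class ScanName(NamedTuple):
    book_id: str      # e.g. "DP0000239"
    side: int         # position in the full sequence of sides
    page_label: str   # whatever the publisher printed on the page
    extension: str    # e.g. "png"


def parse_scan_name(filename: str) -> ScanName:
    """Split '<book>-<side>-<label>.<ext>' into its parts."""
    # Take the extension from the last dot, so a dot inside the label survives.
    stem, _, extension = filename.rpartition(".")
    # Split at most twice: the page label may itself contain hyphens ("5-4").
    book_id, side, page_label = stem.split("-", 2)
    return ScanName(book_id, int(side), page_label, extension)


print(parse_scan_name("DP0000239-00125-106.png"))
```

Splitting only on the first two hyphens is what lets a label such as "5-4" come through intact.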
Incredibly awkward and broken in several ways:

1. While scanning you have no feedback on the correctness of your scanning. You are scanning page "42" and saving to file "58.tif". There is no immediate relation between the page you are putting on the scanner and the filename you are saving it under.

2. To add the real page number to the filename you need a second pass over all files. Errors galore! Proof: your example filename DP0000239-00125-106.png is bogus: side 125 "starting from the front cover" must be a right-hand side, but page 106 is surely a left-hand one. You got confused even with a single file. What about handling hundreds of them at once?

3. Because the filename is composed of two keys, a link to this file is much more likely to break than a link built from either key alone.

4.
> It will be able to handle *any* bizarre pagination system the publisher/author dreamed up (the publisher could number the pages backwards for all we care, and this system will handle it without any complications -- yet we preserve the publisher-supplied page "number" in the filename which is important for referencing/citation.)
Bogus claim. The publisher might put something in the page "number" that doesn't work as a filename or URL. What about page "4/2"? Makes a good filename, huh?

-- 
Marcello Perathoner
webmaster@gutenberg.org
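Two of the objections raised in this reply can be checked mechanically. The sketch below is illustrative only: parity_matches and filename_safe are hypothetical helper names, the parity test assumes that side 1 and printed page 1 are both right-hand pages and that every side is counted, and percent-encoding is just one possible way to make a label such as "4/2" usable in a filename.

```python
from typing import Optional
from urllib.parse import quote


def parity_matches(side: int, page_label: str) -> Optional[bool]:
    """Compare left/right parity of the side number and the printed number.

    Returns None when the label is not a plain number ("ix", "A2", "4/2"),
    because then there is no parity to compare.
    """
    if not (page_label.isascii() and page_label.isdigit()):
        return None
    # Assumption: odd sides and odd printed numbers are right-hand pages.
    return side % 2 == int(page_label) % 2


def filename_safe(page_label: str) -> str:
    """Percent-encode everything except letters, digits and '_', '.', '-', '~'."""
    return quote(page_label, safe="")


print(parity_matches(125, "106"))  # the example filename from the quoted post
print(filename_safe("4/2"))
```

With these assumptions the quoted example DP0000239-00125-106.png fails the parity check, and "4/2" becomes "4%2F2", which no longer matches what is printed on the page.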