Re: [gutvol-d] Scan file naming -- another comment

22 Jul 2005

Jon Noring wrote:
...
> Example: DP0000239-00125-106.png
> "DP0000239" is the source book identifier, here a DP identifier. If
> the scan project is independent of DP, it could be 'PG0014239' to
> associate the scan set with PG text number 14239.
> "00125" says this is the 125th "side" in the full sequence of sides
> in the book, starting from the front cover or wherever else is
> considered the starting point.
> "106" is the string (which can be more complicated like "A2", "5-4",
> "ix", "ABCD" whatever), which the publisher printed on that page to
> identify it (that's really what a publisher-supplied page "number" is:
> a page identifier.)

Incredibly awkward and broken in several ways:

1.

While scanning, you get no feedback on whether you are scanning 
correctly. You are scanning page "42" and saving to file "58.tif". There 
is no immediate relation between the page you are putting on the scanner 
and the filename you are saving it under.

2.

To add the real page number to the filename, you need a second pass over 
all files. Errors galore!

Proof: your example filename DP0000239-00125-106.png is bogus: side 125 
"starting from the front cover" must be a right-hand side (odd count), 
but page 106 is surely a left-hand one (even number). You got confused 
even with one file alone. What about handling hundreds of them at once?
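
The parity argument can be checked mechanically. What follows is only a rough sketch: the regular expression and the function name `parity_mismatch` are made up for illustration, and it assumes that side 1 is a right-hand side and that odd printed page numbers always fall on right-hand sides. Non-numeric labels such as "ix" cannot be checked this way and are skipped.

```python
import re

# Hypothetical pattern for the quoted scheme:
# <book id>-<5-digit side sequence>-<printed page label>.<extension>
NAME_RE = re.compile(
    r"^(?P<book>[A-Z]{2}\d+)-(?P<side>\d{5})-(?P<label>[^.]+)\.(?P<ext>\w+)$"
)

def parity_mismatch(filename):
    """Return True if side and printed page number disagree on left/right.

    Returns None when the printed label is not a plain decimal number,
    because no parity can be derived from labels like "ix" or "A2".
    """
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError("not a scan filename: " + filename)
    label = m.group("label")
    if not label.isdecimal():
        return None
    # Assumption: odd side count and odd page number both mean right-hand.
    return int(m.group("side")) % 2 != int(label) % 2

print(parity_mismatch("DP0000239-00125-106.png"))
```

For the example filename the side is odd and the page number even, so the check reports a mismatch. Books with unnumbered inserted plates can break the parity assumption, so a mismatch is a hint to look at the scan, not proof of an error.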

3.

Because the filename is composed of two keys, a link to this file is 
much more likely to break than one that uses either key alone.

4.
...
> It will be able to handle *any* bizarre pagination system
> the publisher/author dreamed up (the publisher could number the pages
> backwards for all we care, and this system will handle it without any
> complications -- yet we preserve the publisher-supplied page "number"
> in the filename which is important for referencing/citation.)

Bogus claim.

The publisher might put something in the page "number" that doesn't work 
as a filename or URL. What about page "4/2"? Makes a good filename, huh?
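
To make the "4/2" problem concrete, here is a small sketch. The helper `scan_filename` is hypothetical and simply pastes the three parts together the way the quoted scheme describes. The percent-encoding at the end is one possible workaround, not something proposed in this thread.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

def scan_filename(book, side, label, ext="png"):
    # Hypothetical builder for the quoted scheme: book id, 5-digit side, label.
    return f"{book}-{side:05d}-{label}.{ext}"

name = scan_filename("DP0000239", 7, "4/2")

# The slash in the printed label is read as a directory separator,
# so the "filename" silently becomes a directory plus a file.
print(PurePosixPath(name).parts)

# One possible workaround (an assumption): percent-encode the label
# before using it, at the cost of a less readable name.
safe_name = scan_filename("DP0000239", 7, quote("4/2", safe=""))
print(safe_name)
```

Any such escaping has to be applied and reversed consistently everywhere the filename is used, which is one more place for things to go wrong.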

-- 
Marcello Perathoner
webmaster@gutenberg.org