
Jon Noring wrote:
> But your system requires that, when each image is first saved, a human being eyeball the page, determine the publisher-supplied page number (if any; it may be implied), and then manually save the page under that number.
Not at all. If you have a sheet feeder, just pull out the inserted illustrations and other out-of-sequence sheets (making a note of the page number on the back). Then feed the whole pile, starting with arabic page "1", and go drink a coffee. If you are lucky and the feeder doesn't jam, you just need to compare the filename of the last file with the last page number. If they jibe, you are done. Manually scan the inserted illustration sheets and name the files according to the noted page numbers. Repeat with the roman-numbered pages.
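The end-of-run check can be sketched in a few lines of Python. This is a minimal illustration, assuming the scanner's auto-numbering writes zero-padded names like 0001.png, 0002.png, and so on; that naming scheme and the function names scanned_pages, run_matches and check_directory are hypothetical, not part of any particular scanner software.

```python
import re
from pathlib import Path
from typing import Iterable

# Hypothetical naming scheme: the scanner's auto-numbering writes one file
# per page as "0001.png", "0002.png", ... starting at arabic page 1.
SCAN_NAME = re.compile(r"^(\d+)\.png$")


def scanned_pages(names: Iterable[str]) -> list[int]:
    """Return the page numbers encoded in the scan filenames, sorted.
    Names that do not follow the scheme (notes, thumbnails) are ignored."""
    pages = []
    for name in names:
        match = SCAN_NAME.match(name)
        if match:
            pages.append(int(match.group(1)))
    return sorted(pages)


def run_matches(pages: list[int], last_page: int) -> bool:
    """Compare the last file's number with the book's last printed page.
    A misfeed that skips or repeats a sheet shifts the numbering, so the
    two will no longer agree."""
    return bool(pages) and pages[-1] == last_page


def check_directory(scan_dir: Path, last_page: int) -> bool:
    """Apply the check to every file in one scan directory."""
    return run_matches(scanned_pages(p.name for p in scan_dir.iterdir()), last_page)
```

For example, check_directory(Path("scans"), 312) would return True for a 312-page arabic run that went through cleanly; the roman-numbered front matter would be checked as a separate run.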
> If it is occasionally necessary to escape characters, then so be it. This is done all the time in URLs.
Then show me how you escape these filenames:

    DP12345-00420-II/2.png
    DP12346-00017-第十三.png

(I hope those Chinese characters came through.)

The escaping should work on all known operating systems, including DOS, Windows, Linux, Mac Classic, Mac OS X, Palm, etc. It should also work in a URL and should not require renaming when the files travel from one OS to another. If you cannot accomplish that, you are basically proposing an archiving system in which files have to be renamed whenever the OS changes.

-- 
Marcello Perathoner
webmaster@gutenberg.org