
jon said:
The system I propose for scan file naming is *simpler* than yours and more flexible
no, it isn't. not on either count. the mere act of keeping track of a single name when it contains two -- or more -- variables in it becomes much more difficult than it needs to be. put together a half-dozen books each containing hundreds of scan-files using your verbose names and you'll find yourself drowning in the confusion. your pace would fall to a crawl. try it. you'll see. d.p. scanners can't afford to work at such a pace. my filenaming convention has grown out of my experience through the entire digitizing process. if it had needed to be more complicated, i would have learned that by now and made it so. you can always make a system more complex. the smart thing -- which experience teaches -- is to realize when the increment in _cost_ will return to you a sufficient increment in _benefit_. and when it will not. you have taken the simple-and-useful principle that it is good to know the contents of a file from its name, and blown it past the point where it is cost-beneficial. there are all kinds of information that you _could_ put into the filename, which _might_ be useful at some point. but if it makes the process of dealing with the filename too unwieldy, it ain't worth it. the trick is to know when to stop.
It also integrates better into the QC system.
you don't even have a good idea what a quality-control system might look like, let alone knowledge of problems that might crop up with each particular type of system. you might _think_ you do. but -- as is typical with you -- you don't know what you don't know. your knowledge has not been tempered by the big face-slap of the real-world. nor have you programmed the _apps_ that could implement such a quality-control system. when you get to that stage, then come back and we can have this discussion again, jon.
2) During the next stage where a human being is looking at each scan, they append the *actual* publisher supplied page number (or string) to the filename from (1). No need to add any letter prefixes or anything -- they use the *actual* string "as it is".
and here's a good illustration of your lack of knowledge because of an absence of experience, combined with an ignorance of the kinds of tasks that the machine can do. (your willingness to _discuss_ an issue is quite admirable, jon, really. but that alone can only take a person so far.) in a properly designed system, the human being should _not_ have to mess with filenames at all, or only on the rare occasions of mondo weirdness. that's why i told david and all the other scanners to keep on doing whatever they are now doing, because i can deal with their stuff after-the-fact. (well, there are _some_ things i wish they would do; but it has nothing to do with the type of useless tripe you want them to have to deal with, that's for sure.) specifically, it's easy to write a routine that looks in the o.c.r. results to find the page-number of the page. (in general, i don't say something is "easy" unless i've already done it myself. because i have learned that many things that seem like they should be easy are not. i've already written this routine. it was easy to write.) if you're writing a tool to clean up a scan-set, as i am, it's pretty much _required_ that you write this routine, because you need to delete that number from the text. but before you delete it (as that will be among the last things that your tool does), you can use it to _rename_ both the o.c.r. file and the scan-file. indeed, i basically recommend that this file-renaming be one of the _first_ things you do during clean-up, because it will usually be _so_ much easier to deal with the other clean-up tasks when the filenames and the page-numbers match up. of course, what you should have done in the first place is ensured that the scans were _auto-named_ correctly, setting the auto-name counter at 1 when scanning page 1, and then next scanning each numbered page in sequence, only going back afterwards to scan out-of-sequence pages.
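for anyone who wants to see the shape of such a renaming routine, here is a minimal sketch in python. it is _not_ the actual routine from the clean-up tool, just one plausible way to do it. everything specific in it is an assumption made for illustration: that the page-number sits alone on the first or last non-blank line of the o.c.r. text, that each o.c.r. file and its scan share a name and differ only in extension (`.txt` and `.png` here), and that the new names are the page-number zero-padded to three digits. the function names (`find_page_number`, `propose_renames`, `apply_renames`) are hypothetical.

```python
import re
from pathlib import Path

# assumption: the printed page-number stands alone on a line,
# possibly wrapped in punctuation, e.g. "12" or "- 12 -"
PAGE_NUMBER = re.compile(r"^\W*(\d{1,4})\W*$")


def find_page_number(ocr_text):
    """Return the page-number found in one page of o.c.r. text, or None."""
    lines = [ln.strip() for ln in ocr_text.splitlines() if ln.strip()]
    if not lines:
        return None
    # assumption: the number is in the running head or the running foot
    for candidate in (lines[0], lines[-1]):
        match = PAGE_NUMBER.match(candidate)
        if match:
            return int(match.group(1))
    return None


def propose_renames(folder, ocr_ext=".txt", scan_ext=".png"):
    """Build a list of (old_path, new_path) pairs for review; renames nothing."""
    plan = []
    for ocr_file in sorted(Path(folder).glob("*" + ocr_ext)):
        text = ocr_file.read_text(encoding="utf-8", errors="replace")
        number = find_page_number(text)
        if number is None:
            continue  # no number found: leave the anomaly for a human
        new_stem = f"{number:03d}"  # assumption: zero-padded page-number
        if ocr_file.stem == new_stem:
            continue  # already named correctly
        for ext in (ocr_ext, scan_ext):
            old = ocr_file.with_suffix(ext)
            if old.exists():
                plan.append((old, old.with_name(new_stem + ext)))
    return plan


def apply_renames(plan):
    """Carry out an approved plan, refusing to overwrite anything."""
    olds = {old for old, _ in plan}
    news = [new for _, new in plan]
    if len(set(news)) != len(news):
        raise ValueError("two files would end up with the same name")
    for new in news:
        if new.exists() and new not in olds:
            raise FileExistsError(new)
    # go through temporary names so that swapped files cannot clobber each other
    staged = []
    for old, new in plan:
        tmp = old.with_name(old.name + ".renaming")
        old.rename(tmp)
        staged.append((tmp, new))
    for tmp, new in staged:
        tmp.rename(new)
```

the point of splitting it into `propose_renames` and `apply_renames` is that the list of old and new names can be reviewed before anything on disk changes. pages where no number is found are skipped rather than guessed at, so those still need a human eye.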
but accidents do happen, so i programmed this routine that renames your o.c.r. files and scan-files if needed. assuming you've got a clean set of scans and the page-numbering doesn't have many anomalies, it won't take you more than a minute or two to review the new names and approve a mass change. so, once again, jon, you've made a mountain out of a molehill, and then put together a baroque "plan" on how to scale it... -bowerbird