jon said:
> The system I propose for scan file naming
> is *simpler* than yours and more flexible
no, it isn't. not on either count.
the mere act of keeping track of a single name
that contains two -- or more -- variables
becomes much more difficult than it needs to be.
put together a half-dozen books each containing
hundreds of scan-files using your verbose names
and you'll find yourself drowning in the confusion.
your pace would fall to a crawl. try it. you'll see.
d.p. scanners can't afford to work at such a pace.
my filenaming convention has grown out of my
experience through the entire digitizing process.
if it had needed to be more complicated,
i would have learned that by now and made it so.
you can always make a system more complex.
the smart thing -- which experience teaches --
is to realize when the increment in _cost_ will
return to you a sufficient increment in _benefit_.
and when it will not.
you have taken the simple-and-useful principle that
it is good to know the contents of a file from its name,
and blown it past the point where it is cost-beneficial.
there are all kinds of information that you _could_ put
into the filename, which _might_ be useful at some point.
but if it makes the process of dealing with the filename too
unwieldy, it ain't worth it. the trick is to know when to stop.
> It also integrates better into the QC system.
you don't even have a good idea what a quality-control
system might look like, let alone knowledge of problems
that might crop up with each particular type of system.
you might _think_ you do. but -- as is typical with you --
you don't know what you don't know. your knowledge has
not been tempered by the big face-slap of the real world.
nor have you programmed the _apps_ that could implement
such a quality-control system. when you get to that stage,
then come back and we can have this discussion again, jon.
> 2) During the next stage where a human being is
> looking at each scan, they append the *actual*
> publisher supplied page number (or string)
> to the filename from (1). No need to add any
> letter prefixes or anything -- they use the *actual*
> string "as it is".
and here's a good illustration of your lack of knowledge
because of an absence of experience, combined with an
ignorance of the kinds of tasks that the machine can do.
(your willingness to _discuss_ an issue is quite admirable,
jon, really. but that alone can only take a person so far.)
in a properly designed system, the human being
should _not_ have to mess with filenames at all,
or only on the rare occasions of mondo weirdness.
that's why i told david and all the other scanners
to keep on doing whatever they are now doing,
because i can deal with their stuff after-the-fact.
(well, there are _some_ things i wish they would do;
but they have nothing to do with the type of useless tripe
you want them to have to deal with, that's for sure.)
specifically, it's easy to write a routine that looks in
the o.c.r. results to find the page-number of the page.
(in general, i don't say something is "easy" unless i've
already done it myself. because i have learned that
many things that seem like they should be easy are not.
i've already written this routine. it was easy to write.)
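to make the idea concrete, here is a minimal sketch of such a routine.
it is _not_ the routine described above, just an illustration. the name
find_page_number is hypothetical, and the sketch assumes the page-number
sits at one edge of the first or last non-blank line of the o.c.r. text,
as it would in a typical running head or foot.

```python
import re

# hypothetical helper, shown for illustration only.
# assumption: the printed page-number (1 to 4 digits) sits at the
# start or the end of the first or last non-blank line of the page.
_EDGE_NUMBER = re.compile(r"^(\d{1,4})\b|\b(\d{1,4})$")


def find_page_number(ocr_text):
    """return the printed page-number as an int, or None if not found."""
    lines = [ln.strip() for ln in ocr_text.splitlines() if ln.strip()]
    if not lines:
        return None
    # check the top line first, then the bottom line
    for candidate in (lines[0], lines[-1]):
        match = _EDGE_NUMBER.search(candidate)
        if match:
            return int(match.group(1) or match.group(2))
    return None
```

a sketch this simple will sometimes be fooled (a page whose last line of
body text happens to end in a year, say), and it ignores roman-numeral
front-matter, which is why its guesses should be reviewed before use.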
if you're writing a tool to clean up a scan-set, as i am,
it's pretty much _required_ that you write this routine,
because you need to delete that number from the text.
but before you delete it (as that will be among the last
things that your tool does), you can use it to _rename_
both the o.c.r. file and the scan-file. indeed, i basically
recommend that this file-renaming be one of the _first_
things you do during clean-up, because it will usually be
_so_ much easier to deal with the other clean-up tasks
when the filenames and the page-numbers match up.
of course, what you should have done in the first place
is ensured that the scans were _auto-named_ correctly,
setting the auto-name counter at 1 when scanning page 1,
and then next scanning each numbered page in sequence,
only going back afterwards to scan out-of-sequence pages.
but accidents do happen, so i programmed this routine
that renames your o.c.r. files and scan-files if needed.
assuming you've got a clean set of scans and
the page-numbering doesn't have many anomalies,
it won't take you more than a minute or two to
review the new names and approve a mass change.
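the review-then-rename step might be sketched like this. everything here
is an assumption made for illustration, not the actual tool: the function
names plan_renames and apply_renames, the zero-padded 3-digit names, and
the ".png" and ".txt" extensions for the scan-file and the o.c.r. file.
the input is a mapping from each current base-name to the page-number
found in its o.c.r. text (or None where nothing was found).

```python
import os


def plan_renames(detected, width=3):
    """detected maps current base-names to page-numbers (or None).
    returns the (old, new) base-name pairs that would actually change,
    so a person can review the list before anything is touched."""
    plan = []
    targets = set()
    for old, number in sorted(detected.items()):
        if number is None:
            continue  # no number found: leave this one for the human
        new = str(number).zfill(width)
        if new in targets:
            raise ValueError("two files claim page %s" % new)
        targets.add(new)
        if new != old:
            plan.append((old, new))
    return plan


def apply_renames(folder, plan, extensions=(".png", ".txt")):
    """rename the scan-file and the o.c.r. file for each approved pair."""
    moves = []
    for ext in extensions:
        for old, new in plan:
            source = os.path.join(folder, old + ext)
            if os.path.exists(source):
                moves.append((source, os.path.join(folder, new + ext)))
    # refuse to overwrite a file that is not itself being moved
    sources = {source for source, _ in moves}
    for _, target in moves:
        if os.path.exists(target) and target not in sources:
            raise FileExistsError(target)
    # two passes through holding names, so that swapping two files
    # cannot overwrite one that has not moved yet
    for source, _ in moves:
        os.rename(source, source + ".renaming")
    for source, target in moves:
        os.rename(source + ".renaming", target)
```

the split into two functions is the point: plan_renames only produces
a list to look over, and nothing on disk changes until that list is
handed to apply_renames.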
so, once again, jon, you've made a mountain out of a molehill,
and then put together a baroque "plan" on how to scale it...
-bowerbird