
Lee Passey wrote:
This is an extremely important point that I feel has been woefully neglected up to this point. Just what are the use case scenarios this solution is intended to solve? You cannot effectively answer the question of 'How?' until you have answered the question of 'Why?' If you can come up with a number of reasonable scenarios as to how these page scans could be used, the naming mechanism almost defines itself. ... Now the inevitable counter to my desire to be forward-looking is the argument "we don't need it right now, so why should we do it now." My response is much more philosophical than it is empirical: If we don't have the time to do it right, how will we find the time to do it over?
It seemed to me that some people were eager to start posting page images, for varying reasons. That may have been a wrong perception, but I acted on it nonetheless. For those people eager to start, I proposed a simple format that works right now and covers all the use cases we can think of right now. At the same time it is robust enough to withstand later editing of the page image collections. This format is fairly easy to build, interacts well with the existing catalog at the PG web site, and will interact well with the forthcoming TEI master format (which I am helping to design).

The simplicity of my proposal is apparent: use the same page numbers as in the book. Use a single-character prefix to distinguish between the roman numbering of the frontmatter and the arabic numbering of the body. You can't get much simpler than that.

All alternative "simpler" counter-proposals are in fact more complex, because they introduce a "virtual" page numbering (viz. count the pages in order, starting from the first one scanned), which is a new artifact not found in the physical book. The mapping from the "virtual" page number to the real one has to be done either through metadata embedded in the image or through records in a database. Both ways are more complex than putting the real page number right into the filename.

Of course my simple format will not accommodate 100% of the books out there. It will not accommodate volumes with more than 9,999 pages, or more than 9 illustrations on one page, or volumes with duplicate page numbers, or volumes with more than 26 different numbering sequences. But accommodating 99% right out of the box is clearly good enough for a start.

Sometimes you just don't want to do the "Right Thing", because finding out what the Right Thing is would take too long. Sometimes the best approach is to start with the "Good Enough Thing", like Michael did with plain vanilla texts.
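To make the naming scheme described above concrete, here is a minimal sketch in Python. The proposal as stated here does not fix the details, so the following are assumptions made only for illustration: the prefix letters 'f' (frontmatter, roman) and 'p' (body, arabic), four-digit zero padding (suggested by the 9,999-page limit), a one-digit illustration suffix (suggested by the 9-illustrations limit), and the .png extension. The function names are hypothetical as well.

```python
# Sketch only: prefix letters, padding width, illustration suffix and
# file extension are assumptions, not part of the proposal as posted.
FRONT = "f"   # assumed prefix for roman-numbered frontmatter
BODY = "p"    # assumed prefix for arabic-numbered body pages

_ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a roman numeral such as 'xiv' to an integer (14)."""
    total = 0
    largest = 0
    # Read right to left; a symbol smaller than one already seen is subtracted.
    for ch in reversed(numeral.lower()):
        if ch not in _ROMAN:
            raise ValueError(f"not a roman numeral: {numeral!r}")
        value = _ROMAN[ch]
        if value < largest:
            total -= value
        else:
            total += value
            largest = value
    return total


def page_filename(label: str, illustration: int = 0, ext: str = "png") -> str:
    """Build a filename from the page label printed in the book."""
    label = label.strip()
    if label.isascii() and label.isdigit():
        prefix, number = BODY, int(label)
    else:
        prefix, number = FRONT, roman_to_int(label)
    if not 1 <= number <= 9999:
        raise ValueError(f"page number out of range: {label!r}")
    if not 0 <= illustration <= 9:
        raise ValueError(f"illustration index out of range: {illustration}")
    # Illustration 0 means "the page scan itself" and adds no suffix.
    suffix = f"-{illustration}" if illustration else ""
    return f"{prefix}{number:04d}{suffix}.{ext}"


if __name__ == "__main__":
    for label in ("xiv", "1", "123"):
        print(label, "->", page_filename(label))
```

With these assumed prefixes and the zero padding, a plain alphabetical sort of the filenames puts the frontmatter before the body and keeps the pages in book order, so no separate mapping table is needed for the common case.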
Some literature:

"Completeness - the design must cover as many important situations as is practical. All reasonably expected cases should be covered. Completeness can be sacrificed in favor of any other quality. In fact, completeness must [be] sacrificed whenever implementation simplicity is jeopardized."
The Rise of 'Worse is better' by Richard Gabriel
http://www.jwz.org/doc/worse-is-better.html

"You often don't really understand the problem until after the first time you implement a solution."
The Cathedral and the Bazaar by Eric S. Raymond
http://www.firstmonday.org/issues/issue3_3/raymond/

"Analysis Paralysis: Striving for perfection and completeness in the analysis phase leads to project gridlock. [...] Design by Committee: Committee designs are overly complex and lack a common architectural vision."
AntiPatterns / William J. Brown et al. / p. 269f / ISBN 0-471-19713-0

-- 
Marcello Perathoner
webmaster@gutenberg.org