
Bowerbird's thoughts on scanning are a good summary of some of the issues. And despite his view that we've gotten off-track in the discussion, his points about filenaming, image processing (deskewing), etc., align pretty well with the ongoing discussion.

Regarding scan filenaming, he rightly notes that a source book (or Work) identifier should be prepended to the filename. This is what I have also proposed.

Where we differ in filename convention is that I believe the source ID should be followed immediately by a sequential number which describes where the page side is in the linear order of all the page sides in the book (totally independent of how the publisher may have paginated the book). This way one will unambiguously and immediately know the position of every page scan in the bound book (starting with the inside of the front cover, which can be "side 1", and ending with the inside of the back cover -- alternatively we can start with the front cover as "side 1", which has some advantages with respect to the dominant recto/verso page numbering convention). All blank pages will be included.

Now, this sequential number will not correlate at all with whatever pagination the publisher uses to 'id' the pages. So, after the sequential number we have a third field in the filename which gives the actual publisher-supplied page number (if any; can be implied). This way we decouple the publisher pagination from the page sequence in the book, thereby simplifying the system and making it more flexible. It will be able to handle *any* bizarre pagination system the publisher/author dreamed up (the publisher could number the pages backwards for all we care, and this system will handle it without any complications -- yet we preserve the publisher-supplied page "number" in the filename, which is important for referencing/citation).

Example: DP0000239-00125-106.png

"DP0000239" is the source book identifier, here a DP identifier.
If the scan project is independent of DP, it could be 'PG0014239' to associate the scan set with PG text number 14239. "00125" says this is the 125th "side" in the full sequence of sides in the book, starting from the front cover or wherever else is considered the starting point. "106" is the string (which can be more complicated, like "A2", "5-4", "ix", "ABCD", whatever) which the publisher printed on that page to identify it (that's really what a publisher-supplied page "number" is: a page identifier).

(My proposed system has a couple more fields after these three, dealing with exceptions and generation of the scan set from the original, which aid in keeping track of multiple derivative scan sets and a few other oddities. The details are described in previous messages.)

Jon Noring

[Note: In the "My Antonia" project, it is interesting that there is no "Page 1" or "Page 2". The book starts (after the Roman-numbered foreword section) with Page 3! Now imagine getting a scan set of "My Antonia" where we have defined page scan sequencing using the page numbering the publisher used (as in the systems proposed by Marcello and Bowerbird). The first question I would have is "Where are pages 1 and 2? Are they missing from the set?" However, if the scans are sequentially numbered based on their position in the book (with the knowledge that the book passed QC checking), then I would know that at least the project saw this, too, and that there were likely no missing pages, so Pages 1 and 2 probably never existed.

And for those who will ask: before scanning the book I took it apart to determine whether a page had been ripped out there, and there definitely was no ripped-out page. There might have been an inserted/glued plate which "fell out", but checking with the "My Antonia" experts, there definitely was no such insert in the First Edition. How many other books start pagination of the body with something other than 1?]
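
To make the three-field naming scheme described above concrete, here is a rough sketch in Python of how such filenames might be built, parsed, and checked for gaps. It is only an illustration, not part of the proposal itself: the names ScanName, build, parse and missing_sides are made up for this example, and it assumes the source ID never contains a hyphen, the sequence number is zero-padded to five digits as in the example filename, and a page with no printed number gets an empty third field. The extra fields mentioned above (exceptions, scan-set generation) are left out.

```python
import re
from typing import NamedTuple


class ScanName(NamedTuple):
    source_id: str       # e.g. "DP0000239" or "PG0014239"
    sequence: int        # position of the page side in the bound book
    publisher_page: str  # string printed on the page; "" if none (assumption)


# Assumption: no hyphen in the source ID, five-digit sequence number.
# The publisher page is everything after the second hyphen, so strings
# such as "5-4" survive intact.
_PATTERN = re.compile(r"^([^-]+)-(\d{5})-(.*)\.png$")


def build(name: ScanName) -> str:
    """Assemble a filename from its three fields."""
    return f"{name.source_id}-{name.sequence:05d}-{name.publisher_page}.png"


def parse(filename: str) -> ScanName:
    """Split a filename back into its three fields."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a scan filename: {filename!r}")
    source_id, sequence, publisher_page = match.groups()
    return ScanName(source_id, int(sequence), publisher_page)


def missing_sides(filenames):
    """Return sequence numbers absent between 1 and the highest one seen."""
    seen = {parse(f).sequence for f in filenames}
    if not seen:
        return []
    return [n for n in range(1, max(seen) + 1) if n not in seen]
```

With this sketch, parse("DP0000239-00125-106.png") yields sequence 125 and publisher page "106". Because the gap check looks only at the sequence field, a book whose body pagination starts at 3 raises no false alarm, while a genuinely missing scan does show up.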