
On Sat, Oct 30, 2004 at 06:33:03PM -0400, Scott Lawton wrote:
> I've taken the liberty of starting a new thread since I think this issue is important.
> It's clear that some people (myself included) would like to capture more information about presentation than would be done if the goal were ONLY semantic markup.

Just a quick note related to this, and my apologies if it turned up in the thread already and I missed it: We're planning to include the scanned page images along with eBooks. In fact, this is part of the intent with the new directory structure for the PG servers (the /1/0/8/0/... structure).

We haven't done any (or many, anyway) because we're still trying to figure out how to best name the page files, and how to link them on a page-by-page basis into the (marked up?) eBooks. Jim Tinsley drafted some general guidelines for the image files themselves, but linking them to the eBooks is something we need to figure out still.

(BTW, the Million Books project at archive.org uses djvu for this purpose. It's not bad, but I like our intended solution of XML markup much better. Plus, of course, the MBP is mostly working with relatively poor quality proofreading. For PG, the text has taken the main emphasis, not the appearance.)

My notion is that the PGTEI and TEI lite solutions I've been reading about in this list will be easily adaptable to including links to specific page image files, so I've not mentioned it until now. But since it's related to your desire for preservation of the actual appearance of the scanned page, I figured I'd type it up now.

That accomplished, please continue with your further thoughts - preserving appearance is definitely something that is frequently desired.

-- Greg