
On 2012-09-23, David Starner wrote:
I don't see any value in that. Scans are a pain, and the only saving grace is that they accurately represent a physical printed edition.
The value in the MS is in its unequivocal nature. It says: "After much discussion amongst knowledgeable people, this, and only this, is what we at PG consider to be a version of this book. Everything else we publish is a derivation of this. If they don't match this, they are in error."

Example 1: As a PG customer I find something in an ebook that doesn't look right. I want to contribute. I want to check it, and if it is wrong, report it. Off I go to PG and I find... nothing. If I'm really keen to help, I go to TIA and I find 50 versions, and I have no idea what is the correct version. Oh well...

Example 2: We propose redoing "Pride and Prejudice". Without an MS we immediately hit the buffers. What edition is the extant text? What scan was used? What edition should we use? Is there a scan we can use _anywhere_? What is the copyright situation of the 1923 version? Should we go for a first edition or the latest one we can get copyright clearance for? With the MS: throw the MS at the DP OCR pool and off we go.
What's gained by mangling the scans instead of recording typos externally to them?
In example 1, I am in luck. PG has been adding definitive scans to the PG archives and Marcello has done a really cool interface. I note a difference between my version and the scan, so I report it... and the WW writes back saying that it is not actually a difference: it is just a typo that was corrected in the text, and the definitive scan is actually the thing that is incorrect. So much for the definitive scan...
As for universal, the image is not necessarily capable of encoding all formatting nuances. ... ... fur (plenty of 20th/21st century examples for babies)...
Ah... so that's what the Kindle Touch is about -- fur capability. :-)
Anything that requires you to look at every word on every page, which italics and bold do, is time-consuming. And perhaps as important as italics is superscript; 10<sup>30</sup> changes quite a bit when it comes back as 1030.
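To make the superscript point concrete, here is a minimal Python sketch of the failure mode. The function names and the caret convention are illustrative assumptions only; nothing here is part of RTT or any PG tool.

```python
import re

def strip_tags(markup: str) -> str:
    """Naively drop every tag, keeping only the text between them."""
    return re.sub(r"<[^>]+>", "", markup)

def flatten_superscripts(markup: str) -> str:
    """Rewrite <sup>x</sup> as ^x (an assumed plain-text convention),
    then drop whatever tags remain."""
    carets = re.sub(r"<sup>(.*?)</sup>", r"^\1", markup)
    return strip_tags(carets)

sample = "10<sup>30</sup>"
print(strip_tags(sample))            # 1030
print(flatten_superscripts(sample))  # 10^30
```

The naive version silently turns a power of ten into a four-digit number, which is the kind of error that only a word-by-word check against the page would catch.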
There are things that can help here, but I want to keep my sights for a core format set as low as possible. I would reiterate that my aim is only to produce a foundation. You can build whatever you like on that foundation, and some sort of optional formatting methodology would likely be one of those things.

For the books that I turn into Kindle PDFs, you will have a micro-formatting highlighting overlay available. If Bowerbird does some books you will have ZML. You may have Don's Canonical Starting Point. You may have TEI, RST, DocBook or LaTeX. If you don't like anything that is available, you will still always have the option to go back to the MS and RTT and do your own thing, even if that takes a little longer.

The main point of RTT is that the phrase "let's make (X)HTML/TEI/RST/LaTeX/ZML (delete as appropriate) the PG master format" _will_, quite rightly, start an unproductive flame war. RTT is inferior to all these things by design, so that any of these things can be built upon it. Your choice once you have, say, a ZML version available, might be to base your work on a transformation from ZML, but the RTT will hopefully have saved the person doing the ZML version a heap of work.

Cheers
Jon