
On Sun, Sep 23, 2012 at 2:55 AM, Jon Hurst <jon.a@hursts.eclipse.co.uk> wrote:
> The MS is not supposed to be the original; it is supposed to be PG's definitive version.
I don't see any value in that. Scans are a pain, and the only saving grace is that they accurately represent a physical printed edition.
> I agree that changes to the MS would have to be very carefully researched and controlled, and there may well be some benefit in storing the actual original alongside the MS if any changes of this nature are made.
What's gained by mangling the scans instead of recording typos externally to them?
I define "universal master format" as a format capable of encoding any and all characters and any and all formatting nuances. There are lots of "master formats" available, each capable of encoding a subset of these things, but only the image itself is "universal".
I would define "master format" as one that can be effectively used to derive other formats from. As for universal, the image is not necessarily capable of encoding all formatting nuances. For properties of physical books, scans can't handle transparencies (which I've had in a book I was thinking about scanning), paper changes, metallic inks (one illustration had an EETS book printed for it, but the metallic ink didn't reproduce well in my scans), fur (plenty of 20th/21st century examples for babies), mirrors (ditto), holes (again, ditto), or pop-ups (there's a beautiful 19th century edition of Euclid with them, for example). For properties of the text, it does not show line breaks at tops of pages (ambiguous in poetry) and it obscures spellings at end of lines (is it spelled to-night or tonight?). It's powerful, but not unlimited.
> This is true, although I wouldn't use the term "reproofread" as that implies checking the words and punctuation again (the time-consuming bit), which is exactly what I want to avoid.
Anything that requires you to look at every word on every page is time-consuming, and checking italics and bold does. And perhaps as important as italics is superscript; 10<sup>30</sup> changes quite a bit when it comes back as 1030.
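
To make the superscript point concrete, here's a rough sketch (my own throwaway code, not anything from DP's toolchain) of why that conversion is so easy to get wrong: naively stripping tags is exactly what turns 10<sup>30</sup> into 1030, and a converter has to translate the markup rather than drop it.

    import re

    marked_up = "about 10<sup>30</sup> atoms"

    # Naive tag stripping: the superscript distinction is silently lost.
    print(re.sub(r"</?sup>", "", marked_up))               # about 1030 atoms

    # Keeping the nuance means translating, not stripping (caret
    # notation here is just my arbitrary choice of plain-text marker).
    print(re.sub(r"<sup>(.*?)</sup>", r"^\1", marked_up))  # about 10^30 atoms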
> The point is that if you started with a scan of that complex book with the intention of producing an ebook using your markup language of choice, you would likely at some point in the process have something very similar to the RTT.
Not necessarily. If I were working on a book, I would format as I went along. Yes, DP has found that for its purposes it works better to separate proofreading from formatting, but I don't think that's what most people working on a book alone would do. Nor do most people make a line-for-line copy; without external systems like DP, it's easier to enter the text as paragraphs, each ended by a newline.
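
As a purely illustrative sketch of the difference (assuming blank lines mark the paragraph breaks, which is how I'd type it in): flowing a line-for-line transcription into paragraphs-ended-by-newlines is mechanical, which is part of why the line breaks don't seem worth keeping when you're working alone.

    def flow_paragraphs(line_for_line_text):
        # Join hard-wrapped lines into one paragraph per line, treating
        # blank lines as paragraph breaks.
        paragraphs, current = [], []
        for line in line_for_line_text.splitlines():
            if line.strip():
                current.append(line.strip())
            elif current:
                paragraphs.append(" ".join(current))
                current = []
        if current:
            paragraphs.append(" ".join(current))
        # Note this can't decide the to-night vs. tonight question from
        # above on its own; end-of-line hyphens still need a human.
        return "\n".join(paragraphs)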
> If someone else then decided that the book would be better in their markup language of choice they would also at some stage have something very similar to the RTT, but they would have had to repeat all the work that you did because they wouldn't have had access to your RTT.
I don't see why the RTT is the ideal level for that, though. For any large book, I'd rather have the TEI version than the RTT--even if you lock me in a cave without Internet and I have to figure out the TEI format by guesswork and write my own XML converters. The RTT makes me figure out what does and doesn't need rewrapping and redo all the microformatting, stuff that could be pulled automatically out of even the most stupid HTML (a quick sketch of what I mean is below). (Okay, so it would be a royal PIA to pull it out of "smart" HTML. Still, for a sufficiently large work, it'd be worth it.) If I have an RTT version of the text, and an HTML version or TEI version or sufficiently smart structured-text version, and I wanted to make a version in TeX or whatever, I'd start with the smart version instead of the RTT. Anything short of PostScript or PDF is going to be better than the RTT.

--
Kie ekzistas vivo, ekzistas espero.
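
To illustrate "pulled automatically out of even the most stupid HTML", here's a purely illustrative sketch (my own throwaway code, not any real PG or DP converter): the Python standard-library parser is enough to recover italics, bold, and superscripts from plain presentational markup.

    from html.parser import HTMLParser

    class Microformat(HTMLParser):
        # Map presentational tags to plain-text markers; the markers are
        # just my arbitrary choice, nothing standard.
        MARKS = {"i": "_", "em": "_", "b": "*", "sup": "^"}

        def __init__(self):
            super().__init__()
            self.out = []

        def handle_starttag(self, tag, attrs):
            if tag in self.MARKS:
                self.out.append(self.MARKS[tag])

        def handle_endtag(self, tag):
            if tag in self.MARKS:
                self.out.append(self.MARKS[tag])

        def handle_data(self, data):
            self.out.append(data)

    p = Microformat()
    p.feed("They sailed on the <i>Titanic</i>, all 10<sup>30</sup> of them.")
    print("".join(p.out))
    # They sailed on the _Titanic_, all 10^30^ of them.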