
Hi David,

Thanks very much for your response. I'll try to clarify/explain my thinking on a few things, but please keep picking holes: the absolute last thing I want to do is start doing a whole lot of work and discover a fatal flaw down the road. I would much rather not start at all.

On 2012-09-22, David Starner wrote:
(typos in the original can be corrected directly in the MS if required).
This is practically a breaking rule for me. "Typos" have been fixed in books uploaded to PG that weren't typos before. Vandalizing the master scans, too? It's easy to make mistakes in correcting the original; you certainly don't want to engrave your mistakes into the original images.
The MS is not supposed to be the original; it is supposed to be PG's definitive version. I agree that changes to the MS would have to be very carefully researched and controlled, and there may well be some benefit in storing the actual original alongside the MS if any changes of this nature are made.
The future benefit is that the MS is a perfect universal master format:
I don't think that word means what you think it means.
I define "universal master format" as a format capable of encoding any and all characters and any and all formatting nuances. There are lots of "master formats" available, each capable of encoding a subset of these things, but only the image itself is "universal".
I don't understand; the RTT misses out on a huge amount of important information. Italics can have a large impact on meaning at times, for one.
...
No; to turn the RTT into anything, you'll have to reproofread the book, to catch italics and the rest of the formatting. Toss a few sidenotes in there, and besides just catching stuff, you'd spend a lot of time separating them out from the surrounding text. Two column material? Trees?
This is true, although I wouldn't use the term "reproofread", as that implies checking the words and punctuation again (the time-consuming bit), which is exactly what I want to avoid. The RTT is intended as a simple foundation. It must be used in conjunction with the MS to turn it into anything else, and this may indeed involve a lot of work for a complex book.

The point is that if you started with a scan of that complex book with the intention of producing an ebook using your markup language of choice, you would likely at some point in the process have something very similar to the RTT. If someone else then decided that the book would be better in their markup language of choice, they would also at some stage have something very similar to the RTT, but they would have had to repeat all the work that you did because they wouldn't have had access to your RTT.

There is absolutely nothing stopping other layers of information from being built upon the MS and RTT: the MS and RTT are simply the core requirement. Personally, in the production of Kindle PDFs via LaTeX, I will be producing metadata for the positions of lines in the images, the positions of micro-formatting (formatting that needs more than a brief glance to discern), and end-of-line hyphen mappings. This metadata can be added to the PG archives in case it is useful to others, but it is not something that should be mandated, any more than intermediate master formats should be mandated.
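
To make the kind of metadata I have in mind a little more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the `LineRecord` name, its fields, the pixel bounding box, and the `reflow` helper are hypothetical and not an agreed PG format. It shows one RTT line tied to its position in an MS image, plus an end-of-line hyphen mapping that lets the printed lines be joined into running text without rechecking the words.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class LineRecord:
    """One printed line: where it sits in the MS and what the RTT says."""
    page: str                        # hypothetical MS image file name
    box: tuple[int, int, int, int]   # (left, top, right, bottom) in pixels
    text: str                        # the RTT line exactly as printed
    # End-of-line hyphen mapping: the whole word that the trailing
    # fragment belongs to, or None if the line does not break a word.
    joined_word: Optional[str] = None


def reflow(lines: list[LineRecord]) -> str:
    """Join printed lines into running text, resolving end-of-line hyphens."""
    words: list[str] = []
    carry = False  # True when the previous line ended in a mapped hyphen
    for rec in lines:
        parts = rec.text.split()
        if carry and parts:
            # The first fragment was already emitted as part of joined_word.
            parts = parts[1:]
        carry = False
        if rec.joined_word is not None and parts:
            parts[-1] = rec.joined_word
            carry = True
        words.extend(parts)
    return " ".join(words)


sample = [
    LineRecord("p001.png", (40, 100, 900, 130), "a well-known exam-", "example"),
    LineRecord("p001.png", (40, 135, 900, 165), "ple of a self-", "self-contained"),
    LineRecord("p001.png", (40, 170, 900, 200), "contained book"),
]
print(reflow(sample))
```

The mapping stores the full word rather than a "drop the hyphen" flag, so a genuine hyphen that happens to fall at a line end ("self-" / "contained") survives the join.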
Additional benefits: both the MS and RTT are usable ebook formats in and of themselves,
The RTT is not good; you've thrown away important information. The MS adds nothing to what IA or Google Books offers.
Conceded. The RTT and MS are not intended to be final ebook formats, so the fact that they can be pressed into service as inferior formats is not relevant. I shouldn't have brought it up.
and the combination will allow a pretty nifty errata system to be written, whereby a reader types in a suspect phrase, gets taken to the line in the MS where that phrase is found, and can deliver the errata by simply clicking on the line and clicking a "Please Check" button.
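
As a rough sketch of the lookup step only, assuming the RTT is held as a list of printed lines in the same order as the lines in the MS (the `find_phrase` and `normalise` names are hypothetical, and the "Please Check" reporting side is not shown):

```python
import re


def normalise(s: str) -> str:
    """Collapse whitespace and ignore case so reader input matches loosely."""
    return re.sub(r"\s+", " ", s).strip().lower()


def find_phrase(rtt_lines: list[str], phrase: str) -> list[int]:
    """Return the indices of the RTT lines on which the phrase starts."""
    target = normalise(phrase)
    if not target:
        return []
    hits = []
    for i, line in enumerate(rtt_lines):
        # Include the next line so a phrase that spans a line break is found.
        nxt = rtt_lines[i + 1] if i + 1 < len(rtt_lines) else ""
        window = normalise(line + " " + nxt)
        pos = window.find(target)
        # Count the hit only if the phrase begins within the current line.
        if pos != -1 and pos < len(normalise(line)):
            hits.append(i)
    return hits


rtt = [
    "It was the best of times,",
    "it was the worst of times,",
    "it was the age of wisdom,",
]
print(find_phrase(rtt, "worst of"))
```

Each returned index identifies a line in the RTT, and so the matching line in the MS image that the reader would be shown.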
You can't errata italics or many other important parts of the book. It's not unuseful, but it's hardly complete.
Errata of formatting could be done with such a system: you would just need to capture which final format contained the error. That's a really long way down the road though.

Cheers,
Jon