Re: [gutvol-d] Improving the PG library

23 Sep 2012

Hi David,

Thanks very much for your response. I'll try to clarify/explain my
thinking on a few things, but please keep picking holes: the absolute
last thing I want to do is start doing a whole lot of work and discover
a fatal flaw down the road. I would much rather not start at all.

On 2012-09-22, David Starner wrote:
...
...
> > (typos in the original can be corrected directly in the MS if
> > required).
>
> This is practically a breaking rule for me. "Typos" have been fixed in
> books uploaded to PG that weren't typos before. Vandalizing the master
> scans, too? It's easy to make mistakes in correcting the original; you
> certainly don't want to engrave your mistakes into the original
> images.

The MS is not supposed to be the original; it is supposed to be PG's
definitive version. I agree that changes to the MS would have to be very
carefully researched and controlled, and there may well be some benefit
in storing the actual original alongside the MS if any changes of this
nature are made.
...
...
> > The future benefit is that the MS is a perfect universal master
> > format:
>
> I don't think that word means what you think it means.

I define "universal master format" as a format capable of encoding any
and all characters and any and all formatting nuances. There are lots of
"master formats" available, each capable of encoding a subset of these
things, but only the image itself is "universal".
...
> I don't understand; the RTT misses out on a huge amount of important
> information. Italics can have a large impact on meaning at times, for
> one.
...
> No; to turn the RTT into anything, you'll have to reproofread the
> book, to catch italics and the rest of the formatting. Toss a few
> sidenotes in there, and besides just catching stuff, you'd spend a lot
> of time separating them out from the surrounding text. Two column
> material? Trees?

This is true, although I wouldn't use the term "reproofread" as that
implies checking the words and punctuation again (the time-consuming
bit), which is exactly what I want to avoid. The RTT is intended as a
simple foundation. It must be used in conjunction with the MS to turn it
into anything else, and this may indeed involve a lot of work for a
complex book. The point is that if you started with a scan of that
complex book with the intention of producing an ebook using your markup
language of choice, you would likely at some point in the process have
something very similar to the RTT. If someone else then decided that the
book would be better in their markup language of choice they would also
at some stage have something very similar to the RTT, but they would
have had to repeat all the work that you did because they wouldn't have
had access to your RTT.

There is absolutely nothing stopping other layers of information
building upon the MS and RTT: the MS and RTT are simply the core
requirement. Personally, in producing Kindle PDFs via LaTeX I will be
generating metadata for the positions of lines in the images, the
positions of micro-formatting (formatting that needs more than a brief
glance to discern) and end-of-line hyphen mappings. This metadata can be added to
the PG archives in case it is useful to others, but it is not something
that should be mandated, any more than intermediate master formats are
something that should be mandated.
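
To make the idea of layered metadata a little more concrete, here is a minimal Python sketch of what one record per printed line might look like, and how an end-of-line hyphen mapping could be used to rejoin lines into running text. The names (LineRecord, eol_hyphen, rejoin) and the "soft"/"hard" labels are purely illustrative assumptions, not a proposed PG format.

```python
from dataclasses import dataclass

@dataclass
class LineRecord:
    # All field names here are hypothetical, for illustration only.
    page: int             # which MS page image the line appears on
    bbox: tuple           # (left, top, right, bottom) in pixels on that image
    text: str             # the RTT text of this printed line
    eol_hyphen: str = ""  # "" = none, "soft" = drop when rejoining, "hard" = keep

def rejoin(lines):
    """Join printed lines into running text using the hyphen mapping."""
    out = ""
    for rec in lines:
        if rec.eol_hyphen == "soft":
            # Word was split only to fit the line: remove the hyphen.
            out += rec.text[:-1] if rec.text.endswith("-") else rec.text
        elif rec.eol_hyphen == "hard":
            # Genuine hyphenated compound: keep the hyphen, add no space.
            out += rec.text
        else:
            out += rec.text + " "
    return out.strip()

sample = [
    LineRecord(1, (10, 10, 500, 30), "The time-consu-", "soft"),
    LineRecord(1, (10, 35, 500, 55), "ming bit of well-", "hard"),
    LineRecord(1, (10, 60, 500, 80), "known work"),
]
print(rejoin(sample))
```

The point of keeping the hyphen mapping separate from the text is that the RTT itself stays a line-for-line match with the MS, while anyone building a reflowable format can still recover the unbroken words.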
...
...
> > Additional benefits: both the MS and RTT are usable ebook formats in
> > and of themselves,
>
> The RTT is not good; you've thrown away important information. The MS
> adds nothing to what IA or Google Books offers.

Conceded. The RTT and MS are not intended to be final ebook formats, so
the fact that they can be pressed into service as inferior formats is
not relevant. I shouldn't have brought it up.
...
...
> > and the combination will allow a pretty nifty
> > errata system to be written, whereby a reader types in a suspect
> > phrase, gets taken to the line in the MS where that phrase is found,
> > and can deliver the errata by simply clicking on the line and clicking
> > a "Please Check" button.
>
> You can't errata italics or many other important parts of the book.
> It's not unuseful, but it's hardly complete.

Errata of formatting could be done with such a system: you would just
need to capture which final format contained the error. That's a really
long way down the road though.
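
As a rough illustration of the lookup step in such an errata system, here is a minimal Python sketch. It assumes the RTT is available as a list of strings, one per printed line of the MS, so that a line index identifies a line in the image. The function names and the shape of the "Please Check" record are hypothetical.

```python
import bisect

def build_index(rtt_lines):
    """Concatenate RTT lines with spaces; remember where each line starts."""
    starts, pos = [], 0
    for text in rtt_lines:
        starts.append(pos)
        pos += len(text) + 1  # +1 for the joining space
    return " ".join(rtt_lines), starts

def find_phrase(rtt_lines, phrase):
    """Return the 0-based index of the line on which each match begins."""
    joined, starts = build_index(rtt_lines)
    hits = []
    at = joined.find(phrase)
    while at != -1:
        # The last line whose start offset is <= the match offset.
        hits.append(bisect.bisect_right(starts, at) - 1)
        at = joined.find(phrase, at + 1)
    return hits

def make_reports(rtt_lines, phrase):
    """One hypothetical 'Please Check' record per matching line."""
    return [
        {"line": i, "text": rtt_lines[i], "phrase": phrase}
        for i in find_phrase(rtt_lines, phrase)
    ]

rtt = [
    "It was the best of times, it was",
    "the worst of tmies, it was the age",
]
print(make_reports(rtt, "worst of tmies"))
```

Searching the joined text rather than each line separately means a suspect phrase that runs across a line break is still found, and is reported against the line on which it starts.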

Cheers

Jon

Jon Hurst