
roger said:
That means that anyone who derives a final version (HTML, epub, kf8 etc. file) from it has to find and handle all the markup like italics and superscripts on their own. It's not coded in the RTT if it's coming out of P2, since formatting is not present in the proofing stages, even inline formatting. That is a serious shortcoming to me.
it's more than "a serious shortcoming". it's a fatal flaw.
anyone who's done a digitization all the way _knows_ that inline formatting is one of the excruciating parts.
even the "overall" formatting can be a whole lot of work, but it's generally among the more "fun" parts of the task, which offsets it a bit. but we still have to admit it's work. and some of the finer points (like _proper_paragraphing_) often take lots of time, relatively, and drive you crazy too.
so to pretend that a text-file that contains no formatting can be considered as a "master" in any sense of the word is a denial of reality, one that's silly, if not plain ridiculous.
and jon will learn this very quickly, just like everyone else. but all you people who know better shouldn't just let him proceed blithely as if this part of his plan was reasonable... because it's not. it's a fatal flaw.
-bowerbird
p.s. yes, i'm aware that jon said that he'd dialed back the notion of r.t.t. to sidestep all the "religious" wars. that doesn't remove the fact that r.t.t. is unworkable.
p.p.s. no, the irony is not lost on me, not lost at all. sometimes people on this listserve make _assertions_ that something is true, when that thing is _not_ true, and has never been true, and likely never _will_ be true. yet other times when there's something that you _know_ to be true, you just sit on your hands and don't say jack. it's as if _truth_ means absolutely _nothing_ to you. weird.

On 2012-09-27, Bowerbird@aol.com wrote:
it's more than "a serious shortcoming".
it's a fatal flaw.
I have never claimed RTT to be a master format. I have repeatedly and specifically denied this.
Let's go back in time and pretend I suggested ZML. Oh yeah, you've tried that. Everyone thought you were nuts. You got pretty upset about that. Reading that little sequence of posts was the _reason_ for RTT. OK LaTeX then? You immediately leapt on that as being dumb because no-one at DP uses it, despite the fact I only said that _I_ would use it. TEI, Docbook, RST. No, no, no. HTML then? Horrified faces all round. OK how about if we write reams of documentation declaring that there is only one proper way to write HTML? Been ongoing for a while now. In fact the only master format that has ever got anywhere is the worst of the lot, DP's made-up formatting markup, so I guess it'll have to be that. No takers? Declaring _anything_, including, most assuredly, ZML, a master format is a fatal flaw from nearly everyone else's point of view.
So I punted. It was and is the only available option. You get a better P3 that doesn't mangle the lines and uses UTF-8. The words and the punctuation will be right. You get something that is easy to diff, and you get something that is easy to build on. It is a subset of _all_ of the above formats because we will never, ever agree about master formats. If _any_ master format is a fatal flaw and _no_ master format is also a fatal flaw, then yes, I guess we have a fatal flaw.
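Jon's point that a plain-text page is "easy to diff" can be illustrated with a minimal sketch. It assumes nothing about RTT beyond its being line-oriented UTF-8 text; the function name `diff_rounds`, the labels `round_a`/`round_b`, and the sample page are hypothetical and not taken from any DP or PG tooling.

```python
import difflib


def diff_rounds(before: str, after: str) -> list[str]:
    """Return a unified diff between two plain-text versions of a page."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="round_a",  # hypothetical label for the earlier round
            tofile="round_b",    # hypothetical label for the later round
            lineterm="",
        )
    )


# Hypothetical page text before and after one proofing pass.
page_before = "It was a dark and stormy night.\nThe rain fell in torrcnts.\n"
page_after = "It was a dark and stormy night.\nThe rain fell in torrents.\n"

changes = diff_rounds(page_before, page_after)
print("\n".join(changes))
```

Because each round is plain text with stable line breaks, the diff shows only the corrected line, which is the property Jon is arguing for.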

It's too easy in a listserv to think that any objection carries the authority of everyone who doesn't object to it. And any assertion can find someone who will object, sometimes aggressively.
I for one am willing to work with any markup that unambiguously identifies the necessary and sufficient set of information about a text to construct a readable text.
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/mailman/listinfo/gutvol-d

It's interesting to think about all this from a little broader perspective.
As best I understand it, Michael Hart established PG originally as a resource for recording and preserving as much literature as possible for the future. Future primarily meant decades, centuries, and millennia.
One of the historical motivations for doing such a thing is to consider how valuable it is today to have what little remains of the literature of our past. We've all read of the destruction of the library at Alexandria, for instance. And what a small proportion of ancient Greek plays have survived, and then often in fragments from third and fourth hand. Some of our most valued literature is based on texts that come to us written in upper-case only, with punctuation and often vowels omitted. We are undoubtedly still in a state where more literature is irrecoverably lost every day than is preserved.
Given all that, I have a lot of sympathy for any scheme which lets anyone contribute any digitized text in any format, as the alternative is often losing it. (We do seem to spend a lot of time refining the quality of the same texts, which anyway are least susceptible to final disappearance; but that's another issue.)
What I think has a chance of some agreement, which we are currently lacking, is an enumeration of the properties of a text that is sufficient to provide a readable, enjoyable, accurate basis for average readers (presumably still the PG target market;) stated in non-technical terms.
So as for sufficiency, what Jon has proposed for his RTT has some merit.
Then, before any information for duplicating the layout, a good, readable, accurate product could be easier to produce if things were identified by what they are, rather than what they look like.
What do I mean by "what they are"?
Something like (these are only some examples):
1. Entire text in the proper sequence.
2. Identification of paragraphs.
3. Identification of chapter boundaries.
4. Identification of chapter headings.
5. Enumeration of illustrations, including a graphics file, caption, credits, explanatory keys, and location in text.
6. Enumeration of footnotes, with reference position in the text stream.
7. Tables ditto.
8. Mathematical expressions.
9. Quotations.
10. Correspondence
    Addressee
    Salutation
    Body
    Closing clause
    Signature
We're currently not collecting most of that, except by inference from the layout; with little consistency and great ambiguity. It all could be identified and marked with less effort than we're putting into what DP requires now.
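To make the idea of identifying things "by what they are" a little more concrete, here is a minimal sketch of a few items from Don's list (text in sequence, paragraphs, chapter headings, illustrations, footnotes) as a data model. Every class and field name here is hypothetical, chosen only for illustration; none of it is part of RTT, DP's formatting markup, or any PG tool, and tables, math, quotations, and correspondence are left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Illustration:
    image_file: str
    caption: str = ""
    credits: str = ""
    after_paragraph: int | None = None  # position in the text stream, if known


@dataclass
class Footnote:
    text: str
    ref_paragraph: int  # index of the paragraph holding the reference


@dataclass
class Chapter:
    heading: str
    paragraphs: list[str] = field(default_factory=list)
    illustrations: list[Illustration] = field(default_factory=list)
    footnotes: list[Footnote] = field(default_factory=list)


@dataclass
class Book:
    chapters: list[Chapter] = field(default_factory=list)

    def reading_text(self) -> str:
        """Render headings and paragraphs in sequence, ignoring layout."""
        parts: list[str] = []
        for chapter in self.chapters:
            parts.append(chapter.heading)
            parts.extend(chapter.paragraphs)
        return "\n\n".join(parts)


# Hypothetical one-chapter book with a single footnote.
book = Book(
    chapters=[
        Chapter(
            heading="CHAPTER I",
            paragraphs=["First paragraph.", "Second paragraph."],
            footnotes=[Footnote(text="A note.", ref_paragraph=1)],
        )
    ]
)
print(book.reading_text())
```

The point of the sketch is only that a readable text can be generated from the semantic pieces alone; how those pieces are marked up in the source is left open, which is the question the thread is arguing about.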

On 9/27/2012 5:55 PM, don kretz wrote:
What I think has a chance of some agreement, which we are currently lacking, is an enumeration of the properties of a text that is sufficient to provide a readable, enjoyable, accurate basis for average readers (presumably still the PG target market;) stated in non-technical terms.
So as for sufficiency, what Jon has proposed for his RTT has some merit.
Then, before information to duplicate the layout, a good readable accurate product could be easier to produce if things were identified by what they are, rather than what they look like.
What do I mean by "what they are"?
Something like (these are only some examples):
1. Entire text in the proper sequence.
2. Identification of paragraphs.
3. Identification of chapter boundaries.
4. Identification of chapter headings.
5. Enumeration of illustrations, including a graphics file, caption, credits, explanatory keys, and location in text.
6. Enumeration of footnotes, with reference position in the text stream.
7. Tables ditto.
8. Mathematical expressions.
9. Quotations.
10. Correspondence
    Addressee
    Salutation
    Body
    Closing clause
    Signature
We're currently not collecting most of that, except by inference from the layout; with little consistency and great ambiguity. It all could be identified and marked with less effort than we're putting into what DP requires now.
I can agree with part of that, Don, and with the idea that focusing on page layout may not be as critical.
But if you're targeting enjoyment and readability I think you also need the italic/bold/gesperrt markup. Without them you lose emphasis and tone, which will make the reading experience more confusing and less enjoyable, in my opinion. So I think capturing them is at least as important as capturing this other semantic information.
Also, location of an illustration in the text is not necessarily relevant in producing an ebook. I've seen a lot of books where the printer put the illustrations in wherever they'd fit, or wherever it was convenient, and often quite removed from where they would make sense to the reader.
-- Walt

A discussion of the details is what I would hope for. I would prefer common in-line markup too.
As for locations, I think it's helpful to at least know in what sequence to insert the illustrations; what the desirable precision is, or what the producer does with the position information, is another step in the process.
Don
participants (4)
- Bowerbird@aol.com
- don kretz
- Jon Hurst
- Walt Farrell