Re: [gutvol-d] Basic simple test case.

Why? To discuss as an illustration of proper handling of the issues you list, and others; and to see what alternative markup schemes look like in action.
Sent from my Phone
------------------------------
From: James Adcock
Sent: 10/10/2012 4:53 AM
To: Project Gutenberg Volunteer Discussion
Subject: Re: [gutvol-d] Basic simple test case.
Re 14668: Well, the first question would be: Why? Contrary to the idea that PG needs to scale up efforts 10X and “do everything,” maybe the right answer is to scale things DOWN by 10X and fix the books that people actually want to read, but which have currently gone hopelessly moldy, rather than offer more kiddie readers?
Secondly, one needs to get page scans, which are at least available from Google in a variety of editions; you’d have to pick one.
In terms of the current “automagic” HTML conversion from txt, this txt shows the problem that PG isn’t even currently “correctly” specifying that similar <p> formatting be used on each device. It seems that, given the PG txt conventions, PG should be specifying “no indent, 1em of white space between paragraphs” for the <p> styling – so at least the basics match the txt styling. This is important because txt “formatters” implicitly use the txt formatting rules as an element of the formatting – i.e. syntax vs. semantics **cannot** be uniquely determined automagically by examining a PG txt file, so the best one can hope to do is to emulate the PG txt layout.
In terms of hand-recoding the html/epub/mobi, there appear to be no great problems other than understanding and dealing with the issue of whether top/bottom margins are merged/rounded or not, which can be dealt with in the standard manner of using top margins only.
In terms of design issues, there appear to be minor issues of poetry – not hard since the poetry lines are short. (How to “correctly” autowrap lines of poetry remains problematic in html since html doesn’t support poetry.) There are issues of quasi-table listings of words, where the traditional solution is simply to linearize the lists. I.e., these word lists were “packed” on paper to save paper, but on ebook devices vertical landscape is “free” [horizontal landscape however definitely **is not**], so the word lists can simply be “unpacked.” And there appears to be a minor issue of plain rules vs. decorative rules.
But all this would still beg the first question: Why? Who is the customer? What parent would want this for their kid today? Seriously? Is some researcher interested in this for historical reasons? Well – frankly they would be better off examining the bitmap scans. Fundamentally, one can’t code anything reasonable unless one decides who the customer is, and how they are actually going to be using your efforts.
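[Editorial illustration] The "unpacking" of packed word lists described above could be sketched roughly as follows. This is only a minimal sketch: the function name `unpack_word_list`, the `by_column` flag, and the sample words are hypothetical, and it assumes columns are separated by whitespace and that each entry is a single word. Whether a given printed list reads across rows or down columns has to be checked against the page scans.

```python
def unpack_word_list(packed_rows, by_column=False):
    """Linearize a multi-column word list into one word per entry.

    packed_rows: lines of text, each holding several whitespace-separated words.
    by_column:   False reads across each row, then down (row-major);
                 True reads down each column, then across (column-major).
    """
    # Split each non-empty line into its words.
    rows = [row.split() for row in packed_rows if row.strip()]
    if not by_column:
        return [word for row in rows for word in row]
    # Column-major: walk column indices, skipping rows that are too short.
    width = max((len(row) for row in rows), default=0)
    return [row[col] for col in range(width) for row in rows if col < len(row)]


if __name__ == "__main__":
    # Placeholder words, not taken from the book under discussion.
    packed = [
        "cat   hat   mat",
        "dog   log   fog",
    ]
    for word in unpack_word_list(packed):
        print(word)
```

Emitting one word per line spends only vertical space, which is the trade-off the message argues for on ebook devices.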

OK, but this is a simple book made very complicated by the original author. Most of the problems PG is having are very simple books that are still not being implemented correctly by PG. My suggestion is that we try to get the simple problems solved, not try to come up with an uber-grandiose "solve everything in the world" semantic markup language. I would hope sensible people would look at this particular book and sensibly conclude: "Not worth the effort."

On Wed, Oct 10, 2012 at 10:25 AM, James Adcock <jimad@msn.com> wrote:
I would hope sensible people would look at this particular book and sensibly conclude: “Not worth the effort.”
Are you trolling us? Seriously. Someone just told you these are popular books still in print, and you're saying that if we're sensible, we'll decide they aren't worth the effort. You're starting from a position that most of us agree with, that our ebooks should be usable, and doing your best to alienate everyone. I have done some awesome books that are rather challenging; why do you think that telling me I'm not sensible for doing books that interest me will convince me to do books you think I should do? -- Kie ekzistas vivo, ekzistas espero.

On 10/10/2012 5:45 PM, David Starner wrote:
you're saying that if we're sensible, we'll decide they aren't worth the effort.
Wow, I've got to say that I'm liking the McGuffey readers more and more. This one test case is short (160 pp) yet has extremely complex Unicode characters, lists, tables, illustrations, navigation, poems, handwriting, correspondence and even a dialog between actors like in a play (HTML 4 recommends using <dl><dt><dd> for dialogs). I'll bet that when I complete the project I will have encountered just about everything found in any e-book. If BB could take this book and mark it up in s.m.l. in such a way that it would convert to satisfactory .epub and .mobi I'd feel much more inclined to accept his vanity markup language as a real contender. ReST advocates should take the same challenge.
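[Editorial illustration] The <dl><dt><dd> convention for dialogs mentioned above might look like this when generated from speaker/speech pairs. This is a sketch under stated assumptions: the helper name `dialog_to_dl` and the sample lines are hypothetical and are not taken from the book.

```python
from html import escape


def dialog_to_dl(turns):
    """Render (speaker, speech) pairs as an HTML definition list.

    Each speaker becomes a <dt> and each speech the following <dd>.
    """
    lines = ["<dl>"]
    for speaker, speech in turns:
        # escape() protects &, < and > so the text cannot break the markup.
        lines.append(f"  <dt>{escape(speaker)}</dt>")
        lines.append(f"  <dd>{escape(speech)}</dd>")
    lines.append("</dl>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder dialog for illustration only.
    print(dialog_to_dl([
        ("First Speaker", "Where are you going?"),
        ("Second Speaker", "To school & back."),
    ]))
```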

On 10/11/2012 05:52 AM, Lee Passey wrote:
ReST advocates should take the same challenge.
You propose a task that nobody could possibly be interested in so that you can proclaim yourself victor? Nope. Not interested. At least pick a somehow representative book, that people have shown some interest in. (Hint: use the top 10 list). Regards -- Marcello Perathoner webmaster@gutenberg.org

On 10/11/2012 2:55 PM, Marcello Perathoner wrote:
On 10/11/2012 05:52 AM, Lee Passey wrote:
ReST advocates should take the same challenge.
You propose a task that nobody could possibly be interested in so that you can proclaim yourself victor?
Nope. Not interested.
Yeah, I couldn't figure out a way to do it either.

On 10/16/2012 06:13 PM, Lee Passey wrote:
On 10/11/2012 2:55 PM, Marcello Perathoner wrote:
On 10/11/2012 05:52 AM, Lee Passey wrote:
ReST advocates should take the same challenge.
You propose a task that nobody could possibly be interested in so that you can proclaim yourself victor?
Nope. Not interested.
Yeah, I couldn't figure out a way to do it either.
Yawn! I've seen better trolls on this list. How about doing the top 10 books aka. books that people really want to read? -- Marcello Perathoner webmaster@gutenberg.org

On 10/16/2012 10:54 AM, Marcello Perathoner wrote:
On 10/16/2012 06:13 PM, Lee Passey wrote:
On 10/11/2012 2:55 PM, Marcello Perathoner wrote:
On 10/11/2012 05:52 AM, Lee Passey wrote:
ReST advocates should take the same challenge.
You propose a task that nobody could possibly be interested in so that you can proclaim yourself victor?
Nope. Not interested.
Yeah, I couldn't figure out a way to do it either.
Yawn! I've seen better trolls on this list.
How about doing the top 10 books aka. books that people really want to read?
Because those are all too easy. I discovered an e-text that uses enough different textual structures that it may qualify as a good overall test case for any markup language. If ReST can accommodate all the requirements of 14668 then I would say it is a reasonably powerful markup language. If it can't, well, then not so much. The point is not to create Yet Another Plain Text Version, but to validate a methodology. The best reason I can think of for resisting this experiment is so as not to risk contradicting a deeply-held belief. I live in a highly religious state, so I understand your reluctance. But perhaps there is someone else out there who is also a fan of ReST but open-minded enough to perform the experiment?

On 10/16/2012 07:21 PM, Lee Passey wrote:
Because those are all too easy. I discovered an e-text that uses enough different textual structures that it may qualify as a good overall test case for any markup language.
If you believe that, you have a very limited horizon concerning textual structures. The text you selected contains very few of them. It is a far too simple text to serve as a realistic benchmark.
If ReST can accommodate all the requirements of 14668 then I would say it is a reasonably powerful markup language. If it can't, well, then not so much.
It can easily. All 3 of them.
The best reason I can think of for resisting this experiment is so as not to risk contradicting a deeply-held belief.
Then you should try and think harder. -- Marcello Perathoner webmaster@gutenberg.org

Wikipedia view of the significance of McGuffey: It is estimated that at least 120 million copies of McGuffey's Readers were sold between 1836 and 1960, placing its sales in a category with the Bible <http://en.wikipedia.org/wiki/Bible> and Webster's Dictionary <http://en.wikipedia.org/wiki/Webster%27s_Dictionary>. Since 1961 they have continued to sell at a rate of some 30,000 copies a year. No other textbook bearing a single person's name has come close to that mark.
On Tue, Oct 16, 2012 at 9:54 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
On 10/16/2012 06:13 PM, Lee Passey wrote:
On 10/11/2012 2:55 PM, Marcello Perathoner wrote:
On 10/11/2012 05:52 AM, Lee Passey wrote:
ReST advocates should take the same challenge.
You propose a task that nobody could possibly be interested in so that you can proclaim yourself victor?
Nope. Not interested.
Yeah, I couldn't figure out a way to do it either.
Yawn! I've seen better trolls on this list.
How about doing the top 10 books aka. books that people really want to read?
-- Marcello Perathoner webmaster@gutenberg.org _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d

On 10/16/2012 07:38 PM, don kretz wrote:
Wikipedia view of the significance of McGuffey:
It is estimated that at least 120 million copies of McGuffey's Readers were sold between 1836 and 1960, placing its sales in a category with the Bible <http://en.wikipedia.org/wiki/Bible> and Webster's Dictionary <http://en.wikipedia.org/wiki/Webster%27s_Dictionary>. Since 1961 they have continued to sell at a rate of some 30,000 copies a year. No other textbook bearing a single person's name has come close to that mark.
The Kama Sutra has been going at a rate of approx. 20,000 per MONTH from PG alone. The proposed McGuffey had only 330 downloads last month. You can do ~60 times more good by doing the Kama Sutra. -- Marcello Perathoner webmaster@gutenberg.org
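[Editorial illustration] The "~60 times" figure above follows directly from the two download counts quoted in the message. A quick check, where the variable names are only illustrative:

```python
# Monthly download figures as quoted in the message above.
kama_sutra_per_month = 20_000   # "approx. 20,000 per MONTH"
mcguffey_per_month = 330        # "only 330 downloads last month"

ratio = kama_sutra_per_month / mcguffey_per_month
print(f"{ratio:.1f}")  # prints 60.6
```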

That's a peculiar definition of "doing good", but your vote is worth as much as mine.
On Tue, Oct 16, 2012 at 11:05 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
On 10/16/2012 07:38 PM, don kretz wrote:
Wikipedia view of the significance of McGuffey:
It is estimated that at least 120 million copies of McGuffey's Readers were sold between 1836 and 1960, placing its sales in a category with the Bible <http://en.wikipedia.org/wiki/Bible> and Webster's Dictionary <http://en.wikipedia.org/wiki/Webster%27s_Dictionary>. Since 1961 they have continued to sell at a rate of some 30,000 copies a year. No other textbook bearing a single person's name has come close to that mark.
The Kama Sutra has been going at a rate of approx. 20,000 per MONTH from PG alone.
The proposed McGuffey had only 330 downloads last month. You can do ~60 times more good by doing the Kama Sutra.
-- Marcello Perathoner webmaster@gutenberg.org

But if you want to provide a copy of the Kama Sutra marked up in ReST, I'm sure we'd be happy to give you feedback. There are few examples available, and PG could use some more.
On Tue, Oct 16, 2012 at 11:05 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
On 10/16/2012 07:38 PM, don kretz wrote:
Wikipedia view of the significance of McGuffey:
It is estimated that at least 120 million copies of McGuffey's Readers were sold between 1836 and 1960, placing its sales in a category with the Bible <http://en.wikipedia.org/wiki/Bible> and Webster's Dictionary <http://en.wikipedia.org/wiki/Webster%27s_Dictionary>. Since 1961 they have continued to sell at a rate of some 30,000 copies a year. No other textbook bearing a single person's name has come close to that mark.
The Kama Sutra has been going at a rate of approx. 20,000 per MONTH from PG alone.
The proposed McGuffey had only 330 downloads last month. You can do ~60 times more good by doing the Kama Sutra.
-- Marcello Perathoner webmaster@gutenberg.org

How about doing the top 10 books aka. books that people really want to read?
Well, I'm redoing a "top 10" or something like it off the "golden moldies" list, doing it in ePub -- works great less filling -- except now kindlegen doesn't seem to even grok the simplest of ePub functionality. Keep it going Amazon! *sigh*

Why do you think that telling me I'm not sensible for doing books that interest me will convince me to do books you think I should do?
The problem is not that you want to do the books you want to do. The problem is in insisting that there HAS to be a "one size fits all" solution which is sufficiently complicated to handle even the most complicated book. A different "solution" would be to come up with a relatively simple solution which fits well 90% of the books which PG has historically published -- which are relatively simple -- and then the people who want to do the really complicated stuff can still use the techniques THEY want to use to do the really complicated stuff. We already have TeX and PDF to handle really complicated books, and those are pretty good solutions for really complicated books, for example. Or if you want to create a TEI system to do those books, go for it and good luck. Or use PDF/A "Over" for example. But don't insist that WE all have to buy into YOUR complexification that YOU want to use in YOUR efforts just because YOU want to do YOUR really complex stuff! Again, most PG books are pretty simple. And guess what, PG is *still* doing quite simple books quite badly. HTML on desktops works pretty well. Everywhere else is quite broken.

You point out the biggest reason more people aren't contributing to PG, and more people are helping out at DP. It's too complicated. The more complex the process, the fewer people will consider it reasonable to put their time into it. Keep in mind the way the contribution process has evolved. (This is a guess to some degree...)
1. At the beginning, one person types in one text and submits it. Simple.
2. Then one person OCRs a text for something to start with. Faster, but more complicated.
3. DP comes along and divides up the work, which consists of many people fixing the OCR, one page at a time, two people per page. Faster but more complicated, so raising the threshold for people who want to start.
4. Whitewashers start sending texts back for rework, rather than posting what was submitted and encouraging iterative improvement. Major slowdown, major increase in complexity.
5. And from there DP has essentially simplified nothing and increased the complexity (and intimidated new helpers) to the point where growth is stalled.
The path away from the quagmire needs to reintroduce simplicity, and especially single simple jobs that iteratively improve a piece of text, with immediate gratification. There probably need to be several simultaneous small-to-medium-size units of work that offer closure and positive reinforcement.
On Thu, Oct 11, 2012 at 10:35 AM, James Adcock <jimad@msn.com> wrote:
Why do you think that telling me I'm not sensible for doing books that interest me will convince me to do books you think I should do?
The problem is not that you want to do the books you want to do. The problem is in insisting that there HAS to be a "one size fits all" solution which is sufficiently complicated to handle even the most complicated book.
A different "solution" would be to come up with a relatively simple solution which fits well 90% of the books which PG has historically published -- which are relatively simple -- and then the people who want to do the really complicated stuff can still use the techniques THEY want to use to do the really complicated stuff. We already have TeX and PDF to handle really complicated books, and those are pretty good solutions for really complicated books, for example. Or if you want to create a TEI system to do those books, go for it and good luck. Or use PDF/A "Over" for example.
But don't insist that WE all have to buy into YOUR complexification that YOU want to use in YOUR efforts just because YOU want to do YOUR really complex stuff!
Again, most PG books are pretty simple. And guess what, PG is *still* doing quite simple books quite badly. HTML on desktops works pretty well. Everywhere else is quite broken.

There probably need to be several simultaneous small-to-medium-size units of work that offer closure and positive reinforcement.
The problem I see when I work on whole books is that I usually find problems that require whole-book understanding to resolve.
participants (5)
- David Starner
- don kretz
- James Adcock
- Lee Passey
- Marcello Perathoner