Re: 14.8 million ipads sold (in 9 months) during 2010

bob said:
spent a little time looking at them on my Sony Reader:
if nobody else is gonna respond to this, i will...
***
first, though, i'll say that bob, if project gutenberg won't host your custom .epubs, i would be very happy to do so. you can just zip them all together and e-mail them to me. i'm sure i will learn something from them, since i am now starting to piece together my .epub conversion routines...
in this regard, i would also love to have your feedback, bob (and others, if you have any), on the .epub i have generated for this "jungle" competition with jim. (where are you, jim?)
more on that development as it... develops...
***
ok, back to bob's review:
They all have a working html contents page (TOC)
ok, that's good.
also I think the idea of putting a back link to the TOC on every chapter head is excellent and I shall copy it with my hand-crafted epubs.
great. you'll love it.
and if you want to take the idea two steps further, do this: at each chapter-heading, also include links that go to the _previous_chapter_ and the _next_chapter_. so instead of a "bicycle-spokes" structure, you have a "chain" structure.
this allows the human to "step through" the chapters, either forward or backward, and it is a very useful functionality...
and of course, there's also a link to go to the contents page.
you can see an example of this structure here:
the left-angle brackets above each header link to the previous chapter, the right-angle ones to the next, and the "c" between them jumps to the table of contents...
you will also see there that i used the same structure for the illustrations. above each illustration, there is a left-angle bracket that links to the previous illustration, a right-angle bracket that links to the next illustration, and an "i" that links to a hotlinked list of illustrations... this allows the user to "thumb through" the illustrations, which is a very nice and nifty capability, don't you agree?
i don't always use this structure -- for instance, i haven't used it in this jungle-jim competition -- but whenever i leave it out, i end up wishing that i _would_ have used it...
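A minimal Python sketch of the "chain" structure follows. It assumes each chapter (or illustration) is reachable by a fragment id within one HTML document. The function name nav_bars, its parameters and the "nav" class are illustrative and do not come from any existing tool. In a real .epub, where chapters are usually split across separate xhtml files, each href would also need the file name.

```python
from html import escape


def nav_bars(anchors, index_id="contents", index_label="c"):
    """Return one '< c >' navigation bar per anchor, in reading order.

    anchors: fragment ids of the chapters (or illustrations), in order.
    index_id: fragment id of the contents page (or list of illustrations).
    index_label: text of the middle link ("c" for contents, "i" for illustrations).
    """
    bars = []
    for i in range(len(anchors)):
        links = []
        # Left-angle bracket links to the previous item; the first item has none.
        if i > 0:
            links.append(f'<a href="#{escape(anchors[i - 1])}">&lt;</a>')
        # The middle link jumps back to the index page.
        links.append(f'<a href="#{escape(index_id)}">{escape(index_label)}</a>')
        # Right-angle bracket links to the next item; the last item has none.
        if i < len(anchors) - 1:
            links.append(f'<a href="#{escape(anchors[i + 1])}">&gt;</a>')
        bars.append('<p class="nav">' + " ".join(links) + "</p>")
    return bars
```

Each bar would go just above its chapter heading. Calling nav_bars(figure_ids, index_id="illustrations", index_label="i") would give the same chain for the illustrations.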
Using the XML TOC (the .ncx file) works but it takes 3-5 seconds to open, which makes this unacceptable as a navigational tool.
believe me, if you use the in-document linking strategy i just suggested, your use of the .ncx navigation will be rare enough that the slow load-time won't be a bother...
This is true for all ten epubs, but it isn't true of other XML TOCs, either for older PG epubs or for my hand-crafted versions.
there must be some difference between those files... if you can track it down, the info might be useful... (i say "might" because who knows if marcello would actually pay any attention, or if he'd just dismiss it.)
The layout seems to me to be, like the curate's egg, 'good in parts'. Actually that's a bit negative, perhaps 'mostly good but with some irritating flaws' might be fairer.
ok, tell us about the flaws...
Chapter headings generally are in too big a font size, (or too bold, or with too large margins, or more than one of the aforegoing).
that should be easy enough for marcello to change...
when i know the size of the viewport/page/window, i copyfit the headings. that is, i adjust their size so that the longest one will nonetheless display nicely.
can't do that if you don't know the pagesize, of course, but it's still a good strategy to keep in mind regardless.
I am working on a font-size range of 90%-135% of the browser default in my hand-crafted versions. I find anything smaller is too wearing to read, and anything bigger just gets broken across more than one line (and looks all wrong).
i have found that headers can be bigger _or_ bolder, and only occasionally need to be _both_. then again, i often show 'em in a different color, which helps too.
(and, of course, they should have nice air around 'em, something i have _not_ done correctly in these "jungle" conversions i've done. i need to adjust the stylesheet.)
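Copyfitting can be sketched in two steps: pick one heading size from the longest heading, then clamp it to bob's 90%-135% range. This Python sketch rests on two assumptions. First, you can estimate how many characters fit on one line at 100% size; chars_per_line_at_100 is a hypothetical caller-supplied input, since real glyph widths vary by font and device. Second, text width scales linearly with font size.

```python
def copyfit_percent(headings, chars_per_line_at_100, lo=90.0, hi=135.0):
    """Return one heading size, as a percentage of the reader's default font size.

    The size is chosen so the longest heading just fits on one line, then
    clamped to the [lo, hi] range (90%-135% by default).
    chars_per_line_at_100: an estimate (an assumption, supplied by the caller)
    of how many characters fit across the page when text is at 100%.
    """
    longest = max(len(h) for h in headings)  # judge by the wordiest heading
    # Width grows linearly with font size, so `longest` characters fit on one
    # line as long as size% <= 100 * chars_per_line_at_100 / longest.
    fit = 100.0 * chars_per_line_at_100 / longest
    return max(lo, min(hi, fit))
```

The 90% floor keeps headings from shrinking below a comfortable reading size. The cost is that, on a narrow page, the longest heading may wrap onto a second line.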
Also I would split chapter headings (CHAPTER I) say from the chapter title (COMING INTO THE WILDERNESS)
i call the combination of those two "the heading".
i call the _number_ part "the chapter number" and i call the _title_ part "the chapter title". terminology.
and yes, they absolutely _must_ be split. otherwise, you've got a conglomeration that is _way_ too long. combining the number and the title is ludicrous...
and do the title in a smaller font.
not really. if possible, i do the title in a larger font. (subject, of course, to the copyfitting i mentioned.)
the way you suggest it is what was done traditionally.
but i just think that's wrong.
the chapter number doesn't _need_ to be any bigger. as a number, it stands out just fine no matter its size. and i don't believe repetitive and redundant elements -- like the word "chapter" -- really deserve a big size.
the title, which is the unique element, _could_ be done bigger, but if titles are wordy (as they often are, and remember that you need to judge this by using the wordiest one), then you're often stuck with the same size as body-text, which is not necessarily a bad thing, since -- as i said up above -- the white-space around a heading should be more than enough to make it stand out sufficiently.
while I'm about it I don't think Latin chapter numbers are practical on the limited screen size of an e-reader
i assume you really meant to say _roman_ numbers...
and yes, i'd agree. i think they were an affectation, and i often change them to arabic numbers. (and if i don't, i usually end up _wishing_ i would have, at some point.)
arabic numbers are a lot easier to read, and their display is more consistent. the d.p. people and the whitewashers seem to love to line up those roman numerals in a t.o.c., and yes, it's quite easy to program a routine to do it, but it always strikes me visually as a waste of time and energy.
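Two of the jobs discussed above are mechanical: converting roman chapter numbers to arabic ones, and splitting a run-together heading into its chapter number and chapter title. Below is a Python sketch of both. It assumes headings of the hypothetical form "CHAPTER XIV. SOME TITLE"; the regex and function names are illustrative. It does not reject malformed numerals such as "IIII", which it simply reads as 4.

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

# Assumed heading form: "CHAPTER", a roman numeral, an optional period, then the title.
HEADING_RE = re.compile(r"^\s*CHAPTER\s+([IVXLCDM]+)\b\.?\s*(.*)$", re.IGNORECASE)


def roman_to_int(numeral):
    """Convert a roman numeral like 'XIV' to an int, using subtractive notation."""
    total, prev = 0, 0
    # Scan right to left: a symbol worth less than the one to its right is subtracted.
    for ch in reversed(numeral.upper()):
        value = ROMAN_VALUES[ch]
        total += -value if value < prev else value
        prev = value
    return total


def split_heading(line):
    """Split 'CHAPTER I. COMING INTO THE WILDERNESS' into (1, 'COMING INTO THE WILDERNESS').

    Returns None when the line does not look like a chapter heading.
    """
    m = HEADING_RE.match(line)
    if m is None:
        return None
    return roman_to_int(m.group(1)), m.group(2).strip()
```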
The front matter isn't always broken into pages correctly (I think, as I don't have the originals).
i believe you.
This may seem like niggling
you don't have to apologize for having an esthetic sense...
This may seem like niggling, but in a book you often have only page boundaries to delimit important stuff like the title, author, publication date, dedication, etc. from less important things like publishing history, printing history, other titles by the same author, or other titles in the same series, adverts for Mars bars (you can find these in wartime Penguin paperbacks explaining why the price has gone up from 2d to 3d!), etc.
i feel your pain.
I think the interesting question here is whether the layout problems are artifacts of the html/epub generator(s), in which case they can presumably be fixed if other people agree with me, or whether they come from eccentricities of the encoding in rst format by the transcriber.
well, i'd think it's as simple as throwing a pagebreak, or putting in a ruled line, but maybe i misunderstand?
I haven't yet found any block quotes in any of the books.
so far, i think roger frank is the only person who has actually submitted a book in r.s.t. format. (and yes, he really is that productive, and was even before r.s.t. he's a fairly good example of what you can accomplish, digitizing books, if you have some good robust tools. it's bad, and sad, d.p. doesn't pay more attention to him.)
anyway, roger does mostly very simple books, ones that are just straight paragraphs, for the most part, which is also one of the reasons why he's so prolific at the game. so it's not surprising he hasn't done things like that yet.
These typically feature smaller fonts
alert! i caution anyone who talks about "smaller fonts" when it comes to e-books. that practice must be avoided.
whatever typographic function smaller fonts play in print must be totally reconsidered when it comes to e-books...
the reason is simple: end-users select a font-size which is as _small_ as possible without being _uncomfortable_. so the rest of the argument falls into place automatically. if we make any text _smaller_, it becomes _uncomfortable_. ergo, that is not something that we can do. end of dialog.
by the way, the same argument applies to _the_leading_, so that's also not a mechanism we can use in these cases; the good news is that _indentation_ still serves just fine. moreover, this is a variable where we can allow the _user_ to signal a _preference_ for how these structures display.
It would be reassuring to see that rst can deal with those.
r.s.t. is quite capable of doing that. you can be reassured.
Having looked at the html in the epub file I find it somewhat harder to read than that in older epubs or html files. Mainly, I think, because it's the product of a generator not a human.
does it make you feel smart to examine the .html? ;+)
because, seriously, i'm not sure why you want to do that. who cares what the underlying code looks like? not me. do you also dig into the postscript code inside of a .pdf? does it bother you that _that_ isn't very human-readable?
i know it's hard for you to believe, right now, because you think .html is ubiquitous and that it will always be so, but it will _not_. it will go the way of the dodo bird, and -- hopefully -- that will happen faster than even i think.
unfortunately, r.s.t. is not a very convincing substitute.
if you look at any r.s.t. file, you will see _lots_ of the stuff that you'll easily recognize as being "markup"...
all that stuff will go bye-bye eventually, because our machines will be smart enough to figure out what's a header, and what's a footnote, and what's a numbered list, and what's an item in that list, and what's an epigraph, and so on and so forth.
of course, we'll need to give the machine some subtle hints, to steer it in the right direction, especially on things that might be ambiguous, but that kind of gentle nudging is what z.m.l. is. compared to that, .html markup is a bludgeon.
take a look at some of the "raw" .zml files here:
now tell me why we need anything more than that.
i mean, seriously, i use simple code to _convert_ that plain-text human-readable z.m.l. file into .html that a browser displays. so why couldn't _the_browser_ use that same simple code to do the same thing, so that i feed the browser z.m.l. and it converts my z.m.l. into the .html it "wants", and then displays it? the browser is _already_ megabytes of bloat, and my code is about 100k, so it could _easily_ be incorporated, and that'd make the lives of us humans _so_ much simpler.
somebody with the power to make this change _will_ eventually come to grok the compelling strength of this argument, and dictate change...
but in the meantime, if you still want to struggle through reading .html markup, be my guest, but i -- for one -- have more important stuff to do.
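The "gentle nudging" idea can be made concrete with a toy Python sketch that infers structure from plain text. It uses only blank lines and leading numbers as hints. The rules are hypothetical and are not the actual z.m.l. conventions:

- a block preceded by two or more blank lines is treated as a heading;
- a block whose every line starts with "1.", "2." and so on is treated as a numbered list;
- anything else is treated as a paragraph.

```python
import re

# Hypothetical hint: a line starting with digits and a period is a list item.
NUMBERED = re.compile(r"^\d+\.\s")


def classify_blocks(text):
    """Label each blank-line-separated block as heading, numbered-list or paragraph.

    Toy rules (assumptions, not the real z.m.l. conventions):
      - two or more blank lines before a block mark it as a heading;
      - a block whose every line starts with "1.", "2.", ... is a numbered list;
      - anything else is a paragraph.
    """
    results = []
    block, blanks_before, blank_run = [], 0, 0
    # The extra "" at the end acts as a final blank line that flushes the last block.
    for line in text.splitlines() + [""]:
        if line.strip():
            if not block:
                blanks_before = blank_run  # how many blank lines preceded this block
            block.append(line)
            blank_run = 0
            continue
        blank_run += 1
        if block:
            if blanks_before >= 2:
                label = "heading"
            elif all(NUMBERED.match(b) for b in block):
                label = "numbered-list"
            else:
                label = "paragraph"
            results.append((label, "\n".join(block)))
            block = []
    return results


sample = "\n\nCHAPTER 1\n\nIt was dark.\nVery dark.\n\n1. one\n2. two\n"
print(classify_blocks(sample))
# [('heading', 'CHAPTER 1'), ('paragraph', 'It was dark.\nVery dark.'), ('numbered-list', '1. one\n2. two')]
```

A real system would need more hints than these, for footnotes, epigraphs and ambiguous cases. Even so, the sketch shows how whitespace alone can carry structure that .html spells out with tags.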
Has anybody worried about the lifespan or broad acceptance of rst in the wider world?
you don't need to worry about that... seriously, but...
I can only find references to it as a Python documentation tool. How likely is it that rst tools will be available and supported in a few decades, or that there will be stable and detailed definitions of rst by industry standards bodies for implementations to conform to?
...it is somewhat amusing that you would ask this, because this is an argument project gutenberg (i.e., _marcello_) uses only _against_ tech he doesn't like.
if _he_ uses something, its wider acceptance in the world at large doesn't make any difference at all...
-bowerbird

Was there a suggestion that Project Gutenberg should not host .epubs? Or is bowerbird just phishing for donations of books to him that were not forthcoming FROM him when requested...?
On Fri, 28 Jan 2011, Bowerbird@aol.com wrote:
first, though, i'll say that bob, if project gutenberg won't host your custom .epubs, i would be very happy to do so. you can just zip them all together and e-mail them to me. i'm sure i will learn something from them, since i am now starting to piece together my .epub conversion routines...

Was there a suggestion that Project Gutenberg should not host .epubs?
What I have certainly been told is that "PG" -- whoever that actually is -- is not willing to accept epubs or mobis, but insists on receiving input in the form of HTML, and generating the epub and the mobi "automatically" from that HTML. In practice if one tries to submit epub or mobi one is turned away by the whitewashers, in my experience. I'm not saying that is or isn't the right policy. One might argue that epub is actually a better input format, and that html and mobi ought to be generated from epub. In practice one can get bad epub or mobi submitted, just like one can get bad html submitted. Certainly mobileread.com has lots of examples (one way or the other) where individuals have taken PG books and "improved" upon them. Again, if PG generated good guidelines for html, and/or epub and/or mobi it might be helpful -- but getting agreement on "good" guidelines is probably also difficult in practice.
participants (3)
- Bowerbird@aol.com
- James Adcock
- Michael S. Hart