Re: [gutvol-d] Re: aspects of a well-done e-book

----- Original Message ----- From: "David A. Desrosiers" <hacker@gnu-designs.com>
Well, in a perfect world, we could guarantee that the separate CSS file is accessible and life is good. Unfortunately, since we can't guarantee the CSS file is there, we decided to embed the CSS inside the HTML.
If you can guarantee the HTML is there, you can guarantee that the CSS is there. If the CSS is missing, it shouldn't "break" the usability of the HTML document.
I can guarantee that CSS file is in the PG directory. I can't guarantee that Joe Sixpack will download that when he grabs the HTML file. Overall, this makes things simpler for the consumer of the e-text.
It bloats it somewhat, but it is still smaller than the obligatory PG header information, so I don't feel TOO badly about it. And now we get a fully self-contained file.
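A rough sketch of the embedding step described above, in Python. It is only an illustration, not the script actually used for these e-texts: the function name `inline_stylesheet`, the `css_dir` parameter, and the assumption that `rel` precedes `href` inside the `<link>` tag are all hypothetical.

```python
import re
from pathlib import Path

# Matches <link rel="stylesheet" ... href="..."> (assumes rel comes before href).
LINK_RE = re.compile(
    r'<link\b[^>]*\brel=["\']stylesheet["\'][^>]*\bhref=["\']([^"\']+)["\'][^>]*>',
    re.IGNORECASE,
)

def inline_stylesheet(html: str, css_dir: Path) -> str:
    """Replace each local stylesheet <link> with an embedded <style> block."""
    def replace(match: re.Match) -> str:
        css_path = css_dir / match.group(1)
        if not css_path.is_file():
            # Leave the link untouched if the style sheet cannot be found.
            return match.group(0)
        css = css_path.read_text(encoding="utf-8")
        return f'<style type="text/css">\n{css}\n</style>'
    return LINK_RE.sub(replace, html)
```

The result is a single HTML file that carries its own styling, at the cost of repeating the CSS in every e-text.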
I don't understand the correlation. What does your CSS size have to do with the obligatory PG header size?
The CSS adds to the size of the etext, and on some level that feels ... wrong. I can't explain why, it just does. However, whenever PG posts a new e-text, they add a great big header and footer to the document for legal reasons. That thing absolutely dwarfs the CSS style header in size, so I don't feel AS badly as I might otherwise. It was mostly a throw-away comment, so don't read too much into it.
Now that I think about it, you may be right... In Firefox (which is what I have on this machine), there is no View -> Use Style menu option, but there is the icon in the bottom left corner. *shrug*
For those that want to see this in a much-more expanded version, go to http://w3.org/Style/ in a Gecko-based browser, and click on the icon, or go to View -> Use Style, and try the various stylesheets listed there.
I have a big aversion to taking an electronic document and presenting it as "pages." First and foremost, it is ugly.
I submit that having page numbers in an unintuitive place (left-side margins, which don't appear in any printed work I can find) is just as ugly.
The original page breaks were necessitated by the size of paper the publisher used. There is almost never a functional meaning to the page breaks in a book (except things like chapter breaks, which are easily marked up with horizontal rules or something to that effect). The page numbers in the margins are small and fairly unobtrusive, yet give the information in the easiest manner I could devise. Furthermore, they are completely hidden unless the reader WANTS to have that information.
Second, it is going to wreak havoc whenever the user wants to change font sizes, page sizes, etc.
Having the border at the bottom of page 423 with a font size of 1.0em is still going to put the border at the bottom of the page when the font is 2.8em.
I think I'm missing your analogy here. Can you explain?
If you put visible page breaks into an HTML document, the user is going to expect that document to print to his printer at exactly those page breaks. Good luck.
Also, page breaks would only make sense if you broke them into "visual" chunks. By that, I mean sizes that fit into one screen at a time -- no scrolling. However, if the user has a different resolution than you, it ain't gonna work. If he changes the font size, it ain't gonna work. Basically, using visual page dividers is getting into typography, something you want to avoid. Good HTML lets the browser and the user format the text. You just tell them what KIND of text it is.
The page numbers are not meant to give you visual indication of page breaks as much as contextual information regarding the original source... which some people find very important and as it's fairly easy for me to include that information without disturbing the other readers, I do.
Josh
PS None of this is an argument for my CSS-based HTML over TEI-Lite. I would LOVE it if we had TEI-Lite capabilities right now... But we don't.

I can guarantee that CSS file is in the PG directory. I can't guarantee that Joe Sixpack will download that when he grabs the HTML file.
Agreed. If he wants a richer reading experience, he should grab the CSS. Pretty simple overall. If the reader wants to grab 200 etexts, it's easier to let them know they need one .css file than to ship 200 identical CSS stanzas. I understand your needs, but you're un-CSS-ifying CSS.
The CSS adds to the size of the etext, and on some level that feels ... wrong. I can't explain why, it just does. However, whenever PG posts a new e-text, they add a great big header and footer to the document for legal reasons. That thing absolutely dwarfs the CSS style header in size, so I don't feel AS badly as I might otherwise.
The PG header is considered "content", while CSS is considered "presentation". Again, I understand where you're coming from here, I just don't personally agree with it. I'm more of a purist, in the strictest sense of the word. ;)
If you put visible page breaks into an HTML document, the user is going to expect that document to print to his printer at exactly those page breaks. Good luck.
This is why 'media="print"' exists in a CSS declaration. See here for more: http://www.w3.org/TR/REC-CSS2/media.html
Also, page breaks would only make sense if you broke them into "visual" chunks. By that, I mean sizes that fit into one screen at a time -- no scrolling. However, if the user has a different resolution than you, it ain't gonna work. If he changes the font size, it ain't gonna work.
You can't translate a book into something read in a web browser and retain the same functionality. The whole point of a scrollbar is to remove that constraint. I agree that unnecessarily long webpages (scrolling down for hundreds of pages) are a pain, but the alternative is much more painful.
The page numbers are not meant to give you visual indication of page breaks as much as contextual information regarding the original source... which some people find very important and as it's fairly easy for me to include that information without disturbing the other readers, I do.
Right. Your page numbers don't correlate to anything, except an "Oh, that's neat!" kind of feeling as you imagine what it would be like to be reading page 423 in the printed (dead-tree) version of that particular work. Page 423 in your numbering scheme is not the 423rd page as seen in my browser.
PS None of this is an argument for my CSS-based HTML over TEI-Lite. I would LOVE it if we had TEI-Lite capabilities right now... But we don't.
I'm still gathering info and doing research on all of the alternatives presented thus far. TEI is one of the datapoints in my research.
David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

David A. Desrosiers wrote:
I can guarantee that CSS file is in the PG directory. I can't guarantee that Joe Sixpack will download that when he grabs the HTML file.
This is one of those problems with no easy answer. If you want the user to be able to download your book to read offline, then you've got to also make sure the user downloads the style sheet that goes with it. [If you only expect them to read online, it doesn't matter.]
My own solution is simply to make a zip file for downloading, which includes both the html page(s) and style sheet. I use the same style sheet for all books, but actually copy it to each ebook's directory, so there are currently around 850 copies of the same style sheet. But it is trivial to update them all from the master. It still uses more space, but the alternative, having all the html files link to a single css doesn't allow for zipping and downloading.
Steve
--
Stephen Thomas, Senior Systems Analyst, Adelaide University Library
ADELAIDE UNIVERSITY SA 5005 AUSTRALIA
Tel: +61 8 8303 5190 Fax: +61 8 8303 4369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/
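For illustration only, updating every copy from the master might look something like the Python sketch below. The function name `sync_stylesheet` and the one-directory-per-ebook layout are assumptions, not a description of the actual Adelaide setup; the master file is assumed to live outside the library root.

```python
import shutil
from pathlib import Path

def sync_stylesheet(master: Path, library_root: Path) -> int:
    """Copy the master style sheet into every ebook directory under library_root.

    Returns the number of directories updated.
    """
    updated = 0
    for book_dir in sorted(p for p in library_root.iterdir() if p.is_dir()):
        # Overwrites any older copy already sitting in the ebook directory.
        shutil.copy2(master, book_dir / master.name)
        updated += 1
    return updated
```

Each ebook directory then stays self-contained, so it can be zipped for download as it stands.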

I use the same style sheet for all books, but actually copy it to each ebook's directory, so there are currently around 850 copies of the same style sheet. But it is trivial to update them all from the master.
That seems like a horrible waste of inodes. I feel this pain, because I ran out of inodes on one of my arrays working on some PG works, even though I had 50GiB of space free on the drive. I had to reformat with more inodes to work around the problem.
It still uses more space, but the alternative, having all the html files link to a single css doesn't allow for zipping and downloading.
Here's an easy solution: In each .zip, you include a copy of the stylesheet, the same stylesheet you include with every copy... except, when you unzip the works, they go into a structure like this:
Gutenberg/
|-- books
|   |-- Book_One.xml
|   `-- Book_Two.xml
`-- styles
    `-- Gutenberg.css
Every .zip that you unzip into there will overwrite Gutenberg.css with the copy that you duplicate inside each .zip file, and the .xml (or .html or text or whatever) versions of the books go into a separate subdir. In your files, you use the standard <base href="..."> element or simply point your style declaration to ../styles/Gutenberg.css. This is exactly how it works on the Web in general, for very similar projects. Did that make sense?
David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
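The layout above can be tried out with a small Python sketch using the standard zipfile module. The helper names `build_book_zip` and `unpack` are made up for this example, and the archive paths simply mirror the directory tree in the message.

```python
import zipfile
from pathlib import Path

# Path of the shared style sheet inside every archive (mirrors the tree above).
STYLE_ARCNAME = "Gutenberg/styles/Gutenberg.css"

def build_book_zip(zip_path: Path, book_file: Path, stylesheet: Path) -> None:
    """Write one book plus the shared style sheet into a zip with the common layout."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(book_file, f"Gutenberg/books/{book_file.name}")
        archive.write(stylesheet, STYLE_ARCNAME)

def unpack(zip_path: Path, target: Path) -> None:
    """Extract into target; an existing Gutenberg.css is overwritten by the new copy."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
```

Unpacking any number of such archives into the same target leaves one file per book under books/ and a single Gutenberg.css under styles/.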

David A. Desrosiers wrote:
You can't translate a book into something read in a web browser, and retain the same functionality. The whole point of a scrollbar is to remove that constraint.
Yeay! Something I've been saying for years. The "e" in ebook gives us opportunities that don't exist in print, so let's use them.
I agree that unnecessarily long webpages (scrolling down for hundreds of pages) are a pain, but the alternative is much more painful.
Reading a book with hundreds of pages is painful. I don't see why scrolling is any more painful than turning pages. (The Mobipocket reader for Palm also has an auto scroll option which just scrolls the text slowly by, which could be a nice feature in browsers.)
One advantage of print is the ease of bookmarking a spot -- something that can't be done easily on most ebooks, although I'm working on a simple HTML solution.
I also now provide a single HTML file version and a multi-page version of my ebooks. Usually the multi-page version splits the work into chapters (or whatever is the major division for the work). The multi-page version was mainly intended to make online reading easier -- there's less to download for each chapter. It also means that Google is more likely to index the content -- they have, I think, a 100k limit per file.
But most browsers can easily accommodate the complete, single-file version of the average work, up to a MB or so. Something like Don Quixote is a bit more of a problem as a single file, being large in text size and also carrying many illustrations, making the total download many megabytes. Something that large really needs to be split.
Steve
--
Stephen Thomas, Senior Systems Analyst, Adelaide University Library
ADELAIDE UNIVERSITY SA 5005 AUSTRALIA
Tel: +61 8 8303 5190 Fax: +61 8 8303 4369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/

Reading a book with hundreds of pages is painful. I don't see why scrolling is any more painful than turning pages. (The Mobipocket reader for Palm also has an auto scroll option which just scrolls the text slowly by, which could be a nice feature in browsers.)
We've had that in Plucker for quite some time also (and Plucker's format is openly documented, unlike MobiPocket's format). Related to that, you CAN have autoscroll in your browser (again, making the assumption that you're using a standards-compliant browser). http://autoscroll.mozdev.org/
One advantage of print is the ease of bookmarking a spot -- something that can't be done easily on most ebooks, although I'm working on a simple HTML solution.
We've got bookmarking, and we're adding cross-document bookmarks and interlinking in our next version. We've been thinking about these (and other similar problems and solutions) for quite a while now.
I also now provide a single HTML file version and a multi-page version of my ebooks. Usually the multi-page version splits the work into chapters (or whatever is the major division for the work).
I do the same for my HOWTO documents, sourced from SGML. One call each with jade or sgmltools will generate the multi-document version of HTML or the single-document version. I run that through hindent and tidy for a few passes, and out comes properly-validated XHTML (mostly). You can see what one of those kinds of preparations looks like over here. This particular work is only HTML 4.0 Transitional, and not fully validated yet, but you can see what I did with the stylesheet and general output of the SGML: http://faqs.gnu-designs.com/pokerfaq/ The mobile version is over here (with screenshots): http://plkr.org/news/46
The multi-page version was mainly intended to make online reading easier -- there's less to download for each chapter. It also means that Google is more likely to index the content -- they have, I think, a 100k limit per file.
Funny you mention that. I've been doing some SEO work on my HTML version of the 9/11 Commission Report, and the original chapters I converted were 100k or more, many of them in the 200k and 300k range. I took some time to split those up into their own subchapters. You can see THAT work over here: http://911.gnu-designs.com/ I put a ton of hand-editing and automated work into this particular effort. With over 7,000 downloads of the mobile formats I've created from that work, it seems to be quite popular. It is this same level of quality that I am striving for with PG works I convert.
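As a rough illustration of the splitting discussed here, the Python sketch below cuts a page body at each heading and flags the pieces that are still over a size limit. The 100k figure is only the guess quoted in this thread, and the function names and the choice of `<h2>` as the chapter marker are assumptions for the example.

```python
import re

SIZE_LIMIT = 100 * 1024  # bytes; the "100k" guess from the thread, not a confirmed limit

def split_on_headings(html_body: str, tag: str = "h2") -> list[str]:
    """Split a body into chunks, each starting at a heading of the given level."""
    # Zero-width lookahead keeps the heading tag at the start of its chunk.
    pattern = re.compile(rf"(?=<{tag}\b)", re.IGNORECASE)
    return [chunk for chunk in pattern.split(html_body) if chunk.strip()]

def oversized(chunks: list[str], limit: int = SIZE_LIMIT) -> list[int]:
    """Return the indexes of chunks whose UTF-8 size exceeds the limit."""
    return [i for i, chunk in enumerate(chunks) if len(chunk.encode("utf-8")) > limit]
```

Any chunk reported by `oversized` is a candidate for a further split at the next heading level, for example by calling `split_on_headings(chunk, "h3")`.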
Something that large really needs to be split.
We agree.
David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
participants (3): David A. Desrosiers, Joshua Hutchinson, Steve Thomas