here's that alice pdf

here's that alice pdf i mentioned on friday. it's at:
http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01.pdf
it doesn't represent a finished version, not quite yet, but i posted it because it demonstrates some issues. first of all, you should notice that it has no pimples, as i have exercised control over widows and orphans. in general, the pagebreaks are fairly good throughout. there are a couple i will improve -- can you find them? the biggest issue is that the typography is very dense. even with internet-style blocked paragraphs which have a full blank line between paragraphs, it's still too dense. one reason for the density is because i set the leading at 35 lines per page, and that's more than most books. (most p-books run anywhere from 27-31 lines per page.) the biggest reason for the dense typography -- and the rationale behind the small leading -- is because the lines are of very long length. the file has a line-length that is rather typical of the e-texts found in project gutenberg. (the longest line in the original alice30.txt is 77 characters; 33 lines are 66 characters long or longer. that's too long.) this long line-length is also why the margins are so small. with the font i used -- georgia, a font that was designed to look good on-screen _and_ in-print -- it was necessary to make the margins that small to get all the lines to fit. this was the case even with the type at a measly 9-point. (and 9-point is too small for printout, let alone the screen.) the point: project gutenberg line-lengths are far too long. furthermore, they're being calculated in the wrong manner. when you break lines according to simple character-count, the underlying assumption is the use of a monospaced font. and that's a bad assumption regarding people reading books, because nobody should read a book using a monospaced font; that's just inviting eye-strain and fatigue to ruin the experience. with as loudly as people complain about the unpleasantness of on-screen reading, we must do everything we can to nudge 'em into an experience that lessens the irritation as much as possible. 
instead of breaking lines according to character-count, we should break them according to _string-width_ in a _proportional_font_. although different fonts can vary greatly in terms of their width, the bulk of reading comprehension studies indicate that a length around 50 characters is optimal, so that's what we should aim at. (to equate the figures, 50 12-point characters is about 300 pixels.) as most of you know now, my viewer-program uses a 2-up display, the familiar facing-pages spread that most end-users associate with "a real book". what they usually say is, "it looks like a real book." (which brings up some questions about the nature of "reality". but let us just say that we know what they mean, and leave it at that...) i don't know if it's just a coincidence, or if there is a deep underlying reason that explains why, but it just so happens that a line-length of some 50 characters, with nice margins on both sides, makes a "page" that fits nicely in the 2-page-spread in the interface of my viewer-app. you can demonstrate this to yourself. open up your wordprocessor and load some text, choosing a fontsize that's comfortable for you; make the window-width convenient for reading for an extended time. unless i miss my bet, or you're lucky enough to have a cinema-screen, your window-width will be about one-half the width of your monitor... another indication of this general rule is the template for most blogs. look at a lot of them and you'll see that the left-quarter of the screen is typically used as a sidebar, and so is the right-quarter, which leaves the middle-half as the primary text-area containing the blog's content. (most templates don't resize this "main text" arena when the person makes the fontsize bigger than the standard-size, which is what they should do, for reading comprehension. that's true of non-blog sites too.) so -- in creating my z.m.l. 
versions of the project gutenberg e-texts -- i use a shorter line-length, and i have found that that works very well... i doubt the "powers that be" at project gutenberg will listen to me, but my suggestion is that you should use the shorter line-length too... of course, my viewer-app lets people remarginate the text at any time, so hard-wrapped lines at a longer length are not any type of _obstacle_, and it doesn't matter to me if project gutenberg's policy changes or not. i'm just suggesting it because it's an idea that has worked well for me... back to the alice .pdf, though, what you'll see is that when lines are wrapped by their character-count, there are bad variations in their "true" length when they are converted into a proportional font. and that's unfortunate. in addition to the remargination that my viewer-program lets the user do, it will also adjust the "default" linebreaks that are computed at the new size, because those defaults quite often create loose lines. in general, although i'm using a rather simple routine to do these adjustments -- because it has to be fast enough to work on-the-fly for real-time display -- i'm getting paragraphs that have very few loose lines. i'm quite happy with it. one reason i'm showing you this output with their original hard-wrapped lines is so that when i later show you output with the "default" soft-wrapped lines, you can compare them. and then when i show you output with my "adjusted" linebreaks, you'll be able to compare those results to the other two instances. (it's not that i expect anyone here to actually care that much, or even at all; but if there _does_ happen to be someone who does, i'll satisfy their curiosity. and i have typography friends elsewhere who find all of this _very_ interesting.) now for the other issues. for the most part, i think the design is straightforward. i'm sure some people will have opinions on things like the chapter-headings, and i would be happy to hear 'em, whether positive or negative. 
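To make the difference between counting characters and measuring rendered width concrete, here is a minimal sketch of a greedy line-breaker that works on string width. The width table is invented for illustration (real numbers would come from the metrics of the font actually in use), and the function names are hypothetical; this is not the routine the viewer-program uses, which is not described in the thread.

```python
# Hypothetical per-character widths in thousandths of an em.
# Real values would come from the metrics of the font being used.
NARROW = set("iljtfr.,;:'! ")
WIDE = set("mwMW")


def char_width(ch: str) -> int:
    """Return an illustrative width for one character."""
    if ch in NARROW:
        return 300
    if ch in WIDE:
        return 850
    return 550


def string_width(text: str) -> int:
    """Width of a string in a proportional font: the sum of its glyph widths."""
    return sum(char_width(ch) for ch in text)


def wrap_by_width(text: str, max_width: int) -> list[str]:
    """Greedy wrap: keep adding words until the next one would overflow."""
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and string_width(candidate) > max_width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def looseness(line: str, max_width: int) -> float:
    """Fraction of the measure that this line leaves unfilled."""
    return 1 - string_width(line) / max_width
```

With this table, "iiii" and "mmmm" have the same character count but very different widths, which is why lines broken at a fixed character count end up with uneven true lengths once they are set in a proportional font.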
i'm especially open to constructive criticism, and would love to have graphic-design people chip in... the issue on which i would most like feedback is how to handle the illustrations. as you'll see (if you choose to look at the .pdf), what i did is that when a picture is called by the text, i have thrown a page-advance, and displayed that picture in whatever space happens to be remaining on that page, re-sizing the picture (down, since all of these images were the size of a full page). in many cases, that was perfectly fine, in that the picture was displayed at a large-enough size that its detail can be grasped sufficiently, even without using acrobat's zoom... in a few cases, though, the picture really isn't large enough to be appreciated. in one case -- ironically appropriate, the one about alice being "cramped" -- the remaining space was so small, the image is little more than a thumbnail. this "use-the-remaining-space" modus operandi is the one my viewer-app uses, and -- in the case of on-screen viewing -- it is entirely appropriate and reasonable. there, it doesn't matter if the picture is thumbnail-sized, because the reader can simply click on it and a larger version of it will be displayed for their examination. so, the question is, would this be an acceptable solution for a .pdf version too? i can turn each picture into a clickable link that will jump to a full-size graphic. is that what i should do? the down-side to that approach is that (i believe that) including each illustration twice -- once on the page where it is called, and then again on a page (at the end of the book) in a full-size version and all by itself -- will bloat the .pdf. in the case of "alice", for instance, there are 42 illustrations. including each of these on a page of their own at the end of the book would turn this 125-page .pdf into a 167-page .pdf. 
people wouldn't _have_ to print all 167 pages, if they didn't want to, they could just print up through page 125, so maybe that's not a big deal. but i'm wondering what y'all would recommend. another option would be to print each illustration at full-width where the text calls it, floating it to the next page if it won't fit in the space remaining on the current page. that might turn this into a 145-page .pdf (i'm just guessing, i haven't tried it out), which would have an impact when it came to a person printing out the entire book, but it wouldn't bloat the .pdf so badly because each graphic is only included once. i had a bunch of other ideas about how to deal with this little problem, but now i've forgotten what they were. which is ok, i guess, because this is long enough already. -bowerbird
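The two picture-placement options discussed above can be sketched as follows. This is only an illustration under stated assumptions: `Placement`, `shrink_to_fit`, and `float_if_needed` are hypothetical names, sizes are in arbitrary but consistent units, and nothing here is taken from the actual viewer-app or .pdf generator.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    page_offset: int  # 0 = current page, 1 = floated to the next page
    scale: float      # factor applied to the picture's natural size


def full_size_scale(img_w: float, img_h: float,
                    page_w: float, page_h: float) -> float:
    """Scale at which the picture is as large as one text page allows."""
    return min(page_w / img_w, page_h / img_h)


def shrink_to_fit(img_w: float, img_h: float, page_w: float,
                  page_h: float, remaining_h: float) -> Placement:
    """Option 1: always stay on the current page, shrinking as needed."""
    scale = min(full_size_scale(img_w, img_h, page_w, page_h),
                remaining_h / img_h)
    return Placement(0, scale)


def float_if_needed(img_w: float, img_h: float, page_w: float,
                    page_h: float, remaining_h: float) -> Placement:
    """Option 2: keep full size, moving to the next page if it won't fit."""
    scale = full_size_scale(img_w, img_h, page_w, page_h)
    if img_h * scale <= remaining_h:
        return Placement(0, scale)
    return Placement(1, scale)
```

The first function reproduces the thumbnail effect when little space remains; the second trades that for extra pages, which is the source of the guessed growth in page count.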

Bowerbird@aol.com wrote:
here's that alice pdf i mentioned on friday. it's at:
http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01.pdf
I'm glad you posted yours so people can compare. This is what we get out of TEI without hand-tweaking anything:
http://pgtei.pglaf.org/marcello/0.4/examples/alice/11-pdf.pdf

I especially want to draw people's attention to the following problems in bb's pdf (in parens, the pages where the problem can best be observed):

line justification: bb's pdf isn't justified at all. It is simply lines broken using character count. That will do for fixed fonts but never for proportional ones.

hyphenation: long words need to be hyphenated to make interword spacing more even.

em-dashes: should use typographic dashes instead of --

unprintable margins: many laser printers will not print to the very edge of the page, thus bb's lines will be cut off.

incompatible page blackness (16-17, 22-23, 26-27): left and right-hand pages have different amounts of leading. This looks ugly.

footnotes: should (as the name implies) show at the foot of the same page, not in the appendix.

picture size (39, 40): pictures should float to the next page if not enough room is left on this page, not be resized to postage-stamp size.

missing pictures (23): all pictures should make it into the pdf.

table of contents: should have page numbers.

-- Marcello Perathoner webmaster@gutenberg.org

On Mon, Sep 26, 2005 at 06:29:50PM -0400, Bowerbird@aol.com wrote:
furthermore, they're being calculated in the wrong manner. when you break lines according to simple character-count, the underlying assumption is the use of a monospaced font.
and that's a bad assumption regarding people reading books, because nobody should read a book using a monospaced font; that's just inviting eye-strain and fatigue to ruin the experience.
with as loudly as people complain about the unpleasantness of on-screen reading, we must do everything we can to nudge 'em into an experience that lessens the irritation as much as possible.
I disagree with the above. Personally, I find monospaced, serifed fonts to be the easiest to read, and am frequently frustrated by the lack of books that use a monospaced font, and I wonder who is to blame for it. In all of the software usability testing that I've done over the years (mostly involving web applications), monospaced fonts consistently score the highest for readability and comfort with the users. Do you have some pointers to research that might explain why the prevailing wisdom is that monospaced fonts are "bad"? I've tried various google searches, and only managed to discover results that agree with my position.

Joey wrote:
Bowerbird@aol.com wrote:
furthermore, they're being calculated in the wrong manner. when you break lines according to simple character-count, the underlying assumption is the use of a monospaced font.
and that's a bad assumption regarding people reading books, because nobody should read a book using a monospaced font; that's just inviting eye-strain and fatigue to ruin the experience.
with as loudly as people complain about the unpleasantness of on-screen reading, we must do everything we can to nudge 'em into an experience that lessens the irritation as much as possible.
I disagree with the above.
Personally, I find monospaced, serifed fonts to be the easiest to read, and am frequently frustrated by the lack of books that use a monospaced font, and I wonder who is to blame for it.
In all of the software usability testing that I've done over the years (mostly involving web applications), monospaced fonts consistently score the highest for readability and comfort with the users.
Do you have some pointers to research that might explain why the prevailing wisdom is that monospaced fonts are "bad"? I've tried various google searches, and only managed to discover results that agree with my position.
Over the years there have been fairly extensive experiments performed on large samples of people reading printed text, in which many typographic settings are varied. These people are tested for reading speed and reading comprehension, and asked about personal preferences. The most famous of these tests were those done by Tinker and associates. (Bill Hill, in his tome "The Magic of Reading", refers a lot to these studies.)

For example, some of the tested typographic settings included general font types (serif vs. sans serif), font spacing (proportional vs. monospace), font size, leading, line length, margins, text justification, etc., etc.

Here (from my flawed memory) are some general typographic settings *in print* that maximize reading comprehension *and* are generally preferred (most comfortable). Of course, there are variations from individual to individual, but there are clearcut preferences of a large sample of people:

  Font type: serif (big factor)
  Font size: 9-11 points
  Font spacing: proportional (somewhat big factor)
  Line length: about 30em (25-35em about the same)
  Text justification: fully justified (right and left) (big factor)
  Margins: such that the printed area of the page is about 50-60% of the total area
  Leading: can't remember the details, but what is found in a typical novel is optimum; double-spaced text is preferred by few

Some comments on the above. Serif definitely leads to better comprehension and higher-speed reading because of how the human eye/brain comprehends glyphs -- the little serifs help the eye/brain to more quickly identify the character and differentiate characters from one another, leading to quicker reading of whole words "as a whole." (Without serifs, the brain oftentimes has to parse the characters in a word individually, looking at each character one by one to determine the word, which slows down reading speed and comprehension.)
If line lengths are too short, people have to switch their eyes too much, leading to fatigue and an inability to become immersed in the text, lowering reading speed and comprehension. If lengths are too long, people have more difficulty "tracking back" from the end of the line to the beginning of the next line (that is, to know which is the next line! Leading plays a role here.) Text which is NOT fully justified leads to substantial visual distraction and noticeably lowers reading comprehension (provided the right justification is done with high typographic quality and, where necessary, the use of hyphenation.)

Of course, this is for print, where the resolution of ink on paper approaches the "equivalent" of 600 dpi (computer screens are about 90-120 dpi). Low-resolution screens change the rules somewhat. For example, sans serif fonts generally lead to better comprehension on low-rez screens, since the serifs on serif fonts sometimes are lost or get too thin on lower-resolution screens.

Nevertheless, I think a lot of the "rules" of typography in print still apply to ebook reading. For this reason, in the basic CSS style sheet used for the online version of "My Antonia", I set the line length to be 30 em and not fixed to a particular pixel length as most people do. Now certainly the CSS I use for "My Antonia" leaves a lot to be desired (I'm not a graphics designer -- the CSS styling I've seen for some of the PG/DP books is very good), but I do believe on many monitors the line length I've set is optimal for reading the text, except when the type size gets too large on very large monitors. See: http://www.openreader.org/myantonia/basic-design-nopagenum/myantonia.html (I've not set right justification, but could do so quite easily. The lack of a decent hyphenation engine in most browsers makes right justification more difficult to achieve reasonably well. 
I think, though, that for line lengths specified to be 30-35 em, right justification could be used in browser presentation and look good most of the time.)

[Btw, in "My Antonia" I also use curly quotes, which studies suggest improve reading comprehension and which, in print, people overwhelmingly prefer. (I personally find it odd how many people don't like curly quotes for online texts, but maybe that is a result of low-resolution screens and bad fonts. I think for "My Antonia" the curly quotes improve comprehension and lead to a more pleasing presentation, but then I'm just one person.)]

I could go on and talk about other related matters. But the point is that what is important is not what one person likes the best, but what makes sense based upon experiment over a large number of people. Fortunately, with reflowable formats (typeset on the user's end, such as envisioned for OpenReader), the end-user can and should be given substantial ability to tailor the typographic presentation to what *they* prefer. That's one reason why I don't care much for PDF for on-screen reading (at least unstructured/untagged), since it *forces* typography on end-users, and usually is not optimal for a large percentage of readers due to screen limitations and greatly varying screen size.

Jon Noring

(p.s., Bill Hill talks about another factor in reading comprehension and the ability to achieve "immersive" reading, and that is due to human physiology. In human vision, the center vision is used for visual acuity, to resolve fine details like print. However, the peripheral vision (which is of low visual acuity) is optimized to sense motion (e.g., to look for threats such as from animals trying to eat us.) Thus, immersive reading is more difficult to achieve when the human eye continues to see complex "detail" in the peripheral vision, such as found in many ebook reading systems with lots of menus, buttons, tables of contents along the sides, etc. 
The subconscious perceives this "busyness" as possibly hiding a threat. This is one reason I tend to dislike Adobe Acrobat Reader: it is "too" busy in the peripheral vision, at least for the default setting of Reader. The ergonomic design of ebook reading systems should take into account the need to minimize unwanted distractions in the peripheral vision. For example, look at the "My Antonia" document (see URL above), where the outside of the page area is kept a solid dark blue color -- this was intentional. Many web designers would format this outside area to look like "eye candy", adding repeating graphic images or a floating menu, for example, essentially to fill up the space with "stuff". But doing so is distracting, and makes it more difficult for many readers to achieve highly immersive reading (called "ludic" reading.) Reading a book *is* a different experience than reading a typical corporate web site, and the styling needs to be different.)
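The line-length figures quoted so far use different units (characters, pixels, em), so a rough conversion may help when comparing them. The sketch below starts from the "50 12-point characters is about 300 pixels" figure given earlier in the thread and assumes one point maps to one pixel (a 72 dpi display), which is an assumption and not something stated in the thread. The function names are hypothetical, and the resulting average character width only holds for a font with similar proportions.

```python
FONT_SIZE_PT = 12        # size used in the "50 characters ~ 300 pixels" figure
PIXELS_PER_POINT = 1.0   # assumption: 72 dpi display, so 1 pt = 1 px


def avg_char_width_em(chars: int, pixels: float,
                      font_size_pt: float = FONT_SIZE_PT,
                      px_per_pt: float = PIXELS_PER_POINT) -> float:
    """Average character width, as a fraction of the font size (one em)."""
    em_in_pixels = font_size_pt * px_per_pt
    return (pixels / chars) / em_in_pixels


def em_to_chars(ems: float, char_em: float) -> float:
    """Approximate character count that fits in a measure given in em."""
    return ems / char_em


def chars_to_em(chars: float, char_em: float) -> float:
    """Approximate measure in em needed for a given character count."""
    return chars * char_em


char_em = avg_char_width_em(50, 300)   # 0.5 em per character
print(em_to_chars(30, char_em))        # 30 em is roughly 60 characters
print(chars_to_em(50, char_em))        # 50 characters is roughly 25 em
print(chars_to_em(77, char_em))        # a 77-character line is roughly 38.5 em
```

Under these assumptions a 50-character line sits at the low end of the 25-35 em range, while the 77-character maximum found in alice30.txt falls outside it.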

Jon Noring wrote:
Joey wrote:
Bowerbird@aol.com wrote:
furthermore, they're being calculated in the wrong manner. when you break lines according to simple character-count, the underlying assumption is the use of a monospaced font.
and that's a bad assumption regarding people reading books, because nobody should read a book using a monospaced font; that's just inviting eye-strain and fatigue to ruin the experience.
with as loudly as people complain about the unpleasantness of on-screen reading, we must do everything we can to nudge 'em into an experience that lessens the irritation as much as possible.
I disagree with the above.
Personally, I find monospaced, serifed fonts to be the easiest to read, and am frequently frustrated by the lack of books that use a monospaced font, and I wonder who is to blame for it.
In all of the software usability testing that I've done over the years (mostly involving web applications), monospaced fonts consistently score the highest for readability and comfort with the users.
Do you have some pointers to research that might explain why the prevailing wisdom is that monospaced fonts are "bad"? I've tried various google searches, and only managed to discover results that agree with my position.
Over the years there have been fairly extensive experiments performed on large samples of people reading printed text, in which many typographic settings are varied. These people are tested for reading speed and reading comprehension, and asked about personal preferences. The most famous of these tests were those done by Tinker and associates. (Bill Hill, in his tome "The Magic of Reading", refers a lot to these studies.)
For example, some of the tested typographic settings included general font types (serif vs. sans serif), font spacing (proportional vs. monospace), font size, leading, line length, margins, text justification, etc., etc.
For studies agreeing with, and disagreeing with, many of these assertions see: http://psychology.wichita.edu/optimalweb/text.htm, and linked references. Personally, I prefer proportional, sans-serifed fonts (Tahoma is my favorite). And I get really annoyed with people (like Jon) who want to set the line length for me, filling the remainder of my desktop with blank space. If I want shorter lines I will resize my User Agent window, thank you very much. Oh wait, I can't do that with non-reflowable PDF, can I. And I can't change the font either, can I. So I'm stuck with whatever BowerBird, or Jon Noring, or Marcello Perathoner, or Jakob Nielsen thinks is best for me. To be honest, I haven't looked at any of these competing PDF expressions of Alice, because I can't see how PDF has any value at all to an end user. Arguing about how PDF should be presented is much more a discussion of angels and pinheads than _any_ XML discussion that has ever occurred on this list.

Lee wrote:
For studies agreeing with, and disagreeing with many of these assertions see: http://psychology.wichita.edu/optimalweb/text.htm, and linked references. Personally, I prefer proportional, sans-serifed fonts (Tahoma is my favorite). And I get really annoyed with people (like Jon) who want to set the line length for me, filling the remainder of my desktop with blank space. If I want shorter lines I will resize my User Agent window, thank you very much.
Interesting study! As I noted in my prior message, the studies done on typography have been mostly done for print. Some of the research transfers over to the lower rez screen milieu, some doesn't. It's good to see that there have been studies on the readability of online text. Regarding my style sheet... The beauty of the XML approach (provided the markup is done right) is the ability to apply different style sheets to a document. I've wanted to provide more style sheets to the "My Antonia" document in addition to the fixed line-length (in em) currently used as the basic one. Since my graphics arts abilities are pretty poor, I'm hoping others will provide some nice looking ones. I recall seeing some really nice ones for some PG/DP books. CSS Zen Garden demonstrates the extraordinary power to apply quite different looking style sheets to (effectively) the same document. See http://www.csszengarden.com/ With the right reading system, we can allow end-users to substantially tailor the presentation of XML documents where the publisher has provided one or more default ones -- effectively to override the CSS provided by the publisher. Of course, allowing end-users substantial ability to tailor presentation is something effectively impossible to do with PDF today (and the only possibility is to have tagged PDF along with a PDF viewer that is effectively an XML presentation system to handle the PDF tagging vocabulary.)
To be honest, I haven't looked at any of these competing PDF expressions of Alice, because I can't see how PDF has any value at all to an end user. Arguing about how PDF should be presented is much more a discussion of angels and pinheads than _any_ XML discussion that has ever occurred on this list.
Definitely! Jon

On 9/27/05, Lee Passey <lee@novomail.net> wrote:
To be honest, I haven't looked at any of these competing PDF expressions of Alice, because I can't see how PDF has any value at all to an end user.
I spend enough time sitting in front of a computer; I'd love a nice-looking printout that I can take and read wherever I want. What end users want varies quite a bit.

David Starner wrote:
On 9/27/05, Lee Passey <lee@novomail.net> wrote:
To be honest, I haven't looked at any of these competing PDF expressions of Alice, because I can't see how PDF has any value at all to an end user.
I spend enough time sitting in front of a computer; I'd love a nice-looking printout that I can take and read wherever I want. What end users want varies quite a bit.
I spend virtually no time reading while sitting in front of a computer (if you don't count figuring out someone else's source code), except when I am forced by circumstances beyond my control to refer to a PDF file. For _all_ of my recreational reading I download the file to my hand-held device. For many people (and you appear to be one of them) the hand-held device consists of an ordered collection of sheets of paper.

Your comments are so obvious, and so correct, that you shouldn't even need to make them, yet it is surprising how often they are forgotten: WHAT END USERS WANT VARIES QUITE A BIT!

And this is the problem with PDF. Having a PDF file does not necessarily mean you will get a nice-looking printout; it only means you will get a printout that looks like what the document author wanted you to have. If you find widows and orphans disconcerting (and by this I mean typographical widows and orphans, not those caused by misguided foreign policy) you will probably not be happy with the Perathoner XSLT to PDF version of Alice in Wonderland, and if you don't like monospaced, serifed fonts you will probably not be happy with the BowerBird ZML to PDF version. Despite Mr. Perathoner's assertions that any styles which don't match his criteria can be dismissed as bad taste, it remains virtually axiomatic that _de gustibus non disputatum est_.

The problem of finding the styling that satisfies the plurality of people ("the greatest good for the greatest number") only exists when, as with PDF, the ultimate rendition is in a fixed, immutable form. And the best way to avoid the problem is to postpone the rendering to the last possible moment, preferably when the "greatest number" has been reduced to one. XSLT is a good method of postponing rendering. 
Indeed, if everyone had access to a tool whereby you could easily mix one part document (in a master file format) with one part XSL script (of the user's choice), shake well, and end up with a result reflecting the end user's preferences and in the format best suited to the end user's tool set (including an ordered collection of sheets of paper), XSLT might even be the best method. Tools to perform XSL transformations, however, are still far from widespread, and the last time I used them (about 2 years ago) I couldn't even find one that was a complete enough implementation of the spec to do some of the things I wanted to do.

XML+CSS is also a good method of postponing rendering. CSS is certainly not as powerful as XSL (being, as it is, merely a style sheet as opposed to a scripting language), but support is much more widespread than support for XSLT, permitting end users to do the very kind of mixing and matching that I envisioned. Using tools like YesLogic's Prince you can even go from XML+CSS directly to high-quality print or PDF.

When it comes to a master file format, what should be selected is the one which (1) permits end users to postpone rendering decisions (what many people refer to as 'typography') to the last possible moment, which (2) allows the end user to have the maximum amount of input as to which rendering decisions are made, and which (3) permits the end user to use the widest range of tools possible. ZML certainly does not satisfy these criteria, and PDF is even worse. At the moment I believe that TEI+CSS best satisfies these criteria, but this could change as new technologies come on line.

But arguing about whether line lengths should be 66 characters, or 30 em, or 11 words or 10 cm (as suggested in "Huey, E. B. (1968). _The psychology and pedagogy of reading_. Cambridge, MA: MIT Press.") is completely irrelevant. Line lengths should be whatever the reader wants them to be, and we should try our best to give the reader that choice.

Lee Passey wrote:
Despite Mr. Perathoner's assertions that any styles which don't match his criteria can be dismissed as bad taste, it remains virtually axiomatic that _de gustibus non disputatum est_.
I didn't say that. I did say that people who say things like: "de gustibus non disputatum (sic!) est" (there’s no disputing about taste), usually have bad taste because they were never able to defend their taste in a discussion. People with good taste usually like disputing about it, because they know they have good taste and comparing notes helps build even better taste. Moreover, there *is* disputing about Latin: and your Latin is wrong. -- Marcello Perathoner webmaster@gutenberg.org

Hi Everybody,

Let me get in on this. I agree with Lee. The creation of the end product should be left to the last moment. There are several ways to do this, as Lee has mentioned. One of my pet methods for more than a decade has been to use LaTeX. It is a markup language/system appropriate to the task of typesetting, which is what we are all talking about. It has styles. It is easy enough to create styles to fit any output format. It creates pdf to match. It is freely available for all platforms: Windows, Mac, Linux, etc. Need a different format? Just change one line, render it, and voila!!

Just my two cents worth.

Keith.

On 9/28/05, Lee Passey <lee@novomail.net> wrote:
And this is the problem with PDF. Having a PDF file does not necessarily mean you will get a nice looking printout, it only means you will get a printout that looks like what the document author wanted you to have.
It gives me a better printout than anything else I've seen can. No one here was talking about PDFs as a master format, so that's moot. And I don't think that "allow[ing] the end user to have the maximum amount of input as to which rendering decisions are made" is a good idea. Options take time to code, most people don't care about widows and orphans, and a page full of mysterious options (and I suspect widows and orphans fall into that classification for many users) is a way to drive away many users.

David Starner wrote:
On 9/28/05, Lee Passey <lee@novomail.net> wrote:
And this is the problem with PDF. Having a PDF file does not necessarily mean you will get a nice looking printout, it only means you will get a printout that looks like what the document author wanted you to have.
It gives me a better printout than anything else I've seen can.
If you like TEI+XSL ==> PDF ==> print I'd bet you'd like TEI+XSL ==> LaTeX ==> print even better. Perhaps someone will take up the challenge and work on an XSL script for LaTeX. And of course just creating a PDF file in and of itself is not a solution to creating good print output. I've seen PDF files that use the Courier type and 66-character lines, which look almost like typewriter output. Wordpad can produce output better than that. If your preferred end product is print, I'd bet there is a TEI+XSL ==> print solution that would be at least as good, if not better, and would avoid the intermediate PDF phase. On the other hand, very few people have installed XSL script processors, so the advantage of the PDF solution is that you can do the first step on a remotely hosted machine, download the intermediate file, and then print that. Of course, the downside to all these solutions is that you are relying on a third party to create the XSL file, so you are forced to accept whatever stylistic choices the host is offering; this is typically not a problem if (1) the XSL is designed to conform to statistical stylistic norms, and (2) you're relatively tolerant of diversity in styles. [snip]
And I don't think that "allow[ing] the end user to have the maximum amount of input as to the which rendering decisions are made" is a good idea. Options take time to code, most people don't care about widows and orphans, and a page full of mysterious options (and I suspect widows and orphans fall into that classification for many users) is a way to drive away many users.
Here we will just have to agree to disagree. In my mind, empowering end users is _never_ a bad thing. And while it may be true that most people won't care about widows and orphans, we know that there are at least two people who do (and frankly, if I had a choice between a product that minimized widows and orphans, and one that didn't, I would be inclined to pick the first). Should those people who _do_ care be forced to overcome steep barriers just because they are part of a minority? Now a page full of "mysterious options" can, indeed, be daunting. But the solution is not to make all users suffer the same options, but to create a selection process that is _not_ daunting. I doubt very much that Microsoft Word has lost a lot of users because it allows widow/orphan control; at the same time I doubt if most MSWord users even know that widow/orphan control is possible. That's because it is hidden behind a menu tree that says, in essence, "power users click here; novices should avoid this." I certainly wouldn't suggest that an end user should have to run a gauntlet of choices before he or she can download a file, but I _would_ suggest that s/he should have the option. And I don't think that a page of check boxes is the best solution even then. When I suggested that CSS could solve many of these problems, some have responded that CSS is far too complex to expect an end user to edit a CSS file to change an option. I agree with the statement, I just don't agree with the implied conclusion. I would foresee the typical end user downloading a TEI text, and then downloading a number of CSS files which all claim to be the ultimate expression of stylistic perfection. The end user could try out all of the CSS files by the simple expedient of renaming a candidate file to the standard name (e.g. pgtei.css) and then looking at the output. He or she would then simply stick with the one that reflects his or her preferences the best.
Only if the end user could not find an existing CSS file adequate to his/her needs would there be any requirement to tweak a CSS file. And if every PGTEI file contained a stylesheet declaration that included the standard file name, once the decision was made it would never have to be revisited, and no editing of source files would ever be required. Of course, the same sort of solution would work with XSLT, but that would require that every end user have an XSLT processor, and that every file downloaded would have to be converted before use. The way to solve the complexity problem for unsophisticated users is to provide intelligent defaults, not to take away freedom of choice.
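For anyone who wants to try the stylesheet-swapping idea without renaming files by hand, here is a minimal sketch. It assumes the downloaded texts all point at one standard stylesheet name in the same directory (pgtei.css, the example name used above); the function name `activate_stylesheet` and the directory layout are hypothetical, not part of any existing PG tool.

```python
import shutil
from pathlib import Path

STANDARD_NAME = "pgtei.css"  # the example standard name from the message above


def activate_stylesheet(candidate: Path, text_dir: Path,
                        standard_name: str = STANDARD_NAME) -> Path:
    """Copy a candidate stylesheet over the standard name next to the texts.

    Copying rather than renaming keeps the candidate file around, so
    switching to another candidate later is just another call.
    """
    target = text_dir / standard_name
    shutil.copyfile(candidate, target)
    return target
```

Calling `activate_stylesheet(Path("dense.css"), Path("texts"))` and reloading the text in a browser would then show that candidate's styling; the source files themselves are never edited.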

Personally, I find monospaced, serifed fonts to be the easiest to read, and am frequently frustrated by the lack of books that use a monospaced font, and I wonder who is to blame for it.
In all of the software usability testing that I've done over the years (mostly involving web applications), monospaced fonts consistently score the highest for readability and comfort with the users.
(Also, tables etc in plain text need a monospaced font to line up properly.) I would bet that the reason publishers like condensed text is the same reason they indicate new paragraphs by indenting the first line rather than separating adjacent pars with a blank line; the same reason they will break a long line of metrical drama and append the tail to the end of the line above or below; the same reason they will publish a long narrow list as a two column table even though there's no connection between the left and right halves of the content on any one row: paper costs money, and they want to get as much use out of each square inch as possible while keeping the text more-or-less readable. They are largely happy to sacrifice some readability if it means they can limit the amount of paper they have to use. Of course, in etexts we don't have that problem. Screen space (especially vertical screen space) is as good as free. Using a blank line makes it easier to see where a new paragraph starts? Fine, we'll do that. Rejoining metrically split lines? We'll do that too. And since in plain text files it's trivial to choose, I always read PG texts in a monospaced font. Preserving the financially-induced presentation limitations of publishers and printers working with paper in etexts seems a waste of effort and introduces an unnecessary loss of readability, in my opinion. People who care about the exact typography or detailed layout etc will one day be able to go look at the scans - people who just want to read/search/edit/quote the actual textual content don't need all the bells and whistles, right? :) Cheers! Bill

Personally, I find monospaced, serifed fonts to be the easiest to read, and am frequently frustrated by the lack of books that use a monospaced font, and I wonder who is to blame for it.
In all of the software usability testing that I've done over the years (mostly involving web applications), monospaced fonts consistently score the highest for readability and comfort with the users.
Actually, I'm fairly certain that a majority of people find proportionally-spaced fonts easier to read for paragraphs of text (though not for source code, for example). That's true for print, and I doubt that it's different on the Web. Alas, I couldn't find a good reference via a quick Google search. PLEASE NOTE that I completely respect that some people find monospaced fonts easier, I just think they (and you) are in the minority.
I would bet that the reason publishers like condensed text is the same reason they indicate new paragraphs by indenting the first line rather than separating adjacent pars with a blank line; the same reason they will break a long line of metrical drama and append the tail to the end of the line above or below; the same reason they will publish a long narrow list as a two column table even though there's no connection between the left and right halves of the content on any one row: paper costs money, and they want to get as much use out of each square inch as possible while keeping the text more-or-less readable. They are largely happy to sacrifice some readability if it means they can limit the amount of paper they have to use.
I completely disagree. There's a huge variety of books (and magazines) with vastly different print sizes and densities, and aimed at different markets. If monospace were easier for the majority of people, some (probably most) publishers would use it -- at the very least in specialized niches. To cite an extreme example: children's picturebooks have plenty of room on the page and larger print; why aren't they monospaced? How about large print editions? Expensive journals? Or, look at books that specifically choose more expensive paper to make an impression or cater to an audience that appreciates it. -- Cheers, Scott S. Lawton http://Classicosm.com/ - classic books http://ProductArchitect.com/ - consulting

--- bkeir@pgdp.net wrote:
And since in plain text files it's trivial to choose
Yep! And even with HTML, if one uses a light hand with the CSS, the browser preferences will come through happily. A good reason to specify as little styling as possible.

Bowerbird@aol.com wrote:
here's that alice pdf i mentioned on friday. it's at:
http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01.pdf
it doesn't represent a finished version, not quite yet, but i posted it because it demonstrates some issues.
Nevertheless, it really looks quite good. Almost completely usable.
the biggest reason for the dense typography -- and the rationale behind the small leading -- is because the lines are of very long length. the file has a line-length that is rather typical of the e-texts found in project gutenberg. (the longest line in the original alice30.txt is 77 characters; 33 lines are 66 characters long or longer. that's too long.)
I'm confused: why can't you simply re-flow the paragraphs to get whatever line length makes sense? Reading later in your message, I see that this is precisely what you propose. So why worry about line length in the text version? [snip]
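To make the two operations under discussion concrete, here is a rough sketch: one function reproduces the kind of line-length count quoted above (longest line, number of lines at or over 66 characters), and one re-flows paragraphs to a chosen character width. It assumes paragraphs are separated by blank lines and that nothing in the text (verse, tables, letters) needs its original line breaks, which is exactly the hard part for real PG files. The function names are made up for illustration.

```python
import textwrap


def line_length_stats(text: str, threshold: int = 66) -> tuple[int, int]:
    """Return (longest line length, number of lines at or above threshold)."""
    lengths = [len(line) for line in text.splitlines()]
    longest = max(lengths, default=0)
    long_lines = sum(1 for n in lengths if n >= threshold)
    return longest, long_lines


def reflow(text: str, width: int = 50) -> str:
    """Re-wrap each blank-line-separated paragraph to `width` characters.

    Hard line breaks inside a paragraph are discarded, so this is only
    safe for plain prose.
    """
    paragraphs = text.split("\n\n")
    return "\n\n".join(
        textwrap.fill(" ".join(p.split()), width=width) for p in paragraphs
    )
```

Running `line_length_stats` on a hard-wrapped e-text and then on `reflow(text, 50)` gives a quick before-and-after comparison of line lengths.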
instead of breaking lines according to character-count, we should break them according to _string-width_ in a _proportional_font_.
although different fonts can vary greatly in terms of their width, the bulk of reading comprehension studies indicate that a length around 50 characters is optimal, so that's what we should aim at. (to equate the figures, 50 12-point characters is about 300 pixels.)
This, in particular seems quite surreal: why would I wrap my fixed-width text *as if* it were proportional? And what font metrics should I use? [snip]
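The "50 12-point characters is about 300 pixels" figure works out to 300 / 50 = 6 pixels per character, which is half an em at 12 point if one assumes one pixel per point. The sketch below shows what breaking by string width rather than character count could look like. The width table is invented for illustration only; real numbers would have to come from the metrics of whatever font is actually used, which is the open question raised in the reply above.

```python
# Hypothetical advance widths in units of 1/1000 em. These values are
# made up; a real implementation would read them from the chosen font.
DEFAULT_WIDTH = 500  # half an em, matching the 6 px per character estimate
WIDTHS = {" ": 250, "i": 250, "l": 250, "t": 300, "f": 300,
          "m": 850, "w": 750, "M": 900, "W": 950}


def string_width(s: str, point_size: float = 12.0) -> float:
    """Width of `s` in points under the toy metrics above."""
    units = sum(WIDTHS.get(ch, DEFAULT_WIDTH) for ch in s)
    return units * point_size / 1000.0


def wrap_by_width(text: str, max_width: float = 300.0,
                  point_size: float = 12.0) -> list[str]:
    """Greedy wrap: keep adding words until the measured width would exceed
    max_width. A single word wider than max_width gets a line to itself."""
    lines: list[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and string_width(candidate, point_size) > max_width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines
```

With these toy metrics a line of narrow letters holds far more characters than a line of wide ones, which is the difference between the two ways of counting.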
one reason i'm showing you this output with their original hard-wrapped lines is so that when i later show you output with the "default" soft-wrapped lines, you can compare them. and then when i show you output with my "adjusted" linebreaks, you'll be able to compare those results to the other two instances.
I know from many previous discussions that a general solution to re-flowing un-marked-up text like we see in PG is a hard problem. I do fully agree that some very simple conventions can make it quite a tractable problem, without forcing us to use full-on markup. I'm looking forward to seeing what you've come up with.
(it's not that i expect anyone here to actually care that much, or even at all; but if there _does_ happen to be someone who does, i'll satisfy their curiosity.
I care. I'd care more if all this stuff was a SourceForge project that I could poke at and tinker with and re-format my own samples. But I still care, even if it's your private project. [snip]
the issue on which i would most like feedback is how to handle the illustrations. [snip]
Don't re-size. Have them float to a suitable location. If [Giant Alice watching Rabbit run away.] on p13 was an illustration, your formatter mishandled it. Also [The Cheshire Cat fades to a smile.] p59 and [Executioner argues with King about cutting off Cheshire Cat's head.] p82. There's something wrong with the text at the end of the line with "'Rule Forty-two. All" p114. Probably a failure on the part of the contributor to abide by the "50 12-point characters is about 300 pixels" rule. ;-) ============================================================ Gardner Buchanan <gbuchana@rogers.com> Ottawa, ON FreeBSD: Where you want to go. Today.

Bowerbird@aol.com wrote:
with the font i used -- georgia, a font that was designed
You *did* get a license to distribute that font before posting that stuff to snowy. Did you? Otherwise I should strongly advise to use the Times font, which is not embedded into the pdf because it is built into Acrobat Reader (and coincidentally makes your pdf smaller). To get a license for Georgia, see: http://www.ascendercorp.com/msfonts/georgia_family.html -- Marcello Perathoner webmaster@gutenberg.org

Marcello Perathoner wrote:
You *did* get a license to distribute that font before posting that stuff to snowy. Did you?
Merely embedding a font in a document is not distributing it -- legally speaking. The specific permission bits on the font are baked into the font, and I bet you dollars to doughnuts it, in this case, allows non-editable-embedding and maybe editable-embedding but not installable-embedding -- which is what you're thinking of. The actual Georgia fonts in BB's document are non-editable ie: they are embedded in a document that is not intended to be further edited (using that font) -- and in fact they have been subset, making them fairly useless even for someone who went to the trouble of teasing them out of the binary goo that is in a PDF document. The copyright police will not be breaking down BB's door on that account. ============================================================ Gardner Buchanan <gbuchana@rogers.com> Ottawa, ON FreeBSD: Where you want to go. Today.

Gardner Buchanan wrote:
The actual Georgia fonts in BB's document are non-editable ie: they are embedded in a document that is not intended to be further edited (using that font) -- and in fact they have been subset, making them fairly useless even for someone who went to the trouble of teasing them out of the binary goo that is in a PDF document.
First, you have to make sure the font vendor allows subset embedding of the font. Did you? Second, you must buy a license of the font. If you don't own a license you surely have no right to use this font in any way. And, even if Microsoft got a license to distribute this font from their web site, this doesn't give BB a license to do the same. -- Marcello Perathoner webmaster@gutenberg.org

I think you are mixing the ambitions of the font vendors with the state of the law. I make no claims to be a copyright specialist but as I understand it even in the United States fonts per se are not copyright, viz. the House of Representatives report that accompanied the new copyright law when passed in 1976: "The Committee has considered, but chosen to defer, the possibility of protecting the design of typefaces. A 'typeface' can be defined as a set of letters, numbers, or other symbolic characters, whose forms are related by repeating design elements consistently applied in a notational system and are intended to be embodied in articles whose intrinsic utilitarian function is for use in composing text or other cognizable combinations of characters. The Committee does not regard the design of typeface, as thus defined, to be a copyrightable 'pictorial, graphic, or sculptural work' within the meaning of this bill and the application of the dividing line in section 101." H.R. Rep. No. 94-1476, 94th Congress, 2d Session at 55 (1976), reprinted in 1978 U.S. Cong. and Admin. News 5659, 5668. As I understand it this remains the case. Adobe have tried to suggest some fonts are in fact computer programs but this still does not protect the glyphs or documents containing them. I do not think there is any law anywhere in the world that restricts the unlimited distribution of pdf files containing any font that the generating program inserted. As an afterword - we in polite society do not imply in a public forum that somebody is doing something improper without firm proof. regards, Lynne
Marcello Perathoner wrote:
Gardner Buchanan wrote:
The actual Georgia fonts in BB's document are non-editable ie: they are embedded in a document that is not intended to be further edited (using that font) -- and in fact they have been subset, making them fairly useless even for someone who went to the trouble of teasing them out of the binary goo that is in a PDF document.
First, you have to make sure the font vendor allows subset embedding of the font. Did you?
Second, you must buy a license of the font. If you don't own a license you surely have no right to use this font in any way.
And, even if Microsoft got a license to distribute this font from their web site, this doesn't give BB a license to do the same.

On 9/27/05, Lynne Anne Rhodes <oxbow@spiritbase.net> wrote:
As I understand it this remains the case. Adobe have tried to suggest some fonts are in fact computer programs but this still does not protect the glyphs or documents containing them.
I do not think there is any law anywhere in the world that restricts the unlimited distribution of pdf files containing any font that the generating program inserted.
Fonts are computer programs, and have been protected as such in courts of law. If you embed these fonts in a PDF, they are still copyrighted and are legal to copy only if the copyright holder gave rights to copy. There is no blanket right to embed fonts in PDF files and distribute them. Marcello, however, is at least partially wrong; licenses of these fonts come with Windows, so there's no need to buy a license, and I suspect that they are embeddable; Microsoft paid a lot for their users to make wide use of these fonts, and went so far as to offer them to non-Windows users at one point. As much to the point, typefaces, not just font files but typefaces, are protected in much of the world. Your broad assumption that there is "no law anywhere in the world" is wrong. "even in the United States fonts per se are not copyright" is completely wrong; it is the US that is the exception in not covering them with copyright.

David Starner wrote:
Marcello, however, is at least partially wrong; licenses of these fonts come with Windows, so there's no need to buy a license, and I suspect that they are embeddable; Microsoft paid a lot for their users to make wide use of these fonts, and went so far as to offer them to non-Windows users at one point.
True-type fonts have the embedding permissions set into the font file. As I write this I'm looking at the properties of Georgia v. 2.12, as it was distributed with my install of Windows XP. The embedding permissions say, "Installable embedding allowed; font may be embedded in documents and permanently installed on the remote system." This is the most liberal permission set that may be coded into a True-type font. -- RS
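For readers who want to check a font themselves, the embedding permissions live in the fsType field of the font's OS/2 table. Below is a small sketch that decodes an fsType value into the permission names. The bit values are the ones I understand the OpenType specification to define; the function name is made up, and reading the value out of a font file is left to a font library (fontTools, for instance, exposes it as the `fsType` attribute of the OS/2 table).

```python
def describe_embedding(fs_type: int) -> list[str]:
    """Translate an OS/2 fsType value into human-readable permission notes.

    When more than one usage bit is set (possible in older fonts), the
    least restrictive one is reported, which is how I read the spec.
    """
    notes: list[str] = []
    level = fs_type & 0x000E  # bits 1-3 carry the usage permission
    if level == 0:
        notes.append("installable embedding")
    elif level & 0x0008:
        notes.append("editable embedding")
    elif level & 0x0004:
        notes.append("preview & print embedding")
    else:
        notes.append("restricted license embedding")
    if fs_type & 0x0100:
        notes.append("no subsetting")
    if fs_type & 0x0200:
        notes.append("bitmap embedding only")
    return notes
```

A value of 0 corresponds to the "Installable embedding allowed" setting reported above for Georgia v. 2.12 on Windows XP.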

Robert Shimmin wrote:
True-type fonts have the embedding permissions set into the font file. As I write this I'm looking at the properties of Georgia v. 2.12, as it was distributed with my install of Windows XP. The embedding permissions say, "Installable embedding allowed; font may be embedded in documents and permanently installed on the remote system."
BB is on a Mac and using the Mac Georgia font. I don't know if Apple negotiated the same conditions for their users with Microsoft. Now if BB could be bothered to check ... About developer licensing of MS fonts: http://www.ascendercorp.com/msfonts/msfonts_main.html -- Marcello Perathoner webmaster@gutenberg.org

Lynne Anne Rhodes wrote:
I do not think there is any law anywhere in the world that restricts the unlimited distribution of pdf files containing any font that the generating program inserted.
Fonts are copyright-protected nearly all over the world and even if you cannot copyright a "typeface" in the US you can copyright the implementation of the typeface on a computer system. That's the reason Microsoft did not just incorporate the Adobe "Helvetica" font in Windows, but made a tracing of that font called "Arial". See: http://en.wikipedia.org/wiki/Typeface#Legal_aspects_of_typefaces Embedding a font (in whole or in part) in a pdf file and distributing the pdf is legally the same as distributing the font (or parts of it). You can easily recover embedded fonts from pdfs with appropriate tools that ignore the "copy bits". Some companies (not all) allow their fonts to be embedded in documents, but you must buy a license for the font first. Now BB writes on the front of his book "this book is in the public domain", but it contains copyrighted material. He will get himself in trouble (about which I don't give a damn) but he will also get in trouble those people who further distribute his "public domain" book. I don't believe he will financially back up the lawsuits those people may get into because of his carelessness.
As an afterword - we in polite society do not imply in a public forum that somebody is doing something improper without firm proof.
The proof is there for everybody to download. Open the file in Acrobat and type "Ctrl+Alt+f". It's better to warn people than to get them in trouble with bogus legal advice. -- Marcello Perathoner webmaster@gutenberg.org

Sorry to keep on but I feel there is still confusion. Please correct me if I am wrong but my understanding of this issue is:
1. Glyphs are not copyrightable anywhere. Plenty of books abound with full sets of typeface glyphs with no hint of any copyright rules. Of course the book itself as a work bears its own copyright.
2. Collections of glyphs may be given a name which can be and often is a trademark. Thus type with terms like Helvetica, Univers etc become born again as Swiss, Humanist etc. when people copy the glyphs and do not wish to pay the original trademark owner. On the other hand common names such as Times may be used to describe collections of glyphs which are radically different, very few bearing any resemblance to Stanley Morison's masterpiece. BTW if Arial is meant to be a copy of Helvetica it's a shame they didn't copy the glyphs more faithfully!
3. Bitmaps which represent glyphs are no more copyrightable than the glyphs themselves although they may also be trademarked.
4. Many outline fonts are no more than a series of dimensions describing the glyph. This is just data derived from the glyphs maybe with some modicum of additional data such as kerning or spacing data. The old time punch cutters did nothing different. With due deference to the legal profession I can see no justification for calling this set of data a program as it adds nothing to the information contained in the basic glyph. The font file has no meaning standalone and has no input/output mechanism. It is not even a set of instructions for doing something. I think that even postscript fonts stretch the term program too far as although they contain elements of a programming language they are still a set of data points. More complex multidimensional fonts perhaps deserve the accolade.
5. The key component is a rendering engine which translates the font data into some recognisable form such as a printing press, printer or computer program. This is clearly a product requiring significant intellectual acumen and fully deserves full IPR protection. This is key to obtaining a high quality document. An electronic font file may contain instructions which inhibit the ability of the rendering software in some way but this has nothing to do with copyright.
6. Finally we have the output document whether printed or electronic. I believe it to be quite irrelevant as to whether the glyphs can be extracted from the document by electronic means or by scaling and tracing the printed page. Just extracting the glyphs does not take anything which has recognized intellectual property rights; not even the basic spacing, kerning and so forth can be determined. Nor for that matter can the mechanism used by the rendering engine to produce the document. Distributing a font with an electronic document is no different in principle than distributing it with printed pages.
The right to use a glyph collection with a trademarked name of course depends upon the owner of the trademark having given her permission and for this payment may have to be made. But if the user chooses to spend the time assembling a set of glyphs from that same family together under a different name there is nothing to stop this and no law is breached. Lynne.
Marcello Perathoner wrote:
Lynne Anne Rhodes wrote:
I do not think there is any law anywhere in the world that restricts the unlimited distribution of pdf files containing any font that the generating program inserted.
Fonts are copyright-protected nearly all over the world and even if you cannot copyright a "typeface" in the US you can copyright the implementation of the typeface on a computer system.
That's the reason Microsoft did not just incorporate the Adobe "Helvetica" font in Windows, but made a tracing of that font called "Arial".
See:
http://en.wikipedia.org/wiki/Typeface#Legal_aspects_of_typefaces
Embedding a font (in whole or in part) in a pdf file and distributing the pdf is legally the same as distributing the font (or parts of it). You can easily recover embedded fonts from pdfs with appropriate tools that ignore the "copy bits". Some companies (not all) allow their fonts to be embedded in documents, but you must buy a license for the font first.
Now BB writes on the front of his book "this book is in the public domain", but it contains copyrighted material.
He will get himself in trouble (about which I don't give a damn) but he will also get in trouble those people who further distribute his "public domain" book. I don't believe he will financially back up the lawsuits those people may get into because of his carelessness.
As an afterword - we in polite society do not imply in a public forum that somebody is doing something improper without firm proof.
The proof is there for everybody to download. Open the file in Acrobat and type "Ctrl+Alt+f".
It's better to warn people than to get them in trouble with bogus legal advice.

On 9/28/05, Lynne Anne Rhodes <oxbow@spiritbase.net> wrote:
4. Many outline fonts are no more than a series of dimensions describing the glyph. This is just data derived from the glyphs maybe with some modicum of additional data such as kerning or spacing data. The old time punch cutters did nothing different. With due deference to the legal profession I can see no justification for calling this set of data a program as it adds nothing to the information contained in the basic glyph.
Your opinion on the matter, with all due respect, is not going to influence a court of law. If you want to convince us, cite court cases.
participants (14): bkeir@pgdp.net, Bowerbird@aol.com, David Starner, Gardner Buchanan, Geoff Horton, joey, Jon Niehof, Jon Noring, Keith J.Schultz, Lee Passey, Lynne Anne Rhodes, Marcello Perathoner, Robert Shimmin, Scott Lawton