joey said:
> Personally, I find monospaced, serifed fonts to be
> the easiest to read, and am frequently frustrated
> by the lack of books that use a monospaced font
gee, joey, you're in a tiny minority with that position...
> and I wonder who is to blame for it.
that's a funny way of putting it... :+)
but nonetheless, as the saying (which is in alice) goes,
"no accounting for tastes!"
one of the nice things about e-books is that you can
choose whichever font -- and whatever size -- you want.
using my viewer-app, you'd then just make your own .pdf.
(assuming you want a .pdf, and don't want to just view it
right there in my viewer-program. but who needs pdf?)
you can also adjust the leading, the background color,
the text-color, whether the text is perfect-justified or not,
whether the pages are bottom-balanced or not, and so on.
you can adjust the pagesize of the pdf to that of a p.d.a.,
if you want, so your .pdf works on a carry-around machine.
i'd assume that all of these adjustments are ones that
the "official" p.g. approach, based on a t.e.i. master, will
also allow end-users to make, if not soon, then eventually.
if not, people will be using my solution instead of yours...
bottom-line, although the findings on reading comprehension
can be interesting, especially if you dig typography, they are
based on an archaic premise that there is one way that's "best".
when you have to commit ink to paper for a large press-run,
that mode of thinking makes sense, since you want to please
the majority of people out there. but that was _then_, folks.
now is now. we live in a different world now, a one-off world,
a display-on-demand world where you can get it your own way.
who cares if you're the only darn person in the world who likes
monospaced fonts? if that's what you like, you can have 'em!
you like your lines triple-spaced? fine. 48-point text? no prob.
you want green letters on a black-background, just like we had
way back in the original days of monochrome monitors, go for it!
tell the people who want to cram you into a mold to cram it up...
> Do you have some pointers to research that
> might explain why the prevailing wisdom is that
> monospaced fonts are "bad"?
i don't know if you'd ever find such research, joey,
since i'd think "the prevailing wisdom" is what it is
because most people think monospaced fonts look ugly,
and so obviously so that it'd be ridiculous to research it.
there's a great scene in "triumph of the nerds", i think,
where steve jobs used the monospaced nature of dos as
an example of how bill gates has no sense of esthetics...
but hey, like i say, you can pick whatever font you want,
and that is without question a good thing...
***
as a side note, i found that, whenever i came up against
a hard choice in programming my viewer-app, i would say
"i'll give the user an option to do it however they want".
originally, it was a way to avoid making the decision myself.
besides, that's the way i usually design my software anyway.
the only reason i do programming at all is because i don't like
being locked into a specific set of perspectives and constraints.
but over time this also came to represent a philosophical stand
-- a stand that says that the user gets to decide formatting --
and i think that stand served me well here, because it forced me
to rip a lot of the decision-making out of the hands of authors...
as an author, you often want to exercise a lot of control over
how your baby looks. you wanna control the size of the text,
and the font used, and the margins, and any number of things.
you want to use things like varying text-size as a design tool...
but in our new world, all those decisions belong to readers,
not to authors. you can put your text into a design format,
but if users don't like that format, they will rip out the text
in order to pour it into the format they prefer that it be in...
as a quick example, i'll often copy text out of a web-page
to read it in a wordprocessor, where i can format it at will.
indeed, the example of loading the text into a wordprocessor
was the guide that i used in the architecting of my program.
(easy enough to imagine, since that's how i read p.g. e-texts.)
if using a wordprocessor allowed me to control some variable
-- be it leading or text-color or whatever -- then i considered
that that variable was one that i had to let _my_ user control.
think you can lock up your text in the format you choose?
think again. tomorrow's o.c.r. technology will grab text
from anywhere, and you can't do a darn thing about it.
there was a report that japanese commuters are using
their cell-phone cameras to click pictures of newspapers
that they can read on the crowded trains. that might be
just a story -- i think it came from a company offering
o.c.r. in their cell-phone -- but even if it's not true today,
it will certainly come true sometime down the line...
as long as you're going to make your text _visible_
-- and invisible text isn't going to take you very far --
give up the idea that you can control that text at all,
let alone restrict its presentation to your favored format.
it's the same surrender that graphic-designers who were
accustomed to paper had to make migrating to the web.
in the end, i found it very liberating to say "sorry, charlie,
but that's a user option over which you have no control."
(i added this section as the last part of writing this message,
but as you'll see below, i'm constantly saying "user option".
had i written all this first, i would've just said "ditto" below.)
***
marcello said:
> bb's pdf isn't justified at all.
> It is simply lines broken using character count.
> That will do for fixed fonts but never for proportional ones.
i explained exactly why i did this -- to illustrate the problem.
one of the next examples of this .pdf will have justified text.
ultimately, whether lines are justified should be a user option.
> long words need to be hyphenated to make interword spacing more even.
nope, you're wrong. hyphenation is an artifact of paper-books.
it should be left behind. it interferes badly with search routines.
and -- as coming examples will demonstrate -- it's unnecessary.
you can get tight-enough paragraphs without using hyphenation...
> should use typografic dashes instead of --
no, that's one of the options you should give to the end-user.
along with curly-quotes. most people might agree with you
that the typographic versions look better (i know that i do),
but -- just like joey and his unorthodox preference in fonts --
there's no reason each end-user can't have it like they want.
> many laser printers will not print to the very edge of the page,
> thus bb's lines will be cut off.
well, since the page-size i used was 5*8 (or was it 5*7?),
you'll find this .pdf will print comfortably on any printer.
in fact, you should be able to print it as a 2-page spread,
which is absolutely how i recommend that it be printed...
but i'm glad you mentioned pagesize, since i thought that
your selection of pagesize in your .pdf was a bit strange...
to accommodate common pagesizes in america and europe,
a print-area of about 5*8 inches is probably optimal, since
it will fit in both directions on the pages of both continents.
(and, as i noted above, it lets people print it in 2-up fashion;
if people insist on killing trees in spite of having an e-book,
then let's at least see that they kill as few as possible, eh?)
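as a rough check of that 5*8 claim, here is a small python sketch.
the sheet sizes are the standard ones (u.s. letter at 8.5*11 inches,
a4 at 210*297 mm). the function names are made up for illustration,
and printer margins are not modeled at all.

```python
# sheet sizes in inches, as (width, height) in portrait orientation
LETTER = (8.5, 11.0)
A4 = (210 / 25.4, 297 / 25.4)    # roughly 8.27 x 11.69

PAGE = (5.0, 8.0)                # one 5*8 print-area
SPREAD = (10.0, 8.0)             # two 5*8 pages side by side


def landscape(sheet):
    """Return the same sheet turned sideways."""
    return (sheet[1], sheet[0])


def fits(area, sheet):
    """True if the area is no wider and no taller than the sheet."""
    return area[0] <= sheet[0] and area[1] <= sheet[1]


for name, sheet in (("letter", LETTER), ("a4", A4)):
    print(name, "single page:", fits(PAGE, sheet),
          "2-up spread:", fits(SPREAD, landscape(sheet)))
```

on a4 turned sideways, the 2-up spread has only about a quarter-inch
of vertical slack in total, so whether it survives a given printer's
unprintable margins is something this sketch does not settle.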
but here again, there's little use in discussing this much.
pagesize should be an option that's given to the end-user.
as i indicated above, there might be a good reason why
a person might want a pagesize as small as 2*3 inches...
> incompatible page blackness (16-17, 22-23, 26-27)
> left and right-hand pages have different amounts of leading.
> This looks ugly.
i think you're probably talking about the pages with illustrations,
or perhaps pages at the end of chapters. in both of those cases,
i'll be revisiting that issue after i decide how to handle the images.
> footnotes
> should (as the name implies) show at the foot
> of the same page, not in the appendix
my program will let the user decide that. your system should too.
> picture size (39, 40)
> pictures should float to the next page
> if not enough room is left on this page,
> not be resized to poststamp size
that's the main issue that i wanted people to give feedback on.
so thanks for weighing in with your opinion.
> missing pictures (23)
> all pictures should make it into the pdf
good catch. i'll have to find the bug that dropped those images.
> table of contents
> should have page numbers
i agree. one of the upcoming versions will have that...
and the table of contents should be hotlinked too.
that's an automatic feature in my viewer-program,
and i've worked it into the .html version as well, but
i'm gonna have to do a little work to get it in the .pdf.
hey hey, folks, give credit to marcello for one constructive post!
maybe he didn't _intend_ to be constructive, not fully anyway,
but that's beside the point. as long as your arguments are
_accurate_ and _on-topic_, then criticism _is_ constructive!
***
gardner said:
> I'm confused: why can't you simply re-flow the paragraphs
> to get whatever line length makes sense? Reading later
> in your message, I see that this is precisely what you propose.
well yes, reflow is the solution, of course. but read on...
> So why worry about line length in the text version
ok, i should have explained that. it requires a backtrack.
when i first started programming my viewer, years ago,
i told michael that one of the things it could do would be
reflow the paragraphs. i thought he'd be happy about it.
i was a little surprised when he said that he would prefer
that the linebreaks be left as-is. so i suggested that
it be a user-option, which made for a good compromise.
i figured that everyone would choose to use the reflow,
and there was no good reason to make michael unhappy,
so i wrote the program in a way that, at least on the initial
load-in of an e-text, it would respect the existing linebreaks.
what i discovered, after several months of working this way,
was that i _preferred_ when the existing linebreaks were used.
i figured it was just because i am a linebreak freak myself,
and didn't think about it much. but after even more months,
i came to realize hard-wrapped lines have _lots_ of benefits...
after literally years now of examining this issue very closely,
i've decided that it's definitely superior to hard-wrap the lines.
i don't expect you to initially agree with the conclusion i reached.
i too doubted it at the outset. but there are reasons it's correct.
it's _much_ easier to keep your "fix" on the text when it's
not being rewrapped every time you resize the window or
resize the text, so one benefit is sheer improved readability.
and repagination goes much faster when the program can
just do repagination, without also doing remargination first.
(this isn't a problem on newish computers, of course, but
lethargy is an issue on handhelds, and on old computers.)
another reason -- important when you're editing the text --
is that hard-wrapped lines really make that process easier.
error-correction, for instance, is a breeze when you know
the line in the file will be the exact one you are looking at.
otherwise, you are trying to track down _phrases_, which
is usually much more susceptible to troubles with duplicates.
this might not seem like a big deal, but i can assure you that
-- in my experience -- it indeed makes a world of difference...
for instance, i have been able to write error-correction apps
that are far simpler to architect, program, and understand,
because they can be line-based rather than phrase-based...
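to make the line-based idea concrete, here is a minimal python sketch.
it is not one of the apps mentioned above. the function name and the
sample text are hypothetical, chosen only to show the duplicate problem.

```python
def fix_line(lines, line_no, wrong, right):
    """Return a copy of `lines` with `wrong` replaced by `right`
    on line `line_no` (1-based) only; other lines are untouched."""
    if not 1 <= line_no <= len(lines):
        raise IndexError(f"no line {line_no}")
    target = lines[line_no - 1]
    if wrong not in target:
        raise ValueError(f"{wrong!r} not found on line {line_no}")
    fixed = list(lines)
    fixed[line_no - 1] = target.replace(wrong, right, 1)
    return fixed


sample = [
    "the air in the desert was arid",     # legitimate use of the word
    "alice arid the queen walked on",     # o.c.r. error for "and"
]
corrected = fix_line(sample, 2, "arid", "and")
```

a phrase-based search for "arid" would hit both lines. the line number
is what separates the legitimate word from the o.c.r. error.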
one nightmare i had with some project gutenberg e-texts
was that the .html file wrapped differently than the .txt file.
so not only did i have to reprogram around the .html code,
which is bad enough, i had to piece together the lines too.
(and when i remarked on the stupidity of this .html rewrap
to greg, i was floored that he didn't even grasp the problem.
how can you bang head against wall, and not know it hurts?
i guess eventually they did figure out that it made their lives
unnecessarily difficult, since they stopped doing it.)
and there's a whole other set of benefits for hard-wrapping.
to show this, i uploaded a copy of the alice .html file that
uses [br] commands to give us the exact same linebreaks
that are present in the .zml file _and_ in the .pdf file as well.
if you open all three of these versions simultaneously in
side-by-side-by-side browser-windows, you can clearly see
the true parallel nature of these three "different" versions.
it's a good lesson in learning how a converter would work.
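a converter along those lines can be quite small. this python sketch is
a guess at how one might work, not the tool that made the uploaded file.
`txt_to_html` is a hypothetical name, and the sketch assumes that blank
lines separate paragraphs.

```python
import html


def txt_to_html(text):
    """Turn hard-wrapped plain text into html that keeps every linebreak."""
    out = []
    for para in text.split("\n\n"):
        para = para.strip("\n")
        if not para.strip():
            continue                      # skip empty chunks
        # escape &, <, > so the text cannot be mistaken for markup
        lines = [html.escape(line) for line in para.split("\n")]
        out.append("<p>" + "<br>\n".join(lines) + "</p>")
    return "\n".join(out)


print(txt_to_html("alice was beginning\nto get very tired\n\nof sitting"))
```

browsers collapse leading spaces, so indented lines would need extra
handling that this sketch leaves out.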
additionally, it shows us the utility of using the .txt version
not just as a "master", but as the _only_ version we keep,
since the other versions can be so easily derived from it,
using a tool that's right down on the user's own machine.
so if the .txt version is wrapped with a proportional metric,
and the other versions are made matching those linebreaks,
everyone is looking at "the same file", and -- perhaps even
more importantly -- everyone _knows_ that everyone is...
now of course, the end-user can also rewrap lines at will,
with all the options set according to their own preferences,
if what they're doing is using the file for their own purposes.
but if they _want_ to, they can -- at any time -- reproduce
the version that project gutenberg considers as "the master".
that "mutually-understood perspective known-to-be-identical"
can be a tremendously useful construct to have on your side.
all of this will probably make you understand more clearly now
why i have often lobbied so hard to retain the original linebreaks
from the paper-book all the way from the proofing to end-users...
(remember joey complaining about my "6-page e-mail" on that?)
there's a very good reason why i bring up the issue of doing proofing.
anybody who has proofed will tell you that it would be sheer stupidity
to give up the congruence of the linebreaks between scan and o.c.r.
and so if we work distributed proofreaders into the big picture here
-- or, more generally, my idea about "continuous proofreading" by
the user-base at large in our continuing march to text-perfection --
you can see that if we maintain the linebreaks (and pagebreaks)
from printed-page through to our "mutually-understood" master,
we will have a very powerful shared-perspective on which to stand.
we won't know how important that might be until we actually try it.
perhaps it won't make that much difference in the long run, i dunno.
but i believe it could make a huge difference. and we should try it...
so, in sum, line-length is important because i have found
that it is wise -- a lot of the time -- to use the lines just
as they were hardwrapped. i'll explain even more below... :+)
> This, in particular seems quite surreal:
working on alice can do that to you... ;+)
> why would I wrap my fixed-width text
> *as if* it were proportional?
> And what font metrics should I use?
well, first of all, the text is only "a fixed-width text"
if you _decide_ that you _want_ it to be fixed-width.
if you view it with a proportional font, it's proportional.
indeed, if you remember that it came out of a p-book,
its identity really _is_ as a proportionally-spaced entity.
but as i said, it's just text, and can be defined any way.
so let me rephrase your question.
why would i _wrap_ my text *as*if* it were monospaced,
when i'm then going to _view_ it with a proportional font?
because when i do that, i get some weirdish line-lengths...
what i'm saying is that since most people will view it with a
proportional font -- and that's how we _want_ 'em to view it,
except for joey -- we should wrap it with proportional metrics.
now, you ask a smart question -- the _right_ question --
when you ask "well then, which metric should we use?",
because the metrics do change from one font to the next.
but the smart question has an even smarter answer,
in that -- as long as you wrap to about 50 characters --
any proportional font will do. that's because the general
nature of the metrics of letters themselves is such that
there is a very high correlation between various metrics.
(that is, "w" and "m" will be wide letters in any metric.)
even a "narrow" font -- if you wrap to about 50 characters --
will give you lines that look ok in any other proportional font...
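here is a minimal python sketch of wrapping by proportional widths
instead of by character count. the width table is invented for the
illustration (three rough classes, in thousandths of an em) and does not
come from any real font. the function names and the default limit of
"about 50 average characters" are assumptions too.

```python
WIDE = set("mwMW")
NARROW = set("iljtf.,;:'!| ")


def char_width(ch):
    """Made-up width of one character, in thousandths of an em."""
    if ch in WIDE:
        return 800
    if ch in NARROW:
        return 300
    return 500


def text_width(s):
    return sum(char_width(c) for c in s)


def wrap_proportional(text, max_width=50 * 500):
    """Greedy wrap: keep adding words until the summed width would
    exceed max_width, then start a new line."""
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and text_width(candidate) > max_width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


sample = "the mouse swam away from alice as hard as it could go " * 4
for line in wrap_proportional(sample):
    print(line)
```

swapping in another table with the same wide/narrow pattern would move
the breaks only a little, which is the correlation described above.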
and the 50-character rule (i hope it really _is_ 50 characters,
or thereabouts, since i pretty much pulled that out of my ass),
applies to different font-sizes as well. that's why they can
use _the_very_same_pagination_ for books (like a bible or
a dictionary) that are printed in pagesizes ranging all the way
from _pocketsize_ up to 8.5*11. the size of the text will get
bigger as the page is bigger (or you can say it the other way,
with the paper-size getting bigger as the text gets bigger),
but the number of characters in the line will stay the same!
and it doesn't feel weird in the slightest bit, since you simply
_expect_ that the pocket-sized book will have smaller text.
(heck, if its text were the same size as in its bigger brothers,
each line would seem "too short" to you, since it would not
have enough characters to make it seem like it was "full".)
it's the same with that monitor test that i suggested earlier.
you can do that test with your monitor set at 1024*768,
and then reset the monitor to 800*600, or even 640*480,
and the results will stay the same. the most readable line
with good size, and good length, will use up half your screen.
or take a piece of 8.5*11 paper and hold it up to your monitor --
landscape-wise -- and you will likely find it fits fairly comfortably.
now take that piece of paper and fold it in half, chapbook-style,
and you'll see it is about the size of an airport-rack paperback,
which -- if we had to pick a p-book's "form-factor" -- would be it...
so my conclusion is that lines should be wrapped to that size...
> I know from many previous discussions that
> a general solution to re-flowing un-marked-up text
> like we see in PG is a hard problem.
actually, when you get right down to it, it's not... :+)
> I do fully agree that some very simple conventions
> can make it quite a tractable problem,
> without forcing us to use full-on markup.
and that's the answer as to why it's not a hard problem.
> I'm looking forward to seeing what you've come up with.
well, as usual, i didn't have to "come up with" any solution myself,
i just had to leverage one that the masses had already developed...
the solution is to use at least one leading space to turn off a line's rewrap.
that is the rule that tidy -- for example -- uses, and other tools as well...
this easy-to-understand-and-easy-to-implement strategy works very well.
and it's also easy for programmers to write routines that respect the rule.
from my perspective, that makes it a winning solution all the way around...
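a minimal python sketch of the leading-space rule follows. it is not the
code in any actual viewer or in tidy. the name `reflow` and the default
width of 50 are assumptions made for illustration, and it uses the
standard textwrap module to do the actual filling.

```python
import textwrap


def reflow(text, width=50):
    """Rewrap ordinary paragraphs to `width`, but pass through
    unchanged any line that starts with a space."""
    out, para = [], []

    def flush():
        # rewrap whatever ordinary lines have been collected so far
        if para:
            out.extend(textwrap.wrap(" ".join(para), width=width,
                                     break_long_words=False,
                                     break_on_hyphens=False))
            para.clear()

    for line in text.split("\n"):
        if not line.strip():            # blank line ends a paragraph
            flush()
            out.append("")
        elif line.startswith(" "):      # leading space: do not rewrap
            flush()
            out.append(line)
        else:
            para.append(line.strip())
    flush()
    return "\n".join(out)


print(reflow("aaa bbb\nccc ddd\n  poem line one\n  poem line two\n\neee\nfff",
             width=20))
```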
my viewer-program has a few additional rules as well:
1. it respects a leading-tab the same as a leading-space.
2. it respects "bullet" characters, like asterisks, as well.
3. it respects numbered lists, like this one you're reading.
4. i think there's other ones, but can't remember 'em now.
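those extra rules can be folded into one test, sketched here in python.
the function name and the exact patterns are guesses for illustration
(an asterisk as the only bullet, digits plus "." or ")" for numbered
items), not the rules as the viewer-program actually implements them.

```python
import re

# a bullet or a list number, followed by whitespace, at the start of a line
_LIST_ITEM = re.compile(r"(?:\d+[.)]|\*)\s")


def is_protected(line):
    """True if the line should be left out of rewrapping."""
    if line.startswith((" ", "\t")):     # leading space or leading tab
        return True
    return _LIST_ITEM.match(line) is not None


for probe in (" indented", "\ttabbed", "* bullet", "3. third", "plain text"):
    print(repr(probe), is_protected(probe))
```

a pattern this loose will also catch an ordinary wrapped line that
happens to begin with something like "42. ", which is one more reason
to add the leading-space "just to make sure".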
even in these cases, though, it's easy enough to say
"use a leading-space on those too, just to make sure".
indeed, i've even taken to using a leading-space on my
header-lines, which in z.m.l. must be preceded by several
blank lines, so they wouldn't have been wrapped anyway...
i do this because one of the routines that's in my app
pulls out any lines that have a leading-space on them...
this routine is useful because it shows you _structures_
that might be of interest, like block-quotes, poems, etc.
so in order to have header-lines and list-items located by
this routine as well, i just put the leading-space on them...
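a routine like that might look something like this python sketch. it
is an illustration under assumptions, not the routine in the app: the
name is hypothetical, and grouping consecutive lines into blocks is a
choice made here so that each structure comes out in one piece.

```python
def indented_blocks(text):
    """Collect runs of consecutive leading-space lines.
    Returns a list of blocks; each block is a list of
    (line_number, line) pairs, with 1-based line numbers."""
    blocks, current = [], []
    for n, line in enumerate(text.split("\n"), start=1):
        if line.startswith(" ") and line.strip():
            current.append((n, line))
        elif current:                   # the run just ended
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


sample = "intro\n  quote a\n  quote b\nmore\n\n chapter one\n"
for block in indented_blocks(sample):
    print(block)
```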
there are other useful ramifications from the leading-spaces.
you might remember a while back when there was a huge
(tempest in a teapot) controversy about how my app was
going to do paragraph indentation correctly since i defined
"paragraphs" so haphazardly (i.e., surrounded by blank lines).
one example that was mentioned involved block-quotations
being improperly indented in such a case. as you probably
can now see clearly, it's very easy for me to identify them.
in a nutshell, don't indent a "paragraph" that's already indented.
i'm sure a few of you figured it out. good for you, i guess, but
it ain't rocket-science... or brain-surgery... or even plumbing...
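spelled out as a python sketch, under the blank-line definition of a
paragraph given above. the function name and the four-space display
indent are assumptions. the indent is added only for display, so it
never goes back into the stored text.

```python
def indent_paragraphs(text, indent="    "):
    """Indent the first line of each paragraph for display,
    unless that line already starts with a space or tab."""
    out = []
    prev_blank = True          # the start of the text begins a paragraph
    for line in text.split("\n"):
        blank = not line.strip()
        starts_para = prev_blank and not blank
        if starts_para and not line.startswith((" ", "\t")):
            line = indent + line
        out.append(line)
        prev_blank = blank
    return "\n".join(out)


sample = "first para\nline two\n\n  a block quote\n  second line\n\nlast para"
print(indent_paragraphs(sample))
```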
poetry also qualifies, as do all of the other structures that
you might "worry" about, since they all have leading spaces.
further, even things like the lines on a table-of-contents page
are easily moved out of harm's way, safe from improper wrapping,
simply by prefacing each with a leading-space. that's better
than using an empty line to separate each of them, because
typically you want those lines to form a small and tight page,
rather than being strung over two pages or -- even worse --
more, since then they cannot be all viewed at the same time.
most of you probably didn't even notice that those lines in
the table-of-contents had leading spaces in the ascii version,
did you? there's probably much such cleverness you missed...
varying indents? just use different numbers of leading spaces.
even the "mouse-tail" in alice is dumbfoundingly easy to make.
(i don't know about you, but the kludginess of the x.m.l. method
for getting a simple indent on a line is enough to make me barf.)
now i know someone is gonna say "line indentations like that
are going to cause problems when you go to a small screen".
(i even have a good idea who it might be who would say that.)
but obviously they haven't spent any time trying to think of
a _solution_ to the problem, their brain just froze up at the
realization of the problem, and didn't do any work after that.
because the solution is pretty obvious, and easy to figure out,
once you remember that you are writing the viewer-app, and
not depending on some brain-dead browser to display the text.
when an indented (i.e., non-wrappable) line gets too long to be
able to be displayed with the current line-width, you _chop_ it.
well, first you try to "squeeze" it, to see if a little of that works.
but assuming that that won't be sufficient, you'll have to chop.
that is, you trim the leading-spaces, then segment the line into
the right number of pieces, and then restore the leading-spaces
to each piece. it would be tedious to do, if you did it by hand,
but it's the very kind of thing that computers are very good at.
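the chop step might be sketched in python like this. it is a guess at
the idea, not the routine that was tested: the name `chop` is made up,
the "squeeze" attempt is skipped entirely, and cutting at word
boundaries with textwrap is an assumption about what "the right number
of pieces" means.

```python
import textwrap


def chop(line, width):
    """Split an indented line that is too long for `width` into
    pieces that each carry the original leading spaces."""
    if len(line) <= width:
        return [line]
    body = line.lstrip(" ")                   # trim the leading spaces
    lead = line[:len(line) - len(body)]
    room = width - len(lead)
    if room < 1:
        raise ValueError("indent is wider than the display")
    pieces = textwrap.wrap(body, width=room, break_on_hyphens=False)
    return [lead + piece for piece in pieces]  # restore them on each piece


for piece in chop("    the quick brown fox jumps", 16):
    print(repr(piece))
```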
chopping has worked well in the situations where i've tested it.
i haven't brought the routines into the main program yet, so
it hasn't yet been subjected to the hard testing it'll get there,
but i'm pretty confident that it will stand up well.
and the leading-space practice has proven its worth already...
this is another one of those things i've looked at for years now,
and i'm convinced it's a good solution. i have seen hairy cases,
not that i can remember any of them now, and i am sure that
i will see more. but i've seen nothing so far that it can't handle.
> Don't re-size. Have them float to a suitable location.
that's a bit vague. where is "a suitable location"?
if the compressed version is a hotlink, then _any_
location isn't that far away, especially if there is
a backlink from the full version back to the page...
(and, for print purposes, i would print a page-number,
so the reader could easily flip to that page to see the
full-size version. page-numbers are paper's "links"!)
but i'll take your use of the word "float" to mean
that you think a suitable location is the next page.
so that's two votes (out of two) for the next page.
> If [Giant Alice watching Rabbit run away.]
> on p13 was an illustration, you're formatter mishandled it.
> Also [The Cheshire Cat fades to a smile.] p59 and
> [Executioner argues with King about cutting off Cheshire Cat's head.] p82.
thanks for the bug report.
> There's something wrong with the text at the end of the line with
> "'Rule Forty-two. All" p114. Probably a failure on the part of the
> contributor to abide by the "50 12-point characters is about 300 pixels" rule. ;-)
yep. :+)
the cause of that glitch is that i closed the hyphen-broken line,
but then forgot to put in a hard-wrap _after_ the "forty-two".
i had caught that one, but i still appreciate the error-report,
as it indicates a close eye was going over the document... :+)
***
so now, gardner, are you ready to say "uncle" on linebreaks?
because i can keep goin' if you want... ;+)
-bowerbird
p.s. i'll address your comment about sourceforge in a future message...