
Like many I'm sure (all right, I'm not really sure), I like ebooks/etexts but do not like to read them on a computer screen. This is largely, no doubt, because I work with a computer all day anyway--a book should be a place to get away from it all for a little. The natural thing to do is to print the text out and read it. The question then is: how do we typeset it? The first thing I looked at was GutenMark. I was a little disappointed when I tried it on _Gods and Fighting Men_ by Lady Augusta Gregory. The LaTeX it generated was invalid. Then I took the HTML version of said book and ran it through HTML2PS. The results were serviceable, but looked like, well, a printed web page. a2ps worked in the most rudimentary sense. The font was still a fixed width font, the paragraphs were not reformatted, so there was a lot of unused space on the right hand side of the page. One of the last two would, of course, do in a pinch, but I was wondering whether anyone else here had any ideas/recipes on how to automatically or mostly-automatically typeset a PG etext for printing. -- Michael McDermott www.mad-computer-scientist.com

On 4/16/2010 1:03 PM, Michael McDermott wrote: [snip]
One of the last two would, of course, do in a pinch, but I was wondering whether anyone else here had any ideas/recipes on how to automatically or mostly-automatically typeset a PG etext for printing.
As bowerbird is unfailingly quick to point out, automatic processing of any document file relies on the file being regularized in such a way that any transformation you wish to make is unambiguously identifiable. If it is not possible to unambiguously identify a transformation you wish to make, the file must include unambiguously identifiable meta-information (information that is not part of the primary data) that identifies the transformation (this kind of meta-information is commonly known as "markup"). Project Gutenberg requires no textual regularization of any kind for its impoverished text files, and therefore these files are extremely difficult to automatically transform. Of course, some conventions have evolved, some of which are used more regularly than others. Thus, if you are content with the italicization of text set off by underscores (_), you will probably be successful with this transformation more than 90% of the time. On the other hand, if you want to start chapters on a new page, you will probably be successful with that transformation less than 50% of the time. The degree of success you have will depend to a large extent on the degree of transformation you want to achieve; if you are content to simply print the file as is, changing only the font face (don't try to change the font size, or you will run into reflowing problems), you can probably achieve 99+% success. If you want to make a PG file look like an ordinary paperback, certainly less than 50%. (This is, of course, assuming you are using "off-the-shelf" tools. If you're comfortable with scripting languages you could no doubt do better.) Your degree of success will also depend on the age of the PG file you want to transform. As time has gone on, and conventions have evolved, later texts are more "regular" than earlier texts; good luck converting _Pride and Prejudice_.
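As a rough illustration of the scripting route mentioned above, here is a minimal Python sketch of the two easiest transformations: turning _underscored_ runs into LaTeX \emph{} and reflowing hard-wrapped paragraphs. The function names (italicize, reflow) are hypothetical, and the sketch assumes the file follows the underscore convention and separates paragraphs with blank lines. It does not escape LaTeX special characters and makes no attempt to detect chapters, which is the part that fails most often.

```python
import re
import textwrap

# Assumption: italics are marked _like this_, with the underscores hugging
# the text and not sitting inside a word (so snake_case is left alone).
ITALIC = re.compile(r"(?<!\w)_(?=\S)(.+?)(?<=\S)_(?!\w)", re.DOTALL)


def italicize(text: str) -> str:
    """Replace _text_ with \\emph{text}; anything ambiguous is left as is."""
    return ITALIC.sub(lambda m: r"\emph{" + m.group(1) + "}", text)


def reflow(text: str, width: int = 72) -> str:
    """Re-wrap each blank-line-separated paragraph to the given width."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(
        textwrap.fill(" ".join(p.split()), width=width) for p in paragraphs
    )
```

Calling italicize(reflow(raw_text)) on the body of a plain-text file gives something that can be pasted into a LaTeX document by hand; how well it works depends entirely on how regular the particular file is.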
You will probably have the most success by using the HTML version of a file, when it can be found (I do not believe that the majority of texts at Project Gutenberg are yet available in HTML versions); this is because while PG HTML texts are still not completely consistent in their use of markup, they are probably /more/ consistent than the impoverished text files. I am assuming you used html2ps or html2pdf version 2.0.43 available from http://www.tufat.com/s_html2ps_html2pdf.htm, and that you have completely read the documentation (BTW, I have not). According to the website, html2pdf almost completely supports CSS version 2, and the media parameters values of CSS3. Were it I (and it will not be, because I am completely happy reading HTML on my mobile device, and because I find PDF to be the one format which is actually worse than PG impoverished text format) I would find a CSS style sheet which has most of the features I like, then use that with the PG HTML files and html2pdf. The resulting PDF can then be printed using Acrobat Reader or equivalent (if you are committed to the destruction of the environment). I suspect that html2pdf will not consume a style sheet unless it is referenced by the HTML document itself, and DP/PG has been highly resistant to the notion of adding a reference to a generic style sheet in every HTML file, so you will probably have to edit each file to add "<link href="pgstd.css" type="text/css" rel="stylesheet" />" to the <head> section of each HTML file, but I would think that would fall under the category of "semi-automated." If you cannot find an HTML version of the text you want (be sure to look outside of PG, as there are many other sources) you might want to try bowerbird's ZML2HTML converter; I suspect it may work about 75% of the time to get basic HTML out of PG impoverished text. FWIW, the style sheet I typically use for reading HTML files can be found at http://www.ebookcooperative.com/ebook.css.
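The "edit each file" step could be scripted along the following lines. This is only a sketch: the helper names (add_stylesheet_link, patch_files) are hypothetical, it assumes each file has a literal </head> tag, and it assumes a style sheet named pgstd.css sits in the same directory as the HTML files. Files are decoded as latin-1 purely so that every byte round-trips unchanged, whatever the real encoding is.

```python
import re
from pathlib import Path

# The tag quoted in the message above; pgstd.css is assumed to exist
# alongside the HTML files.
LINK_TAG = '<link href="pgstd.css" type="text/css" rel="stylesheet" />'
HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)


def add_stylesheet_link(html: str, link_tag: str = LINK_TAG) -> str:
    """Insert link_tag just before </head>, unless it is already present."""
    if link_tag in html:
        return html  # already patched; keep the operation repeatable
    match = HEAD_CLOSE.search(html)
    if match is None:
        return html  # no </head> found; leave the file for manual editing
    return html[:match.start()] + link_tag + "\n" + html[match.start():]


def patch_files(paths):
    """Patch each file in place and return the paths that were changed."""
    changed = []
    for path in map(Path, paths):
        # latin-1 maps bytes 0-255 one to one, so decode/encode is lossless.
        original = path.read_bytes().decode("latin-1")
        patched = add_stylesheet_link(original)
        if patched != original:
            path.write_bytes(patched.encode("latin-1"))
            changed.append(path)
    return changed
```

For example, patch_files(Path("books").glob("*.html")) would patch every HTML file in a (hypothetical) books directory; work on copies, since the files are rewritten in place.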

...DP/PG has been highly resistant to the notion of adding a reference to a generic style sheet in every HTML file...
Anything more than the simplest uses of CSS tends to break the conversion of HTML into EPUB and MOBI that can be successfully used by most ebook readers -- not to mention older browsers.

My history of screen experience goes back some 44 years, which is longer than we have had TV in South Africa. More than half of that period at work (and since the very early eighties home as well) was spent on screens of various qualities and functionalities, everything from 8080s and 8600s with delusions of grandeur, to large mainframes and the whole bang shoot in between, and everything from 300 bits (no, not bytes) per second (not necessarily baud) to crwth-knows-what now. My point? Apart from my decrepitude and the fact that I now have taken to wearing glasses while on line, that eyestrain never figured. I could not understand what the problem was with friends who complained of it (and there were plenty). Then a year or two after I got into PC work I realised that if I got involved in an exciting interactive game (not always if I was the player if things got really exciting), I soon got eyestrain! Now, what follows is not the remark of your friendly corner-shop ophthalmologist, and as far as I can make out my experience, while not unique is not shared by the majority of users, but I think it is of potential use to some people. In all my computer experience I have been emotionally comfortable with hardware, software, and their logic and theory of operation. Whereas many people lean forward when working at the screen, I lounge back, working with my eyes, not actually focussed on infinity (though I think it is a disgrace that our screens do not yet routinely and economically support that) but certainly focussed well past the tip of my cute little snout. In short, I am relaxed, *and so are my eyes*! But obviously I am doing something different with my eyes when playing games. The screens are the actual same screens. I usually am sitting in the same attitude, etc. so dust from the screen isn't a factor. People have suggested all sorts of things, such as that when excited my pupils are more distended or my blink rate is lower. 
Maybe some of those factors are true, but what it feels like to me (subjectively, I haven't been in a position to test this) is that my ciliary muscles get tired. So??? So, unless your screen or lighting is really lousy, ditto your typeface, colour, layout, size etc. really unsuited to your needs, if screen fatigue is a problem, maybe what you need is some well-managed relaxation exercises. If what knackers your eyes is games, I am sure you can do the arithmetic! (No, don't mind ME! this is my sympathetic look! ;-) ) Cheers, Jon

The way people use their eyes, the ways people read, the capabilities of their eyes, and their brains to process information, vary widely, and in ways you cannot imagine unless you personally have run into problems and have noticed that you have them. In the simplest almost universal case people start experiencing eyestrain around age 40 requiring the use of compensating visual orthotics. Age 40 also seems to be about the age of greatest denial ;-)

On Sat, 17 Apr 2010, Jim Adcock wrote:
The way people use their eyes, the ways people read, the capabilities of their eyes, and their brains to process information, vary widely, and in ways you cannot imagine unless you personally have run into problems and have noticed that you have them. In the simplest almost universal case people start experiencing eyestrain around age 40 requiring the use of compensating visual orthotics. Age 40 also seems to be about the age of greatest denial ;-)
I could read the OED Microprint edition without decent lighting until 42. After that it was all downhill so fast I never really tried it any more-- with or without glasses, but would use the provided Bausch & Lomb reader. Today I use $1 glasses with all my computers. . .I just buy every grade of magnification and leave each with the computer it works best with. I'll know I'm in trouble if/when I move to the 3x range. . .hee hee!

Yes, I agree with both. I never was very comfortable with OEDMP at any age, but could read it in good light at a pinch till about 50 (can't remember exactly; memory going along with other virtues. Used to have senior moments. Now have junior moments. Not yet in my pants, but no doubt that too is on the way.) Now, as it happens, I am (primarily) an unfrocked biologist and have discovered that the strongest "readers" I can find (+3.5 to 4 if I am lucky), though useless for proper reading (my current prescription is +2.5), make very useful visual aids for field work and are perfect for OEDMP reading; far better than the rather good magnifier supplied with the books. BTW, anyone else in the forum who still reads and enjoys books, paper books (a medium that needs redesign, and I am just the man to do it!), might be interested in a useful expedient that I happened across. My OEDMP came in a box/shelf with magnifier and two slots, one for each tome. As the designers of the package obviously had experience of what happened to large volumes that got manhandled by their bindings, they had a neat expedient: behind each tome a strip of tough, transparent plastic was fastened to the upper back corner of the slot, and hung down to the bottom, passing thence to the front, where it emerged as a tab below each volume. To get the volume out without brutalising it, you simply pulled at the matching tab. The volume then emerged a few inches without damage or inconvenient scrabbling, and could then be picked up in a civilised, nondestructive mode. Now, after some 40 years or so, (can't remember exactly; memory going along with other virtues. Used to have senior moments. Now have junior moments. Not yet in my pants, but no doubt that too is on the way.) those strips of polyester or whatever (I omitted to burn a bit, so I am uncertain; it might have been plasticised PVC or something (can't remember exactly; memory going along with other virtues. Used to have senior moments.
Now have junior moments. Not yet in my pants, but no doubt that too is on the way.) ) began to go nonfunctional and their connections failed. So I removed them. Then an idea struck as my gathering senility went on strike for a while. Some idiot was lining a dam with plastic in the near neighbourhood and offcuts of 2mm-thick black HDPE were lying around as though waste were a virtue. I had liberated a square metre or two and cut two strips to fit where the transparent plastic had gone. Unlike the originals, my inserts were much stiffer and I applied some brutal folding to make them turn the corner, but had no need to fasten them at the top back corner. It works amazingly, smoothly and cleanly, and it is harmless to book, cabinet and reader. Two moving parts, including the book. Its only shortcoming for general use on broad shelves is that one needs strips that roughly correspond to the widths of the matching books. One could design shelves and attachments to overcome that (very minor) problem, but I seldom have such a need, so I let it go. Old age and all that. Cheers, Jon On 2010/04/18 02:11 AM, Michael S. Hart wrote:
On Sat, 17 Apr 2010, Jim Adcock wrote:
The way people use their eyes, the ways people read, the capabilities of their eyes, and their brains to process information, vary widely, and in ways you cannot imagine unless you personally have run into problems and have noticed that you have them. In the simplest almost universal case people start experiencing eyestrain around age 40 requiring the use of compensating visual orthotics. Age 40 also seems to be about the age of greatest denial ;-)
I could read the OED Microprint edition without decent lighting until 42.
After that it was all downhill so fast I never really tried it any more-- with or without glasses, but would use the provided Bausch & Lomb reader.
Today I use $1 glasses with all my computers. . .I just buy every grade of magnification and leave each with the computer it works best with.
I'll know I'm in trouble if/when I move to the 3x range. . .hee hee!
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d
participants (6): James Adcock, Jim Adcock, Jon Richfield, Lee Passey, Michael McDermott, Michael S. Hart