
I do not often post to this list. But the question is relevant. Unfortunately, I must state my preference for raw text files. And, to a great extent, I agree with Bowerbird. At least, those parts I can read.

You need to know that I live in America. Born here. But I currently "surf" the "internet" with a PowerPC 6100/66. I use Mozilla ver. 3.0. Macintosh operating system 7.5.1. Dial-up is mostly 56K. Try downloading video with that set-up. In fact, try reading this user list with that set-up.

I get SOME messages when I read directly. I get OTHER messages when I choose to read "raw source." And, when I go to my workplace and read on their PCs, I get yet more DIFFERENT messages. [Their internet connection is lots better. They download about 1/2 terabyte a day.] And yet, of the three different places that I can read this list, I NEVER get ALL the messages. Each and every one is different. Oh yes, there is overlap, but I am never sure that I have gotten all the messages. Honestly!

So, I think the original question was a two parter: 1) Why text only? 2) Why the hard line breaks?

I must first apologize if I offend anyone by answering question 1). Text was considered universal to the English speaking world -- way back when Project Gutenberg started. This was at a time when Unicode would not exist for about two decades.

I LOVE TEXT. As I just said above, I cannot read all of your messages. Even if I go through three different setups, and two different servers, I am still not certain that I have read everything you have sent. I feel that I am being censored by the internet. It is truly my opinion that, if e-mail were just sent in TEXT, then I would know more of this world.

Yes, a picture is worth a thousand words. No, I would rather read a thousand words than see a picture. Especially in this modern day, when everyone and their mother has a better way of showing data. Every country on the entire planet (that's what? 300+ countries?) has a new and better way to format text. Every different language must somehow show its data JUST ABSOLUTELY CORRECT. Their standard is right. This standard is right. That standard is right. No, everything is wrong. Let's re-invent the wheel from scratch. No, it doesn't "look right." It has to be "correct." It is wrong if the text lines "break" at the "wrong" place.

Errrm, got carried away there.

In my opinion, the raw text of each book in Project Gutenberg is the ultimate in how a book should be delivered. Again, I apologize if I have offended anyone on this list by writing my obvious opinion.

2) Why the hard line breaks? Partially, this was covered by (I think) Bowerbird. There was a time when there were no fonts. Specifically, there were no "variable-width" fonts. Way before the Macintosh existed, there was only one way to read text. Every character had the same width, and there were only 80 characters per line. Max. Period. And when Project Gutenberg was started, its founder set the standard at whatever existed at the time. Break it at never more than 80 characters -- and break it between words. No hyphenation.

Now, this problem of hard line breaks is a legitimate problem. Several decades after the Macintosh (and later, PCs), it is my considered opinion that there is no need for a hard line break. Even way-back-when, in the early days, there was a question of whether a hard line break was just a <LF> (line feed) or <CR><LF> (carriage return, followed by line feed). Yep, there were format problems back before 1980.

Now-a-days, with all the wonderful formatting which is available, in so many different fonts, on so many different platforms, with so many different programs that can read so many different styles -- well then, which one do we choose as right? **sigh**

Above, I have described my computer system. I will tell you that my computer system is more advanced than what perhaps 2/3rds of the world has.
Most do not have the bandwidth for a .pdf. Or actually any kind of formatted book. Maybe they have an hour per week at an internet cafe. At 10-12K speed. They don't care if the line breaks are wrong. They care if they can read the books. And pretty much throughout that world, they can read the books ONLY if simple 8-bit ASCII text exists.

No one on Project Gutenberg, NO ONE, can guarantee a more universal format, nor a faster format to download, than text only. (Except perhaps 7-bit ASCII [capital letters only], or OCTAL; but that digresses.)

My only recommendation in this debate is this: There is no longer a need for a hard line break at every 80 character line. However, I believe there is still a need for a hard line break between paragraphs. I believe the text versions of the books can be scanned for single <CR> or <CR><LF> groups, and each of those can be removed (replaced by a single space, so that words do not run together). Double <CR><CR> or <CR><LF><CR><LF> should be maintained. And yes, I feel strongly this can be done to the ORIGINAL .txt files.

It is my opinion that the technology of the world has truly advanced beyond the need for a hard line break at the end of every line. Paragraph breaks, yes. Line breaks, no.

As to how this would translate into .pdf or Kindle? Gag me with a spoon. I'm not there.

Hope this helps,
Jay Toser
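
P.S. For what it is worth, the unwrapping recommended above (drop the single line breaks, keep the double ones) could be sketched roughly like this in Python. This is only a minimal sketch under my own assumptions, not anything Project Gutenberg actually runs. The function name unwrap_paragraphs is made up for illustration. It assumes that a paragraph break is always a blank line (possibly holding only spaces or tabs), and that every other line ending can safely become a single space. Verse, tables, and indented headings would need more care than this.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines but keep blank-line paragraph breaks.

    Hypothetical helper, for illustration only.
    """
    # Normalize <CR><LF> and bare <CR> line endings to a plain <LF>.
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # A paragraph break is a line ending followed by one or more
    # blank (empty or whitespace-only) lines.
    paragraphs = re.split(r"\n(?:[ \t]*\n)+", text.strip("\n"))

    unwrapped = []
    for paragraph in paragraphs:
        # Replace each single line break inside the paragraph with one
        # space, so words on adjacent lines do not run together.
        lines = [line.strip() for line in paragraph.split("\n")]
        joined = " ".join(line for line in lines if line)
        if joined:
            unwrapped.append(joined)

    # Keep exactly one blank line between paragraphs.
    return "\n\n".join(unwrapped)


if __name__ == "__main__":
    sample = "It was the best\r\nof times,\r\nit was the worst.\r\n\r\nSecond paragraph\r\nhere.\r\n"
    print(unwrap_paragraphs(sample))
```

The output uses a bare <LF> pair between paragraphs. Anyone who wants <CR><LF> back can replace "\n" with "\r\n" at the very end.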