Re: [gutvol-d] Plucker server on gutenberg.org

How hard would it be to have two options on the download, one with just text and one with text and images?
Josh
----- Original Message -----
From: "Marcello Perathoner" <marcello@perathoner.de>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Subject: Re: [gutvol-d] Plucker server on gutenberg.org
Date: Mon, 07 Nov 2005 19:11:44 +0100
Matthew McClintock wrote:
Whoops, sorry - I meant "are there plans to have the plucker versions incorporate the images from the HTML documents?".
We easily could... it's just the flip of a switch on the plucker distiller.
The problem is, once we start including images, people with old hardware will complain about the size of the files. Also, we don't have any records of which ebooks contain essential images and which contain only decorative ones.
-- Marcello Perathoner webmaster@gutenberg.org
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d

Joshua Hutchinson wrote:
How hard would it be to have two options on the download, one with just text and one with text and images?
Not very hard on the technical side. Very hard on the administrative side. "Images" is not a clear-cut boolean choice.
1. We don't know which ebooks contain essential illustrations. Some ill-advised PPers have started producing books with such useless fluff as drop caps (included as images), fancy horizontal rules (included as images), etc. Including those will only put off PDA users.
2. You can include pictures at different resolutions and color depths. We'd soon end up producing files with 1bpp, 4bpp, 8bpp and 24bpp images, original-size images, scaled images, etc. Every new variant will reduce cache hits. We don't have enough CPU power to generate all those variants.
If you want images, you can run the plucker distiller on your PC and adjust all these parameters to your heart's content.
-- Marcello Perathoner webmaster@gutenberg.org
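To put rough numbers on the size and variant concerns: an uncompressed bitmap needs width x height x bits-per-pixel / 8 bytes, so each step up in colour depth multiplies the file size. The Python sketch below assumes a 320x240 picture and a single half-size scaled variant (both illustrative choices, not taken from any particular ebook) and ignores whatever compression the Plucker format applies, so real files would be smaller. It also counts how many cached variants one picture turns into once depths and scales are combined.

```python
# Raw (uncompressed) size of one picture at the colour depths mentioned
# above, at full size and at an assumed half-size scaling.
DEPTHS = (1, 4, 8, 24)  # bits per pixel
SCALES = (1.0, 0.5)     # illustrative: original size and one scaled variant


def raw_size(width: int, height: int, bpp: int) -> int:
    """Bytes for an uncompressed bitmap, each row padded to whole bytes."""
    row_bytes = (width * bpp + 7) // 8
    return row_bytes * height


def variant_table(width: int, height: int) -> dict:
    """Map (scale, bpp) -> raw size in bytes for every depth/scale combination."""
    table = {}
    for scale in SCALES:
        w, h = int(width * scale), int(height * scale)
        for bpp in DEPTHS:
            table[(scale, bpp)] = raw_size(w, h, bpp)
    return table


if __name__ == "__main__":
    table = variant_table(320, 240)  # assumed example resolution
    for (scale, bpp), size in sorted(table.items()):
        print(f"scale {scale:<4} {bpp:>2} bpp {size:>7} bytes")
    print(f"{len(table)} cached variants of a single picture")
```

For a 320x240 picture the 1 bpp version needs 9,600 bytes and the 24 bpp version 230,400 bytes, and with two scales and four depths a single picture already yields eight variants, each needing its own cached file.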

Marcello Perathoner writes:
We don't know which ebooks contain essential illustrations.
Some ill-advised PPers have started producing books with such useless fluff as drop caps (included as images), fancy horizontal rules (included as images), etc. Including those will only put off PDA users.
Some people enjoy the reproduction of the decorative elements from the original book. If the decorative elements were in a class, could your plucker distiller leave them out?
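Bruce's suggestion can be sketched as a filter run before the distiller. This is a minimal Python sketch, assuming PPers mark purely decorative images with a class such as "decoration" (a hypothetical name; the thread describes no such convention) and always quote the class attribute. It uses regular expressions rather than a full HTML parser, so it is an illustration only. As Marcello notes in his reply, dropping a drop-cap image this way still leaves its letter missing from the text.

```python
import re

# Hypothetical class name for purely decorative images (rules, fleurons, ...).
DECORATIVE_CLASS = "decoration"

# Any <img ...> tag; assumes attribute values contain no literal ">".
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
# A quoted class attribute inside a tag; unquoted values are not handled.
CLASS_ATTR = re.compile(r"""\sclass\s*=\s*(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)


def is_decorative(tag: str) -> bool:
    """True if the <img> tag lists DECORATIVE_CLASS among its classes."""
    match = CLASS_ATTR.search(tag)
    return match is not None and DECORATIVE_CLASS in match.group(2).split()


def strip_decorative_images(html: str) -> str:
    """Remove decorative <img> tags; leave every other image untouched."""
    return IMG_TAG.sub(lambda m: "" if is_decorative(m.group(0)) else m.group(0), html)


if __name__ == "__main__":
    page = ('<p><img class="decoration" src="rule.png" alt=""></p>\n'
            '<p><img src="map.png" alt="Map of the harbour"></p>')
    print(strip_decorative_images(page))
```

Essential pictures such as maps keep no special marking here; only images explicitly tagged as decorative are dropped, so an untagged book passes through unchanged.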

Bruce Albrecht wrote:
Marcello Perathoner writes:
We don't know which ebooks contain essential illustrations.
Some ill-advised PPers have started producing books with such useless fluff as drop caps (included as images), fancy horizontal rules (included as images), etc. Including those will only put off PDA users.
Some people enjoy the reproduction of the decorative elements from the original book.
But the cost is too high. The illuminated drop caps in particular break the etext when viewed in many user agents. This is how #7870 looks in lynx:
---
THE PATERNOSTERS.
A YACHTING STORY.
A
ND do you really mean that we are to cross by the steamer, Mr. Virtue, while you go over in the Seabird? I do not approve of that at all. ...
---
It also breaks in plucker, and it will drive anybody mad who uses the text as a source for automatic processing, like text-to-speech etc.
If the decorative elements were in a class, could your plucker distiller leave them out?
It could leave the image out. But the text would still be broken, because it uses presentational attributes (float: left) to make a drop cap. Automatically re-inserting the letter into the paragraph it was explicitly floated out of is beyond hope. I'd have to write a patch to the distiller, then convince the plucker developers that this is a useful feature... much work that would be better spent elsewhere.
-- Marcello Perathoner webmaster@gutenberg.org

On Tuesday 08 November 2005 10:36 am, Marcello Perathoner wrote:
But the cost is too high. The illuminated drop caps in particular break the etext when viewed in many user agents. This is how #7870 looks in lynx:
---
THE PATERNOSTERS.
A YACHTING STORY.
A
ND do you really mean that we are to cross by the steamer, Mr. Virtue, while you go over in the Seabird? I do not approve of that at all. ...
There's no reason a pre-processing script can't be used to detect an image with single-letter alt text before a paragraph and rewrite that bit before passing it to lynx. Granted, the decorative rules, etc. are less straightforward.
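A minimal Python sketch of the pre-processing step described here. It assumes (the thread does not show the actual markup of #7870) that the drop cap is an <img> whose alt attribute holds the single letter, placed either first inside the paragraph or immediately before the opening <p> tag. The filter replaces the image with its letter so the word is whole again for lynx, plucker or a screen reader. It is regex-based rather than a real HTML parser, and images wrapped in extra <div> or <span> elements would need additional patterns.

```python
import re

# An <img> whose alt text is exactly one letter, e.g. <img src="b1.jpg" alt="A">.
DROPCAP_IMG = r"""<img\b[^>]*?\salt\s*=\s*(?P<q>["'])(?P<letter>[A-Za-z])(?P=q)[^>]*>"""
OPEN_P = r"(?P<p><p\b[^>]*>)"

# Case 1: image right after the opening tag:   <p><img alt="A">ND do you ...
INSIDE_P = re.compile(OPEN_P + r"\s*" + DROPCAP_IMG + r"\s*", re.IGNORECASE)
# Case 2: image right before the opening tag:  <img alt="A"><p>ND do you ...
BEFORE_P = re.compile(DROPCAP_IMG + r"\s*" + OPEN_P + r"\s*", re.IGNORECASE)


def _paragraph_with_letter(match: re.Match) -> str:
    """Rebuild the paragraph opening with the drop-cap letter glued to the text."""
    return match.group("p") + match.group("letter")


def inline_drop_caps(html: str) -> str:
    """Replace drop-cap images at the start of a paragraph with their alt letter."""
    html = INSIDE_P.sub(_paragraph_with_letter, html)
    return BEFORE_P.sub(_paragraph_with_letter, html)


if __name__ == "__main__":
    sample = '<p><img src="images/b1.jpg" alt="A">ND do you really mean ...</p>'
    print(inline_drop_caps(sample))  # <p>AND do you really mean ...</p>
```

Images with single-letter alt text in the middle of a paragraph are deliberately left alone, since only a letter at the very start of a paragraph is a plausible drop cap.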

D Garcia wrote:
But the cost is too high. The illuminated drop caps in particular break the etext when viewed in many user agents. This is how #7870 looks in lynx:
---
THE PATERNOSTERS.
A YACHTING STORY.
A
ND do you really mean that we are to cross by the steamer, Mr. Virtue, while you go over in the Seabird? I do not approve of that at all. ...
There's no reason a pre-processing script can't be used to detect an image with single-letter alt text before a paragraph and rewrite that bit before passing it to lynx.
You don't see the problem. All people who use a non-CSS user agent (browser, screen reader, text-to-speech processor, braille line, etc.) will have this problem. Should they all write a script before reading the book? PG has always laid great stress on posting correct HTML. This HTML is plainly broken and should not have been posted.
-- Marcello Perathoner webmaster@gutenberg.org

On Sunday 13 November 2005 05:42 pm, Marcello Perathoner wrote:
You don't see the problem.
All people who use a non-css user-agent (browser, screen-reader, text-to-speech processor, braille line, etc.) will have this problem. Should they all write a script before reading the book?
Perhaps you don't see the solution. _PG_ can apply such a script to an ebook before processing it through, say, the plucker distiller. Should PG say "oh, we can't/shouldn't have to do this" or should
PG has always laid great stress on posting correct html. This html is plainly broken and should not have been posted.
The HTML is correct. W3 says so. The ability of Lynx (or other user agents) to render it is what is broken and/or insufficient. You see that as an indictment of the data, when the problem is the tool. I'm very surprised to see an *ix person of long standing apparently unable to see that this is a case where the long-standing *ix tradition of filtering data before passing it along could readily be applied to satisfy all parties' requirements.

D Garcia wrote:
_PG_ can apply such a script to an ebook before processing it through, say, the plucker distiller. Should PG say "oh, we can't/shouldn't have to do this" or should
"PG" could also write a script that applies a list of errata on-the-fly before the file leaves the server. But unsurprisingly PG prefers to fix the file instead.
The HTML is correct. W3 says so.
The HTML is not only bogus, but gratuitously so. It could have been made to work with any browser with very small effort of the brain. The following example works with IE 6.0, IE 5.5, IE 5.0, Firefox, Opera, Konqueror, lynx, links and w3m, with styles both enabled and disabled in the browsers that let you choose.
<html>
<head>
<style type="text/css">
span.dropcap { display: none; }
span.dropcapa {
  float: left;
  height: 78px;
  width: 75px;
  margin: 0 1em 1em 0;
  background: url("http://www.gutenberg.org/files/7870/7870-h/images/b1.jpg") no-repeat top left;
}
</style>
</head>
<body>
<h1>Chapter 1</h1>
<p><span class="dropcapa"><span class="dropcap">A</span></span> merry party were sitting in the verandah of one of the largest and handsomest bungalows of Poonah. It belonged to Colonel Hastings, colonel of a native regiment stationed there, and at present, in virtue of seniority, commanding a brigade. Tiffin was on, and three or four officers and four ladies had taken their seats in the comfortable cane lounging chairs which form the invariable furniture of the verandah of a well-ordered bungalow. Permission had been duly asked, and granted by Mrs. Hastings, and the cheroots had just begun to draw, when Miss Hastings, a niece of the colonel, who had only arrived the previous week from England, said, commanding a brigade. Tiffin was on, and three or four officers and four ladies had taken their seats in the comfortable cane lounging chairs which form the invariable furniture of the verandah of a well-ordered bungalow. Permission had been duly asked, and granted by Mrs. Hastings, and the cheroots had just begun to draw, when Miss Hastings, a niece of the colonel, who had only arrived the previous week from England, said,</p>
</body>
</html>
-- Marcello Perathoner webmaster@gutenberg.org
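The example hard-codes one image into span.dropcapa, so a book with several different illuminated initials would need one such rule per image. The Python helper below is a hypothetical sketch, not an existing PG or plucker tool: it generates the per-image CSS rule and the nested-span paragraph using the same CSS properties as the example. The class name "dropcap-b2" and the file images/b2.jpg are invented for illustration, and the rest of the paragraph is treated as plain text, so it is HTML-escaped.

```python
from html import escape

# Shared rule from the example: hide the text letter when CSS is active.
HIDE_LETTER_RULE = "span.dropcap { display: none; }"


def dropcap_rule(css_class: str, image_url: str, width: int, height: int) -> str:
    """CSS rule that floats the outer span and paints the initial as its background."""
    return (
        f"span.{css_class} {{ float: left; height: {height}px; width: {width}px; "
        "margin: 0 1em 1em 0; "
        f'background: url("{image_url}") no-repeat top left; }}'
    )


def dropcap_paragraph(css_class: str, letter: str, rest: str) -> str:
    """Paragraph whose first letter is real text wrapped in the drop-cap spans."""
    return (
        f'<p><span class="{css_class}"><span class="dropcap">{escape(letter)}</span></span>'
        f"{escape(rest)}</p>"
    )


if __name__ == "__main__":
    # Hypothetical second initial alongside the b1.jpg of the example.
    print(HIDE_LETTER_RULE)
    print(dropcap_rule("dropcap-b2", "images/b2.jpg", 75, 78))
    print(dropcap_paragraph("dropcap-b2", "A", "ND do you really mean ..."))
```

Without CSS, lynx and screen readers see "AND do you really mean ..." with the word intact; with CSS, the letter is hidden and the floated image shows in its place.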

On 11/14/05, Marcello Perathoner <marcello@perathoner.de> wrote:
The HTML is not only bogus, but gratuitously so. It could have been made to work with any browser with very small effort of the brain.
Marcello, could you try to be a little politer? You seem to have a deep understanding of HTML and CSS, but that was hardly obvious to many of us who have a decent understanding of HTML, and many of our post-proofers have little understanding of HTML or computers; they're just doing the best they can and working by rote at points. There's no need to insult them for trying to produce a good looking document.

David Starner wrote:
The HTML is not only bogus, but gratuitously so. It could have been made to work with any browser with very small effort of the brain.
Marcello, could you try to be a little politer? You seem to have a deep understanding of HTML and CSS, but that was hardly obvious to many of us who have a decent understanding of HTML, and many of our post-proofers have little understanding of HTML or computers; they're just doing the best they can and working by rote at points. There's no need to insult them for trying to produce a good looking document.
Sorry. I got carried away by D Garcia's arguments, which are almost as good as You Know Who's.
To the PPers: I'm no CSS expert either, but if you google for
css "drop cap" image
you get to this site:
http://www.stopdesign.com/articles/replace_text/
which explains exactly how to do it. Next time some PPer wants to introduce a new feature, I strongly advise him/her to do some research beforehand. There are many excellent CSS resources online. Also, a question asked on this list would have sufficed.
-- Marcello Perathoner webmaster@gutenberg.org

On Tuesday 15 November 2005 08:30 am, Marcello Perathoner wrote:
Sorry. I got carried away by D Garcia's arguments, which are almost as good as You Know Who's.
(Meanwhile, the rest of the list is having a rational discussion.) I tried to present an approach for you to address the problem in existing files; it's too bad you saw it as an argument instead of a discussion of possible solutions. See below for future file suggestions.
Next time some PPer wants to introduce a new feature, I strongly advise him/her to do some research beforehand. There are many excellent CSS resources online.
I suspect few of them address this specific concern, and fewer volunteers would know of the concern in the first place, much less be able to choose what works best for PG from the various alternatives out there.
Also, a question asked on this list would have sufficed.
No one knew it was an issue until you brought it up, and when you did, you didn't offer a solution or even a recommendation beyond disparaging what you called "fluff." Now that you have _finally_ said "This is the problem we're experiencing, this is how I recommend that contributors avoid creating it." we can all cooperate to deliver a better experience for everyone, including making your life easier.
Now that this is aired out, why not get the WW'ers involved and put this sort of information into the FAQ so that everyone benefits? And while we're at it, also discuss how to address other cases such as the decorative rules and finials and such that are causing Marcello such grief.
David

You might also want to check out "manybooks". They provide many Project Gutenberg eBooks in various formats, as do a number of other sites.
Michael
On Tue, 15 Nov 2005, D Garcia wrote:
On Tuesday 15 November 2005 08:30 am, Marcello Perathoner wrote:
Sorry. I got carried away by D Garcia's arguments, which are almost as good as You Know Who's.
(Meanwhile, the rest of the list is having a rational discussion.)
I tried to present an approach for you to address the problem in existing files; it's too bad you saw it as an argument instead of a discussion of possible solutions. See below for future file suggestions.
Next time some PPer wants to introduce a new feature, I strongly advise him/her to do some research beforehand. There are many excellent CSS resources online.
I suspect few of them address this specific concern, and fewer volunteers would know of the concern in the first place, much less be able to choose what works best for PG from the various alternatives out there.
Also, a question asked on this list would have sufficed.
No one knew it was an issue until you brought it up, and when you did, you didn't offer a solution or even a recommendation beyond disparaging what you called "fluff." Now that you have _finally_ said "This is the problem we're experiencing, this is how I recommend that contributors avoid creating it." we can all cooperate to deliver a better experience for everyone, including making your life easier.
Now that this is aired out, why not get the WW'ers involved and put this sort of information into the FAQ so that everyone benefits? And while we're at it, also discuss how to address other cases such as the decorative rules and finials and such that are causing Marcello such grief.
David

Marcello Perathoner wrote:
1.
We don't know which ebooks contain essential illustrations.
Some ill-advised PPers have started producing books with such useless fluff as drop caps (included as images), fancy horizontal rules (included as images), etc. Including those will only put off PDA users.
Sometimes, the illuminated caps are the most beautiful part of the book... It would help, though, if we could tag images by importance. No need to keep every florette, but keeping the essential maps and illustrations would certainly help. Having a mechanism in place to distinguish them may be nice...
participants (7)
- Bruce Albrecht
- D Garcia
- David Starner
- Jeroen Hellingman (Mailing List Account)
- Joshua Hutchinson
- Marcello Perathoner
- Michael Hart