
On Sat, Apr 17, 2010 at 12:15 PM, Jim Adcock <jimad@msn.com> wrote:
I have tried leaving project comments and the P1s and the P2s tend to read and follow the project comments whereas the P3s ignore them and undo the good work of the P1s and P2s.
And earlier Jim wrote:
I have stated repeatedly that I found extremely competent and dedicated volunteers at all levels of DP -- and the converse.
Bizarre. In one post you're drawing back from blanket accusations and in the next, you repeat them. Jim, I don't understand WHY you feel impelled to keep throwing stones at DP. You don't like the way we do things, you've left ... it's all behind you, right? But no, you have to join the grouch group here at PG and repeatedly attack the organization that is providing the overwhelming majority of the texts submitted to PG. I suppose I ought to just killfile you, as I have Bowerbird. -- Karen Lofstrom

Jim, I don't understand WHY you feel impelled to keep throwing stones at DP. You don't like the way we do things, you've left ... it's all behind you, right? But no, you have to join the grouch group here at PG and repeatedly attack the organization that is providing the overwhelming majority of the texts submitted to PG.
If you read my comments carefully I think you will find that I try to speak truthfully to what works at DP and at PG and what doesn't work so that we all can try to fix it and make a better contribution to the world. In the business world this would be called "continuous improvement." PG'ers at least seem to be able to generally acknowledge what works and what doesn't work. In DP-land if you don't drink the Kool-Aid and declare it tasty then you fall constantly under attack. If there are problems with how P3 works -- and there are -- one would think DP would want to face up to that and work to improve it -- just as in PG-land the lack of standards is causing texts to be distributed to users frequently missing or duplicating letters and words and in some cases whole paragraphs. I could say "gosh let's ignore this because DP and PG are all volunteers and their hearts are in the right places and I wouldn't want to hurt anyone's feelings" but that wouldn't change the facts: DP wastes a lot of volunteer time and in general makes things more painful than need be due to aged tools and approaches. And PG distributes a lot of stuff that ends up appearing "broken" to end users because of the standards chosen -- and/or the lack thereof.

On Sun, Apr 18, 2010 at 2:27 AM, Jim Adcock <jimad@msn.com> wrote:
In the business world this would be called "continuous improvement."
Jim, in the business world, your complaint about the fact the business wasn't working on your preferred projects would annoy the hell out of your coworkers the eighth time they heard it, just like here. -- Kie ekzistas vivo, ekzistas espero.

David Starner, if you only would be willing to take your own advice. So much of what you say here, and I've said it before, is complaint, without you providing any hope of solution. As I have said before -- there is a word for this, but it is not used in polite conversation. If only you took EITHER your own advice OR your signature block: "Kie ekzistas vivo, ekzistas espero." at all seriously, then we would be glad to hear from you, however it turns out that all too much of what you say goes to /dev/null or the various other killfiles people use to filter you out. Now. . .please. . .give some hope. . .or you will most certainly see the result of using vinegar rather than honey to get what you want -- presuming you really do want things to get/work better. Please. . .take a lesson from your own words. . . . You once said something like: As an honest person I am willing to learn from my mistakes. . . . Please do. . . .
On Sun, 18 Apr 2010, David Starner wrote:
On Sun, Apr 18, 2010 at 2:27 AM, Jim Adcock <jimad@msn.com> wrote:
In the business world this would be called "continuous improvement."
Jim, in the business world, your complaint about the fact the business wasn't working on your preferred projects would annoy the hell out of your coworkers the eighth time they heard it, just like here.
-- Kie ekzistas vivo, ekzistas espero. _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d

Jim, in the business world, your complaint about the fact the business wasn't working on your preferred projects would annoy the hell out of your coworkers the eighth time they heard it, just like here.
That is probably a true statement: When one talks about things being "broken" and open for possible improvement the response is almost always universally scorn and derision. Only when an organization falls into acute duress is it usually open to considering change -- if then. The US auto industry being perhaps a current, but weak, example.
Stating that I have a dim view of P3ers is probably overstating the case. What I am sure I have a dim view of is:
- Query-hyphen and especially the rote overuse of it by some P3ers.
- The rote removal of whitespace on both sides of m-dash even when that is clearly not author intent.
- Some P3ers who are clearly just SR'ing without looking at the page images.
- Punting "bugs" down field under the assumption that *someone else* is going to fix them.
- Not having a clear point in the process when the "proofing" phase is supposedly done.
- Taking 3+ years to create a text, or not finishing a text that has had considerable volunteer time and effort invested in it.
- Designing a process where *no one* is allowed to take responsibility for a text.
- Distributing texts that have less than 1 or more than 1 copy of some portion of an author's text.
- Distributing "risen to the public domain" texts under DRM.
- Preventing friends and fellow citizens from sharing texts "risen to the public domain".
- Otherwise claiming or enforcing restrictions on the sharing and redistribution of texts "risen to the public domain".
- Creating texts that cannot be used as widely as possible on a great variety of differing reader machines, including addressing issues of "accessibility".
- Demoware.
Sorry if any of these statements are controversial -- I don't think they should be!

Karen Lofstrom wrote:
But no, you have to join the grouch group here at PG and repeatedly attack the organization that is providing the overwhelming majority of the texts submitted to PG.
Quantity, yes ... Let's talk *quality* instead. The problem is not that some PPers are incompetent, the problem is that the whole DP output is technically obsolete: DP is producing `HTML Facsimiles for the Desktop´ while it should be producing eBooks. Which do you think is more useful? A book you can only read at home on your desktop or a book you can read everywhere on your phone? Ironically much of PPing clogs the queues while lessening the value of the books. DP output renders ugly on all devices except desktop-sized screens. DP HTML is almost as hard to convert to other formats as PG plain text. DP has to enforce some standard that greatly simplifies the output. -- Marcello Perathoner webmaster@gutenberg.org

Hear! Hear! On Sun, 18 Apr 2010, Marcello Perathoner wrote:
Karen Lofstrom wrote:
But no, you have to join the grouch group here at PG and repeatedly attack the organization that is providing the overwhelming majority of the texts submitted to PG.
Quantity, yes ... Let's talk *quality* instead.
The problem is not that some PPers are incompetent, the problem is that the whole DP output is technically obsolete:
DP is producing `HTML Facsimiles for the Desktop´ while it should be producing eBooks.
Which do you think is more useful? A book you can only read at home on your desktop or a book you can read everywhere on your phone?
Ironically much of PPing clogs the queues while lessening the value of the books.
DP output renders ugly on all devices except desktop-sized screens.
DP HTML is almost as hard to convert to other formats as PG plain text.
DP has to enforce some standard that greatly simplifies the output.

"Marcello" == Marcello Perathoner <marcello@perathoner.de> writes:
Marcello> Karen Lofstrom wrote:
>> But no, you have to join the grouch group here at PG and repeatedly attack the organization that is providing the overwhelming majority of the texts submitted to PG.
Marcello> Quantity, yes ... Let's talk *quality* instead.
Marcello> The problem is not that some PPers are incompetent, the problem is that the whole DP output is technically obsolete:
Marcello> DP is producing `HTML Facsimiles for the Desktop´ while it should be producing eBooks.
Marcello> Which do you think is more useful? A book you can only read at home on your desktop or a book you can read everywhere on your phone?
Is PG ready to accept Epub as submission format? (i.e. one submits a valid epub from which the other formats are derived)? If so, one can target Epub, otherwise at best one is forced to submit HTML or txt that converts not-too-badly with current PG tools, and this might be extremely challenging.
Carlo

It really doesn't matter what DP targets as long as it's capable of identifying, completely and unambiguously, the requisite syntactic elements. But we have no agreed list, not even an ad hoc functional one, of what those are. Instead our focus is on subjective elegance of appearance rather than on objective clarity and completeness. "Good work" has come to be associated with "looks pretty and makes the PPer feel good," plus the ability to pass two sets of incompletely documented and sometimes inconsistent automated tests - the postprocessor tools and the whitewashers' tools - neither of which was intended to consider syntactic rigor and accuracy. Interestingly, we seem to have instinctively inferred the need for this: the HTML texts often include some basic form of it (or more accurately an ad hoc collection of basic forms) in the CSS stylesheets. It seems to me that we need the "what" before we worry about the "how".
Don
On Sun, Apr 18, 2010 at 8:05 AM, Carlo Traverso <traverso@posso.dm.unipi.it> wrote:
"Marcello" == Marcello Perathoner <marcello@perathoner.de> writes:
Marcello> Karen Lofstrom wrote:
But no, you have to join the grouch group here at PG and repeatedly attack the organization that is providing the overwhelming majority of the texts submitted to PG.
Marcello> Quantity, yes ... Let's talk *quality* instead.
Marcello> The problem is not that some PPers are incompetent, the problem is that the whole DP output is technically obsolete:
Marcello> DP is producing `HTML Facsimiles for the Desktop´ while it should be producing eBooks.
Marcello> Which do you think is more useful? A book you can only read at home on your desktop or a book you can read everywhere on your phone?
Is PG ready to accept Epub as submission format? (i.e. one submits a valid epub from which the other formats are derived)? If so, one can target Epub, otherwise at best one is forced to submit HTML or txt that converts not-too-badly with current PG tools, and this might be extremely challenging.
Carlo

Carlo Traverso wrote:
"Marcello" == Marcello Perathoner <marcello@perathoner.de> writes:
Marcello> Karen Lofstrom wrote:
>> But no, you have to join the grouch group here at PG and repeatedly attack the organization that is providing the overwhelming majority of the texts submitted to PG.
Marcello> Quantity, yes ... Let's talk *quality* instead.
Marcello> The problem is not that some PPers are incompetent, the problem is that the whole DP output is technically obsolete:
Marcello> DP is producing `HTML Facsimiles for the Desktop´ while it should be producing eBooks.
Marcello> Which do you think is more useful? A book you can only read at home on your desktop or a book you can read everywhere on your phone?
Is PG ready to accept Epub as submission format? (i.e. one submits a valid epub from which the other formats are derived)? If so, one can target Epub, otherwise at best one is forced to submit HTML or txt that converts not-too-badly with current PG tools, and this might be extremely challenging.
That is not the problem. You can botch ePub as easily as you can HTML. (In fact ePub is only HTML + some metadata.) You should produce HTML that is *semantically* correct and degrades gracefully, i.e., if you remove all CSS it should still make sense. The most prominent offenders are non-semantic headers, preformatted text, positioning, floating and ornaments. -- Marcello Perathoner webmaster@gutenberg.org

On Sun, Apr 18, 2010 at 05:05:09PM +0200, Carlo Traverso wrote:
Is PG ready to accept Epub as submission format? (i.e. one submits a valid epub from which the other formats are derived)? If so, one can target Epub, otherwise at best one is forced to submit HTML or txt that converts not-too-badly with current PG tools, and this might be extremely challenging.
Carlo
I don't think we're ready for this except in rare cases where ePub is the best format for display for a particular item (we just released a book where PDF was the best format, believe it or not).
The challenge is that when books are fixed, someone (typically the whitewasher, seldom the original submitter) needs to regenerate all the files from that book.
Since there is not yet any standard processing stream to generate static ePub files, this makes it hard for fixes (to HTML & text) to be applied to ePubs.
I would, of course, love to see something become our "standard" conversion tool, usable by anyone. Right now, the closest for PG is Marcello's software to build the cached ePub files. It's wonderful and functional, but is it ready for all envisioned purposes? I think not, due at least in part to shortcomings of the input HTML.
ALL that said, maybe I am too hung up on automated or semi-automated methods. It *is* the case that an ePub can yield plain HTML, which could be edited and zipped up into a new ePub (without too much trouble). Is there enough benefit in such ePubs? Are there good examples of hand-crafted ePubs (or automated, but using different software than is used on the gutenberg.org server) that are far superior to the alternatives?
Having a single master format, from which all subsidiary formats can be derived, has been a long-time goal. This has not yet been viable for most titles, despite valiant (and productive) efforts with HTML and TeX.
From everything I've seen about ePub, adding static ePub files to the collection would be a net increase in the effort needed to apply fixes (i.e., it would be one MORE format to deal with by hand, not a generated format that would be very little extra work to generate).
There are lots of people involved in creating, managing and fixing eBook files, and there is certainly room for any experiments that people can think of. My response isn't intended to quell such effort, rather to state that given the current state of things, I don't think ePub is a great candidate for a new static file format for the PG collection.
-- Greg

"Greg" == Greg Newby <gbnewby@pglaf.org> writes:
Greg> On Sun, Apr 18, 2010 at 05:05:09PM +0200, Carlo Traverso wrote:
>> Is PG ready to accept Epub as submission format? (i.e. one submits a valid epub from which the other formats are derived)? If so, one can target Epub, otherwise at best one is forced to submit HTML or txt that converts not-too-badly with current PG tools, and this might be extremely challenging.
>>
>> Carlo
Greg> I don't think we're ready for this except in rare cases where ePub is the best format for display for a particular item (we just released a book where PDF was the best format, believe it or not).
Greg> The challenge is that when books are fixed, someone (typically the whitewasher, seldom the original submitter) needs to regenerate all the files from that book.
Greg> Since there is not yet any standard processing stream to generate static ePub files, this makes it hard for fixes (to HTML & text) to be applied to ePubs.
Greg> I would, of course, love to see something become our "standard" conversion tool, usable by anyone. Right now, the closest for PG is Marcello's software to build the cached ePub files. It's wonderful and functional, but is it ready for all envisioned purposes? I think not, due at least in part to shortcomings of the input HTML.
That's the whole point of my proposal. Starting with hand-crafted HTML we are likely to end with poor ePub, since the inference of metadata might be wrong, and many features of HTML need to be tuned to ePub and might not turn out correct; while obtaining reasonable HTML from ePub is just unzipping and discarding metadata. Maybe it will be harder to have "nicely handcrafted" HTML, but we have to give the best available product in the standard format that most users are likely to use (and of course a reasonable product in every other format). To maintain ePub (to correct typos) one has to unzip the ePub, correct the HTML and re-zip.
Another issue is to automate the creation of txt from HTML. Currently, the output of w3m -dump (or links -dump, or lynx -dump etc.) is pretty good for txt, except that font changes (mainly, underscores for italics) are lost. It shouldn't be difficult to pre-process the HTML to show the underscores for italics, in such a way that one obtains a reasonable PG txt file. This might work better from the HTML generated from epub (in which the HTML is more constrained) than for handcrafted HTML. It might be a bit more challenging to downgrade from UTF-8 (as generated by -dump) to iso-8859-1 or to ASCII, for example to handle the unicode characters that are used to draw tables, but this might be very well automated too. This is on my side an offer to work towards the production of a toolchain along these lines, if it is not discarded a priori. Carlo
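The pre-processing step described above could look roughly like the following. This is only an illustrative Python sketch, not an existing PG tool: the names mark_italics and downgrade_box_drawing are hypothetical, the regular expression handles only simple, non-nested i and em elements, and the box-drawing table covers just a handful of characters. The idea would be to run mark_italics on the HTML before handing it to a text dumper, and downgrade_box_drawing on the dumped text before re-encoding it.

```python
import re

# Matches a simple <i>...</i> or <em>...</em> element (nesting is not handled).
ITALIC_RE = re.compile(r"<(i|em)\b[^>]*>(.*?)</\1\s*>", re.IGNORECASE | re.DOTALL)

# Partial, illustrative mapping of Unicode box-drawing characters to ASCII.
BOX_TO_ASCII = {}
for ch in "─━═":
    BOX_TO_ASCII[ord(ch)] = "-"
for ch in "│┃║":
    BOX_TO_ASCII[ord(ch)] = "|"
for ch in "┌┐└┘├┤┬┴┼":
    BOX_TO_ASCII[ord(ch)] = "+"


def mark_italics(html):
    """Replace italic elements with _underscored_ text, PG plain-text style."""
    return ITALIC_RE.sub(lambda m: "_" + m.group(2) + "_", html)


def downgrade_box_drawing(text):
    """Replace table-drawing characters so the text can be encoded as ASCII."""
    return text.translate(BOX_TO_ASCII)
```

Characters outside the table (accented letters, typographic quotes) would still need their own policy when going down to ISO-8859-1 or ASCII; this sketch does not attempt that.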

Carlo Traverso wrote:
"Greg" == Greg Newby <gbnewby@pglaf.org> writes:
Greg> On Sun, Apr 18, 2010 at 05:05:09PM +0200, Carlo Traverso wrote:
>> Is PG ready to accept Epub as submission format? (i.e. one submits a valid epub from which the other formats are derived)? If so, one can target Epub, otherwise at best one is forced to submit HTML or txt that converts not-too-badly with current PG tools, and this might be extremely challenging.
>>
>> Carlo
Greg> I don't think we're ready for this except in rare cases where ePub is the best format for display for a particular item (we just released a book where PDF was the best format, believe it or not).
Greg> The challenge is that when books are fixed, someone (typically the whitewasher, seldom the original submitter) needs to regenerate all the files from that book.
Greg> Since there is not yet any standard processing stream to generate static ePub files, this makes it hard for fixes (to HTML & text) to be applied to ePubs.
Greg> I would, of course, love to see something become our "standard" conversion tool, usable by anyone. Right now, the closest for PG is Marcello's software to build the cached ePub files. It's wonderful and functional, but is it ready for all envisioned purposes? I think not, due at least in part to shortcomings of the input HTML.
That's the whole point of my proposal. Starting with hand-crafted HTML we are likely to end with poor ePub, since the inference of metadata might be wrong, and many features of HTML need to be tuned to ePub and might not turn out correct;
And what about users who download the HTML to view on a mobile? You must produce better HTML not for the sake of ePub but for the sake of universal usability. The metadata come directly from the PG database and are updated whenever the PG database changes. That makes our metadata far more consistent than your proposal would do.
While obtaining reasonable HTML from ePub is just unzipping and discarding metadata.
ePub HTML is often split into chapters, which may leave you with 50+ files after unzipping which you have to merge manually.
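The manual merge could in principle be scripted. What follows is a rough Python sketch under several assumptions: the ePub has a standard META-INF/container.xml pointing at an OPF package file, the spine of that package lists the chapter files in reading order, and the content is UTF-8. The function names are hypothetical, the body extraction is a crude regular expression rather than a real HTML parser, and per-chapter head elements and stylesheets are simply ignored.

```python
import posixpath
import re
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.DOTALL | re.IGNORECASE)


def spine_documents(epub):
    """Return the text of each content document, in spine (reading) order."""
    with zipfile.ZipFile(epub) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).get("full-path")
        package = ET.fromstring(zf.read(opf_path))
        # Map manifest ids to file names, relative to the OPF file's folder.
        manifest = {
            item.get("id"): item.get("href")
            for item in package.findall("opf:manifest/opf:item", NS)
        }
        base = posixpath.dirname(opf_path)
        documents = []
        for ref in package.findall("opf:spine/opf:itemref", NS):
            name = posixpath.join(base, manifest[ref.get("idref")])
            documents.append(zf.read(name).decode("utf-8"))
        return documents


def merge_bodies(documents):
    """Concatenate the contents of each document's body element."""
    parts = []
    for doc in documents:
        match = BODY_RE.search(doc)
        parts.append(match.group(1).strip() if match else doc.strip())
    return "\n".join(parts)
```

The merged fragment would still have to be wrapped in a single html/head/body skeleton, and internal links that point across chapter files would need rewriting by hand or by a further step.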
This is on my side an offer to work towards the production of a toolchain along these lines, if it is not discarded a priori.
Before that can happen a major `paradigm shift´ has to happen at DP. At DP the PPers enjoy pushing their pet preferences down the readers' throats: "What *I* See Is What You Get." And most PP time is spent weaving those personal preferences deep into the markup so as to make the markup pretty useless for anything but desktop devices with lots of screen, lots of cycles and lots of RAM. What the PPers should do is to produce light semantic markup that lets the user choose the presentation and device: "Get It The Way You Want." The PPers will have to relinquish their power of God -- or have it wrested from their hands -- and very strict guidelines will have to be put into place as to what markup is accepted. -- Marcello Perathoner webmaster@gutenberg.org

Worthy of a second look: Marcello Perathoner said: [re: eBooks for cellphones, etc] Before that can happen a major `paradigm shift´ has to happen at DP. At DP the PPers enjoy pushing their pet preferences down the readers' throats: "What *I* See Is What You Get." And most PP time is spent weaving those personal preferences deep into the markup so as to make the markup pretty useless for anything but desktop devices with lots of screen, lots of cycles and lots of RAM. What the PPers should do is to produce light semantic markup that lets the user choose the presentation and device: "Get It The Way You Want." The PPers will have to relinquish their power of God -- or have it wrested from their hands -- and very strict guidelines will have to be put into place as to what markup is accepted.

In order for a "paradigm shift" to happen at DP, PG has to define what is and is not acceptable in the HTML and spell it out so that DP can put it into practice. I took another look at the PG HTML FAQ and it does not say anything that might be used as a guide to improving HTML output. It would also be extremely helpful to have a way to preview the different output formats so we can test our finished HTML and make sure it works properly not only as HTML but also as the source for the other formats. I (for one) am happy to modify the way I do things -- as long as someone explains what should/shouldn't be done and why. I am not a computer professional (and neither are many or most of the PPers at DP) and don't have the time or background to track down the current thinking on how to code HTML. But I don't have a problem modifying my practices to end up with a better end product. Perhaps some of the time that is spent ranting about DP's work flow and DP's output could be better put to use creating more informative FAQs or even guidelines that DPers can use to create output that fits into the current thinking about acceptable HTML and/or other formats. On 4/19/2010 8:57 AM, Michael S. Hart wrote:
Worthy of a second look:
Marcello Perathoner said: [re: eBooks for cellphones, etc]
Before that can happen a major `paradigm shift´ has to happen at DP.
At DP the PPers enjoy pushing their pet preferences down the readers' throats: "What *I* See Is What You Get." And most PP time is spent weaving those personal preferences deep into the markup so as to make the markup pretty useless for anything but desktop devices with lots of screen, lots of cycles and lots of RAM.
What the PPers should do is to produce light semantic markup that lets the user choose the presentation and device: "Get It The Way You Want."
The PPers will have to relinquish their power of God -- or have it wrested from their hands -- and very strict guidelines will have to be put into place as to what markup is accepted.

On 4/19/2010 9:47 AM, Julia C. Miller wrote:
In order for a "paradigm shift" to happen at DP, PG has to define what is and is not acceptable in the HTML and spell it out so that DP can put it into practice. I took another look at the PG HTML FAQ and it does not say anything that might be used as a guide to improving HTML output.
The odds of this happening are about equivalent to that of having porcine aviators; Mr. Hart is diametrically opposed to standards of any kind for PG. However, PG creating an HTML standard is in fact unnecessary. According to Mr. Hart (although somewhat disputed by Mr. Haines) PG will accept just about anything it is given. Thus, DP could establish its own HTML guidelines with the assurance that they would be acceptable to PG. Non-conforming HTML could still make its way into the PG corpus from other sources, but at least the DP work-product would be consistent.
It would also be extremely helpful to have a way to preview the different output formats so we can test our finished HTML and make sure it works properly not only as HTML but also as the source for the other formats.
This could be so difficult as to be nigh on impossible. For example, as most here know, the ".epub" format is actually just a zip file containing (among other things) the XHTML version of the document. How that document is displayed does not rely at all on the nature of the document's markup, but almost exclusively on the capabilities of the reading device's software. The .epub readers based on JavaScript (such as Monocle) will probably display the text with as much richness as the hosting browser software would, whereas standalone .epub readers (such as µBook) will only display what the software designers felt was important, and probably will not support CSS at all. No one viewer can tell you if the markup is satisfactory, because with .epub the markup is only part of the story. On the other hand, if DP were to establish HTML guidelines and requirements (requirements for a baseline, guidelines for enhancements) I would be happy to code up a program which would test for conformance to those guidelines. I couldn't give you a picture, but I could give you a thousand words.
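Such a conformance checker might start out along these lines. This is a minimal Python sketch, not the program offered above: no DP guideline exists to test against, so the three rules (no pre elements, no float or position in inline styles, no p/div/span elements carrying heading-like class names) are assumptions drawn from the list of "offenders" mentioned elsewhere in this thread, and the class-name list is purely hypothetical. It inspects only inline style attributes, not stylesheets.

```python
from html.parser import HTMLParser

# Hypothetical rule data; a real guideline would have to define these.
HEADING_LIKE_CLASSES = {"h1", "h2", "h3", "heading", "chapter", "title"}
LAYOUT_PROPERTIES = {"float", "position"}


class MarkupChecker(HTMLParser):
    """Collects (line number, message) pairs for markup the rules reject."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        line, _column = self.getpos()
        attributes = dict(attrs)
        if tag == "pre":
            self.problems.append((line, "preformatted text: <pre>"))
        # Look at each declaration of an inline style attribute.
        style = attributes.get("style") or ""
        for declaration in style.split(";"):
            name = declaration.split(":")[0].strip().lower()
            if name in LAYOUT_PROPERTIES:
                self.problems.append((line, "layout in inline style: " + name))
        # Flag generic elements that are dressed up as headings via a class.
        classes = (attributes.get("class") or "").lower().split()
        if tag in ("p", "div", "span") and HEADING_LIKE_CLASSES.intersection(classes):
            self.problems.append(
                (line, "non-semantic header: <" + tag + "> with heading-like class")
            )


def check_markup(html):
    """Return the list of problems found in an HTML string."""
    checker = MarkupChecker()
    checker.feed(html)
    checker.close()
    return checker.problems
```

The output is a list of line numbers and messages, which is the "thousand words" rather than the picture: it says nothing about how any particular reader device will render the file.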
I (for one) am happy to modify the way I do things -- as long as someone explains what should/shouldn't be done and why. I am not a computer professional (and neither are many or most of the PPers at DP) and don't have the time or background to track down the current thinking on how to code HTML. But I don't have a problem modifying my practices to end up with a better end product.
Adding HTML markup to a document (or modifying that which is already there) is nowhere near as difficult as many would have you believe. Check out http://web.archive.org/web/20080327044926/gutenberg.hwg.org/tutorials.html and http://www.dysfunctionals.org/~networker/HTMLeBooks.html. But you are correct, having a document like one of these which is DP-sanctioned would simplify a PPers life dramatically.
Perhaps some of the time that is spent ranting about DP's work flow and DP's output could be better put to use creating more informative FAQs or even guidelines that DPers can use to create output that fits into the current thinking about acceptable HTML and/or other formats.
Many have tried (among them Mr. Hutchinson and Mr. Perathoner). But without organizational buy-in those FAQs and guidelines will go nowhere--fast. Unfortunately, there appears to be no one left at DP with the clout to say, "this is our first draft of HTML guidelines. Comments and discussion are welcome, but by the end of the year some sort of guidelines /will/ be adopted." As near as I can tell, the ranters rant not because DP's work flows are, shall we say, sub-optimal, or because the FAQs and guidelines have not been written, but because none of the Powers That Be at DP seem to be willing to do anything about it. These kinds of decisions cannot be made by consensus. Somebody needs to step up to the plate. Mr. Adcock seems to still have enough respect for DP that he believes it can be improved. I do not. I would love for someone to prove me wrong.

Also, I see maybe 3 or 4 elements that should be identified in-line using conventions we already have. Italics, boldface, small-caps (although these are often micro-headings), ... One opportunity, I think, would be to break out embedded quotes and make them visually obvious. And their boundaries checkable.

It would also be extremely helpful to have a way to preview the different output formats so we can test our finished HTML and make sure it works properly not only as HTML but also as the source for the other formats.
This could be so difficult as to be nigh on impossible. For example, as most here know, the ".epub" format is actually just a zip file containing (among other things) the XHTML version of the document....
Sorry, but I've looked and tried to port Marcello's HTML->epub code and it's anything but that simple. (But I am not an experienced Python coder.) Again, to my mind a "preview" need simply be a portable version of Marcello's code so that we can do our own HTML to ePub conversion (and from there to MOBI) and run it on the variety of ePub and MOBI reader devices and software we already own, so that we have *some* idea of the problems that the particular HTML is going to run into on various portable devices. And I am sure there are any number of people who are willing to preview a DP candidate release on the hardware they own in order to find what problems there are to be found -- most of us are pretty passionate about our choice of hardware and would like very much for DP/PG to produce ebooks that actually work on our hardware investments! PS: I already make preview versions of my HTML on ePub and MOBI -- it's just that the HTML->ePub and HTML->MOBI conversion software I have is not identical to Marcello's and thus the formatting ends up different than the "official" version.

On 4/19/2010 11:26 AM, Jim Adcock wrote: [snip]
PS: I already make preview versions of my HTML on ePub and MOBI -- it's just that the HTML->ePub and HTML->MOBI conversion software I have is not identical to Marcello's and thus the formatting ends up different than the "official" version.
If true, this is troubling. Because .epub is just a ZIP file, you should be able to open the archive in your favorite tool (WinZip, WinRar, 7-Zip, PowerArchiver, whatever) or use unzip and extract all the files. The HTML file(s) should be identical to whatever the source was. If they differ, the differences had better be harmless (making the source valid XHTML, for example). If they /do/ differ in substantive ways, Marcello should revisit his "publishing" code. It is possible, however, that if an .epub file looks different when rendered than the source HTML perhaps the archive contains a default stylesheet that alters the appearance.
BTW, to create a valid .epub file, start by creating an .opf file which describes the publication. One extracted from an existing .epub file should give you a good example of what is necessary. Then create a container.xml file that references the .opf file you created. Put this file in a subdirectory called "META-INF". Lastly capture the mimetype file from an existing .epub. Now, add "mimetype" to a zip file, *without compression*. Then add the .opf file, the content XHTML file(s), and META-INF/container.xml. Rename the file so that it ends in ".epub", and voilà, you have a valid .epub file. Of course other files can be added as well (such as font files and stylesheets), but they are just gilding the lily. The actual paths of the various files are irrelevant except for the container.xml file, which *must* be in the META-INF/ folder (and of course the paths to the files must be correctly recorded in the .opf file). I think it is only polite to add the .opf file to the archive second, and to leave it uncompressed, but that is fairly uncommon. The OCF specification requires that the mimetype file be the first file in the archive (so it can always be found at a specific byte offset), but I know of no .epub reader that actually enforces this requirement.
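The packaging steps above can be sketched in Python with the standard zipfile module. This is an illustration under stated assumptions, not a validated ePub builder: the function name build_epub and the file name content.opf are hypothetical, the caller has to supply a correct .opf document and XHTML files, and the directory is spelled META-INF in upper case, as the OCF specification writes it.

```python
import zipfile

# container.xml only has to point at the package (.opf) file.
CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(target, opf_xml, documents):
    """Write an .epub archive to `target` (a path or a binary file object).

    `opf_xml` is the package document; `documents` maps archive names
    (as referenced from the .opf) to XHTML text.
    """
    with zipfile.ZipFile(target, "w") as zf:
        # mimetype goes first and must be stored without compression.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        # The .opf is added second and left uncompressed, as suggested above.
        zf.writestr("content.opf", opf_xml, compress_type=zipfile.ZIP_STORED)
        for name, text in documents.items():
            zf.writestr(name, text, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
```

A call such as build_epub("book.epub", opf_text, {"book.xhtml": xhtml_text}) would produce the archive; whether a given reader accepts it still depends on the .opf being complete, which this sketch does not check.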

Julia C. Miller wrote:
In order for a "paradigm shift" to happen at DP, PG has to define what is and is not acceptable in the HTML and spell it out so that DP can put it into practice.
It would be much better if DP did that.
It would also be extremely helpful to have a way to preview the different output formats so we can test our finished HTML and make sure it works properly not only as HTML but also as the source for the other formats.
Roger Frank has the converter and did extensive testing on it.
I (for one) am happy to modify the way I do things -- as long as someone explains what should/shouldn't be done and why. I am not a computer professional (and neither are many or most of the PPers at DP) and don't have the time or background to track down the current thinking on how to code HTML. But I don't have a problem modifying my practices to end up with a better end product.
Go to the DP wiki and search for 'ePub'. I don't know the exact URL because the site is down. -- Marcello Perathoner webmaster@gutenberg.org

On 4/19/2010 12:35 PM, Marcello Perathoner wrote:
Julia C. Miller wrote:
In order for a "paradigm shift" to happen at DP, PG has to define what is and is not acceptable in the HTML and spell it out so that DP can put it into practice.
It would be much better if DP did that.
So after DP goes through the time and effort to define the standards to upload to PG, people from PG can say "No, that's not what we want"?
It would also be extremely helpful to have a way to preview the different output formats so we can test our finished HTML and make sure it works properly not only as HTML but also as the source for the other formats.
Roger Frank has the converter and did extensive testing on it.
Yes, Roger has the converter and his discussion of the changes that need to be made so the conversion to ePub works properly was very helpful. I used what I learned in that thread in the last 8 books that I have uploaded. But I am working on books right now that I know will not convert properly (based on what I have learned from Roger's discussion). I would like to be able to preview, change the coding and preview again until I find a satisfactory solution.

Julia C. Miller wrote:
So after DP goes through the time and effort to define the standards to upload to PG, people from PG can say "No, that's not what we want"?
I don't see any danger of that as long as the new standards are more restrictive than the old ones. And they'd have to be a lot more restrictive to be worth the trouble of implementing them.
Yes, Roger has the converter and his discussion of the changes that need to be made so the conversion to ePub works properly was very helpful. I used what I learned in that thread in the last 8 books that I have uploaded. But I am working on books right now that I know will not convert properly (based on what I have learned from Roger's discussion). I would like to be able to preview, change the coding and preview again until I find a satisfactory solution.
The sources are online ... But with me being a 100% linux shop and ibiblio being a 100% linux shop, and with 99% of you wanting Windows software, somebody has to take the time and port it. OTOH the converter is just one link in the chain. You'd also have to test the ePub on every reader out there. It's much easier to forget about fancy formatting and use only the simplest HTML constructs. -- Marcello Perathoner webmaster@gutenberg.org

Roger is also a 100% linux shop. Well, that may not be entirely true - he probably uses Solaris and various unices.
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d

It's much easier to forget about fancy formatting and use only the simplest HTML constructs.
I think for the most part people are after reviewing the recent submissions. It seems like there are only a few commonly repeated mistakes PPs make that confound ePub and MOBI generation:
What to do about illustrations in the ePub and MOBI files distributed without illustration.
Illuminated Initial Caps
Drop Caps
Text represented as Illustration for some reason (PP thought the original text looked so cool that some of it was introduced as an Illustration)
Equations "typeset" in Unicode/HTML
I wonder if, instead of enumerating the HTML constructs people are allowed to use, it wouldn't be better simply to enumerate the HTML practices that will lead to trouble? Again, I don't think people are trying to cause trouble, they just get seduced by some visual aspect of HTML without realizing the problems that will cause later.

As I said before, no one can or will complain vociferously if BOTH the illuminated caps AND the ASCII are included. It won't hurt the readability, and it won't matter where the illumination ends up in nearly such exact terms. Why make this so much harder than it has to be???!!! Just make it so everyone can BOTH read the text AND appreciate the illumination. So. . .please. . .stop wasting time and effort, and just make it easy on all concerned, as it should be. No more mountains made out of molehills. . . . Thanks!!! Give eBooks in 2010!!!
Michael S. Hart Founder Project Gutenberg Inventor of eBooks

On 4/20/2010 8:55 AM, Julia C. Miller wrote:
On 4/19/2010 12:35 PM, Marcello Perathoner wrote:
Julia C. Miller wrote:
In order for a "paradigm shift" to happen at DP, PG has to define what is and is not acceptable in the HTML and spell it out so that DP can put it into practice.
It would be much better if DP did that.
So after DP goes through the time and effort to define the standards to upload to PG, people from PG can say "No, that's not what we want"?
Sure. They can do that now with any of DP's offerings. But they won't. With the exception of Mr. Perathoner, I would be surprised if there were any of the Powers That Be at PG who know enough about HTML to be able to determine if an HTML file were "good" or "bad." And there are plenty of "bad" HTML files in the PG archive already. If DP were to develop standards for HTML files, they would become the /de facto/ HTML standard for PG, although no one but DP would actually enforce them. If you can help convince DP to establish HTML guidelines and standards, I think you ought to try, if for no other reason than to produce guidelines that can be used independently of DP. DP is moribund, but not nearly as moribund as PG.

The PPers will have to relinquish their power of God -- or have it wrested from their hands -- and very strict guidelines will have to be put into place as to what markup is accepted.
I'm not sure that the PPers in question understand the damage they are doing. A first step would be not to force changes but at least let people know what problems they are creating and how NOT to cause them. There are some people at DP who care about these issues -- and obviously others who do not. Obviously it's very hard to tell people to try to minimize their use of CSS....

On Mon, Apr 19, 2010 at 5:15 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
At DP the PPers enjoy pushing their pet preferences down the reader's throat: "What *I* See Is What You Get." And most PP time is spent weaving those personal preferences deep into the markup, making the markup pretty useless for anything but desktop devices with lots of screen, lots of cycles and lots of RAM.
You know we might have TEI-Lite now if you hadn't tried to push your pet preferences about what the generated HTML must look like on all DP projects, especially when you had the audacity to call it standard when it clearly wasn't. -- Kie ekzistas vivo, ekzistas espero.

David Starner <prosfilaes@gmail.com> writes:
You know we might have TEI-Lite now if you hadn't tried to push your pet preferences about what the generated HTML must look like on all DP projects, especially when you had the audacity to call it standard when it clearly wasn't.
At least, tidy seems to be happy with it and you can embed your own CSS fragments. And, finally, it's much better than all these handcrafted HTML exercises that are mostly just a waste of time. -- Karl Eichwalder

Is PG ready to accept Epub as submission format (i.e. one submits a valid epub from which the other formats are derived)? If so, one can target Epub; otherwise at best one is forced to submit HTML or txt that converts not-too-badly with current PG tools, and this might be extremely challenging.

It would be nice to have a portable version of the current tools, so that transcribers can see how their HTML is going to "officially" translate into ePub and MOBI prior to submission. I tried porting the tools, but got bogged down by the amount of stuff which wouldn't port easily.

On 4/19/2010 9:00 AM, Jim Adcock wrote: [snip]
It would be nice to have a portable version of the current tools, so that transcribers can see how their HTML is going to "officially" translate into ePub and MOBI prior to submission. I tried porting the tools, but got bogged down by the amount of stuff which wouldn't port easily.
Only half of this proposal is possible: the .mobi half. As others have pointed out recently, .epub is not really an e-book format. For reasons both technical and practical, most people agree that HTML is the preferred markup for creating e-books.
The primary drawback to HTML is that it is inherently a multi-file solution; the HTML file is distinct from the image files, CSS files, font files, etc. Moreover, if you had multiple HTML files that made up the book (and sometimes there are good technical reasons for doing so) you needed yet another metafile that described how the different files related to each other.
After about a year of wrangling, in September 2006 the IDPF officially released the "Open Container Format," which specified how a collection of HTML files and the other files on which they depend would be included in a ZIP archive. The specification recommends using the file extension ".epub" to identify files that are OCF containers. In other words, an ".epub" file is just a ".zip" file with a few additional metadata files added.
Software that purports to "convert" HTML to .epub should not do /anything/ to the source file, except perhaps to insure that it is valid XHTML (for older HTML files). There is no need to validate an .epub conversion, as no conversion should have occurred. If a rendered .epub document does not look exactly like the same collection of files rendered by a browser from the file system, it is the fault of the .epub rendering software, not the "conversion."
Mobipocket, on the other hand, is a different ball of worms. The original Mobipocket reader (which, I understand, became the basis for the Kindle software) used a subset of HTML markup, and in a few instances changed the meaning of tags (<hr /> does not create a Horizontal Rule, but starts a new page in the user agent). It did not recognize all of the named entities, and did not support CSS at all. A Mobipocket PRC file was simply this almost-HTML compressed using Rick Bram's PalmDOC compression scheme (which was actually quite elegant in its simplicity). The later ".mobi" format was the same almost-HTML file compressed across the entire package using Huffman encoding instead. It produces a somewhat smaller file; the contents of the archive are identical to those in the ".prc" format.
Mobipocket Publisher (which I assume is still what is used to create Kindle files) claimed that Mobipocket files supported CSS. In fact what happened was that Mobipocket Publisher would load a CSS file if it were specified in the source HTML, and would convert all the style attributes and computed CSS to the almost-HTML the Mobipocket reader recognized. Thus, a style like "style='font-size: larger';" might be converted to "<font size='4'>", but a style like "style='margin-left: 10em';" was simply discarded, because the Mobipocket almost-HTML did not recognize any way to change margin sizes.
If you wanted to test the Mobipocket conversion, I would think the way to do that would be to extract the modified HTML from the Mobipocket file, and then write whatever kind of tests you needed to be sure the conversion was correct. I have some 'C' code hanging around to extract HTML from ".mobi" files; if you want it, I could send it to you.
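To make the style-flattening behaviour concrete, here is a toy sketch of the kind of mapping described. It is purely hypothetical: the function name style_to_mobi_markup is invented for illustration, and the only rules it knows are the two examples given in the message (font-size: larger becomes a font tag, margin-left is dropped). It does not reproduce what Mobipocket Publisher really does.

```python
def style_to_mobi_markup(style):
    """Return (open_tag, close_tag) for an inline style, or None if discarded.

    Hypothetical illustration only; the single known rule comes from the
    example in the message, not from any Mobipocket documentation.
    """
    declarations = {}
    for part in style.split(";"):
        if ":" in part:
            prop, value = part.split(":", 1)
            declarations[prop.strip().lower()] = value.strip().lower()

    if declarations.get("font-size") == "larger":
        return ("<font size='4'>", "</font>")

    # Anything else (for example margin-left) has no equivalent in this
    # sketch, so the style is simply dropped.
    return None
```

With this sketch, style_to_mobi_markup("font-size: larger") yields the font tag pair, while style_to_mobi_markup("margin-left: 10em") yields None, which mirrors the "converted or discarded" outcome described above.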

The primary drawback to HTML is that it is inherently a multi-file solution;
I'd say that's far from the primary drawback. Much more substantial drawbacks are that it is presentational, not syntactic; and even if you make it even more complex with syntactic information (or don't for that matter) the proofers will never (nor should they) proof in that format. For DP's purposes, for actually doing the work, HTML is a non-starter - but so is any other equally complex (I'd say any XML-based) representation. What we have in there already (<i> etc.) is the locus of major headaches and an ongoing error-trap.

In other words, an ".epub" file is just a ".zip" file with a few additional metadata files added. Software that purports to "convert" HTML to .epub should not do /anything/ to the source file, except perhaps to insure that it is valid XHTML (for older HTML files). There is no need to validate an .epub conversion, as no conversion should have occurred. If a rendered .epub document does not look exactly like the same collection of files rendered by a browser from the file system, it is the fault of the .epub rendering software, not the "conversion."
You make an interesting thesis, which, rare in the case of DP/PG arguments, is eminently testable. I have done so, and you clearly have not. Take a PG HTML zip file, say "76" for the sake of completeness. Download it, and unpack it on your computer. Take a PG epub "zip" file, say pg76.epub for concreteness. Download it, and unpack it on your computer.
Now, look at the contents.
Do they have the same HTML files?
No they do not.
Do they have the same number of HTML files?
No they do not.
Are the contents of the HTML files identical?
No they are not.
For the sake of completeness, open the first HTML file of each. Do the files RENDER the same on your browser when you actually TRY them to see if your thesis is correct?
No they do not RENDER the same.
It is an interesting thesis that PG epub files are "just" a zipped version of the PG HTML files -- but it is an easily demonstrably false thesis. Marcello's epub software does more than "just" pack the HTML files into an epub package. Ask him for a copy of his converter software, and see what the conversion actually entails. And/or ask Marcello what conversions he actually does to move from the HTML version to the epub version.
Thus again, I suggest that it would be a good idea to have a portable version of Marcello's epub conversion software that we could use for testing on our local machines. Given a portable version of the epub conversion software going to mobi is easy using the same Amazon/Mobipocket provided epub->mobi conversion software that Marcello is already using.
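The first three questions in that test can be automated once both archives have been downloaded. The following is a rough sketch using Python's zipfile module; the function names and the choice of file suffixes are assumptions made for the example, and it does nothing about the fourth question (whether the files render the same), which still needs a browser.

```python
import zipfile

# File suffixes treated as HTML content (an assumption for this sketch).
HTML_SUFFIXES = (".htm", ".html", ".xhtml")


def html_members(archive_path):
    """Map each HTML-like member name in a ZIP archive to its raw bytes."""
    with zipfile.ZipFile(archive_path) as zf:
        return {
            name: zf.read(name)
            for name in zf.namelist()
            if name.lower().endswith(HTML_SUFFIXES)
        }


def compare_archives(html_zip_path, epub_path):
    """Answer: same file names? same file count? byte-identical contents?"""
    source = html_members(html_zip_path)
    packaged = html_members(epub_path)
    return {
        "same_names": sorted(source) == sorted(packaged),
        "same_count": len(source) == len(packaged),
        # Contents are compared as a set so that renamed files still match.
        "identical_contents": set(source.values()) == set(packaged.values()),
    }
```

Calling compare_archives on the downloaded HTML zip and the matching .epub returns three booleans, one per question. A byte-for-byte comparison is deliberately strict, so even a harmless change such as re-encoding would show up as a difference.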

If only Jim would have been as thorough, and polite, about the iPad. mh

If only Jim would have been as thorough, and polite, about the iPad.
I don't know what your problem is, but per your suggestions I went back to the Apple "brick and mortar" store yesterday, and spent an additional three+ hours researching all the iPad suggestions you made. None of them "worked" as you suggested. None of them allow direct access to allow one to read ebook files either ePub or MOBI via wifi access to the internet. None of them allow latest access to the most recent books in ePub format released on PG. What almost all of them do is allow access to an internet server tied to that particular applet that allows one to read some subset of the PG offerings in some degraded form, typically a slightly spruced up "pretty print" version of a text file. This can simply be checked by searching for one of the latest PG books in which case you will find that NONE of these applets offer the latest PG books.
Unlike most full function web browsers on desktops or even netbooks, if one uses the provided Safari web browser to click on an ePub or MOBI file Safari says "You cannot download that file." You CAN use Safari to open a PDF file, which makes iPad useful for reading Google Books PDF "photocopies" of books. Just not useful for what PG offers. Also even IF you use Safari to open a PDF file, apparently Safari does not retain a copy of that book because when I run Safari again it redownloads that PDF file again from scratch. Why does one care? Well, that's slow, and it means that this doesn't work in "airplane mode" ie you cannot read the PDF file on an airplane via Safari with the wifi turned off. The ability to actually download and save book files is a fundamental feature of "real" book readers, and in the design of "real" ebook formats such as ePub and MOBI, such that when you download a book, you HAVE that book, and can then read it wherever and whenever you choose without requiring wifi or other wired connection.
What the Apple manuals say (finally released yesterday and read last night) is that one is allowed to transfer free book files in ePub format via USB cable from your desktop to your iPad via iTunes. You download a free ePub book to your desktop. From there you transfer it to the iTunes software. Then you hook up your iPad using a USB. Then you sync to iTunes. Then you safely unplug your iPad. Then you open iBooks. Then you find the new book in the iBooks "shelf" which you can then finally click on and start reading.
As opposed to one click on a link to a free ePub or MOBI book link say at PG using a netbook browser, which downloads the file, stores it, opens the reader app and you are up and reading. 1 second versus 10 minutes of hassle factor.
Further, the iPad manuals say that Apple has *permanently* given Apple applets priority over other applets for the file types that the iPad supports. IE ePub type is "hardwired" to the iBooks applet such that if you transfer an ePub file via the long-winded iTunes USB "sync" process then one can only read that ePub using the iBooks app -- which is a pretty weak app compared to other ePub and MOBI readers if one has made the comparison. [Imagine if Mickeysoft "hardwired" the HTML file type to the IE browser and allowed no other browser choice! Can you say "Monopoly," I knew you could]
What CAN iPad do?
It can reasonably present paid books from Apple on iBooks (not the greatest reader app, but not too horrible either)
It can reasonably present a free subset of PG's offerings repackaged as-if they came from Apple on iBooks
It can reasonably present paid books from Amazon via Kindle for iPad
It can reasonably present a free subset of PG's offerings repackaged as-if they came from Amazon via Kindle for iPad
If you have already bought books for a Kindle then Kindle for iPad will also allow you to read them for no additional cost on iPad
It can reasonably present PDF books and documents via Safari as long as you have an active wifi connection
It can store and allow you to read free ePub and other common document formats that you have transferred to iPad using the slow and cumbersome Desktop/USB/iTunes path. [At least the documentation claims this -- I cannot test it in the Apple store because they don't have USB to desktop set up]
Is this all good or bad? It depends on what you want to do. If you simply want to be a passive consumer of content, similar to watching TV from your cable provider, then maybe it's fine. If you want to be a CREATOR of content, such as someone who helps DP, SRs books from DP, "solos" books for PG, etc, then it's a pretty weak offering -- IMHO you would be much better off putting up with the hassles of a netbook which DOES allow one to quickly and painlessly transfer content using wifi. And if you are a reader omnivore like I am, then you will probably rapidly get sick of the Jobs monopolistic restrictions constantly getting in the way of your ability to quickly and easily download What you want from Where you want reading it with Whatever reader applet YOU damned well choose -- NOT Steve Jobs!
Other reasonable approaches:
Wait for the HP Slate and see how cobbled-up its touch abilities are. At least it offers a REAL operating system -- why couldn't Apple have offered OS X on iPad ???
Buy a netbook and put up with the keyboard hassles.
Buy a Kindle and put up with the crappy web browser and slow-and-unreliable "whispernet" AT&T connection -- at least you get a good built-in reader app and good screen technology.
Buy a low-cost generic reader such as Libre Pro
Buy an iPod and at least you're admitting you are reading on a cellphone and at least you are actually getting a cellphone--with the resulting compromises in space, speed, and OS.
Wait and see if the next version of the OS for iPad is less compromised.

Once again I must insist Mr. Adcock stop putting words in my mouth. You notice he doesn't put his answers in the context I asked them. These products all "work as I suggested," they just NEVER WERE INTENDED TO WORK AS MR. ADCOCK WOULD HAVE WANTED, AND HE KNEW THIS BEFOREHAND. . .nothing new here, just the same old same old. There are plenty of options to read in higher res than iPod/iPhone. That was your original complaint. There are plenty of PG eBooks. Also part of your original complaint. That you want to go behind the counter and rearrange things? Sorry, it's their store, their counter, not at all up to you. Make your own. . . . On Tue, 20 Apr 2010, James Adcock wrote:
If only Jim would have been as thorough, and polite, about the iPad.
I don't know what your problem is, but per your suggestions I went back to the Apple "brick and mortar" store yesterday, and spent an additional three+ hours researching all the iPad suggestions you made. None of them "worked" as you suggested. None of them allow direct access to allow one to read ebook files either ePub or MOBI via wifi access to the internet. None of them allow latest access to the most recent books in ePub format released on PG. What almost all of them do is allow access to an internet server tied to that particular applet that allows one to read some subset of the PG offerings in some degraded form, typically a slightly spruced up "pretty print" version of a text file. This can simply be checked by searching for one of the latest PG books in which case you will find that NONE of these applets offer the latest PG books.
Unlike most full function web browsers on desktops or even netbooks, if one uses the provided Safari web browser to click on an ePub or MOBI file Safari says "You cannot download that file." You CAN use Safari to open a PDF file, which makes iPad useful for reading Google Books PDF "photocopies" of books. Just not useful for what PG offers. Also even IF you use Safari to open a PDF file, apparently Safari does not retain a copy of that book because when I run Safari again it redownloads that PDF file again from scratch. Why does one care? Well, that's slow, and it means that this doesn't work in "airplane mode" ie you cannot read the PDF file on an airplane via Safari with the wifi turned off. The ability to actually download and save book files is a fundamental feature of "real" book readers, and in the design of "real" ebook formats such as ePub and MOBI, such that when you download a book, you HAVE that book, and can then read it wherever and whenever you choose without requiring wifi or other wired connection.
What the Apple manuals say (finally released yesterday and read last night) is that one is allowed to transfer free book files in ePub format via USB cable from your desktop to your iPad via iTunes. You download a free ePub book to your desktop. From there you transfer it to the iTunes software. Then you hook up your iPad using a USB. Then you sync to iTunes. Then you safely unplug your iPad. Then you open iBooks. Then you find the new book in the iBooks "shelf" which you can then finally click on and start reading.
As opposed to one click on a link to a free ePub or MOBI book link say at PG using a netbook browser, which downloads the file, stores it, opens the reader app and you are up and reading. 1 second verses 10 minutes of hassle factor.
Further, the iPad manuals say that Apple has *permanently* given Apple applets priority over other applets for the file types that the iPad supports. IE ePub type is "hardwired" to the iBooks applet such if you transfer an ePub file via the long-winded iTunes USB "sync" process then one can only read that ePub using the iBooks app -- which is a pretty weak app compared to other ePub and MOBI readers if one has made the comparison. [Imagine if Mickeysoft "hardwired" the HTML file type to the IE browser and allowed no other browser choice! Can you say "Monopoly," I knew you could]
What CAN iPad do?
It can reasonably present paid books from Apple on iBooks (not the greatest reader app, but not too horrible either)
It can reasonably present a free subset of PG's offerings repackaged as-if they came from Apple on iBooks
It can reasonably present paid books from Amazon via Kindle for iPad
It can reasonably present a free subset of PG's offerings repackaged as-if they came from Amazon via Kindle for iPad
If you have already bought books for a Kindle then Kindle for iPad will also allow you to read them for no additional cost on iPad
It can reasonably present PDF books and documents via Safari as long as you have an active wifi connection
It can store and allow you to read free ePub and other common document formats that you have transferred to iPad using the slow and cumbersome Desktop/USB/iTunes path. [At least the documentation claims this -- I cannot test it in the Apple store because they don't have USB to desktop set up]
Is this all good or bad? It depends on what you want to do. If you simply want to be a passive consumer of content, similar to watching TV from your cable provider, then maybe its fine. If you want to be a CREATOR of content, such as someone who helps DP, SRs books from DP, "solos" books for PG, etc, then it's a pretty weak offering -- IMHO you would be much better off putting up with the hassles of a netbook which DOES allow one to quickly and painlessly transfer content using wifi. And if you are a reader omnivore like I am, then you will probably rapidly get sick of the Job's monopolistic restrictions constantly getting in the way of your ability to quickly and easily download What you want from Where you want reading it with Whatever reader applet YOU damned well choose -- NOT Steve Jobs!
Other reasonable approaches:
Wait for the HP Slate and see how cobbled-up its touch abilities are. At least it offers a REAL operating system -- why couldn't Apple have offered OS X on iPad ???
Buy a netbook and put up with the keyboard hassles.
Buy a Kindle and put up with the crappy web browser and slow-and-unreliable "whispernet" AT&T connection -- at least you get a good built-in reader app and good screen technology.
Buy a low-cost generic reader such as Libre Pro
Buy an iPod and at least you're admitting you are reading on a cellphone and at least you are actually getting a cellphone--with the resulting compromises in space, speed, and OS.
Wait and see if the next version of the OS for iPad is less compromised.

On 4/19/2010 12:25 PM, Jim Adcock wrote:
In other words, an ".epub" file is just a ".zip" file with a few additional metadata files added. Software that purports to "convert" HTML to .epub should not do /anything/ to the source file, except perhaps to ensure that it is valid XHTML (for older HTML files). There is no need to validate an .epub conversion, as no conversion should have occurred. If a rendered .epub document does not look exactly like the same collection of files rendered by a browser from the file system, it is the fault of the .epub rendering software, not the "conversion."
You make an interesting thesis, which, rare in the case of DP/PG arguments, is eminently testable. I have done so, and you clearly have not. Take a PG HTML zip file, say "76" for the sake of concreteness. Download it, and unpack it on your computer. Take a PG epub "zip" file, say pg76.epub for concreteness. Download it, and unpack it on your computer.
Now, look at the contents.
Do they have the same HTML files?
Yes, they do. The file names have been altered, but the content is virtually the same. [snip]
Do they have the same number of HTML files?
Yes they do. Each has eight parts plus the godawful and legally unnecessary PG header (Apple is doing the world a favor by stripping it away). [snip]
Are the contents of the HTML files identical?
No they are not.
No, they are not. Mr. Perathoner's files 1.) have been converted from ISO-8859 to Unicode/UTF-8; 2.) have extracted the internal style sheets into external style sheets; 3.) have added links to a "center contents pages" stylesheet and a generic "pgepub" stylesheet; 4.) have added "id" attributes for use by .epub user agents for navigation; and 5.) have changed all the internal links to match the file paths inside his archive. All of these steps, except #3, are harmless and do not affect the presentation of the content. Indeed, with the exception of centering the tables they are probably all desirable things to do.
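For anyone who wants to repeat the comparison without eyeballing the unpacked files, here is a minimal Python sketch. It lists the HTML members of each archive and diffs two of them after decoding, so that a pure ISO-8859 to UTF-8 re-encoding does not show up as a difference. The function names are hypothetical, and pairing the files (their names differ between the two archives) is left to the reader. The fallback to ISO-8859-1 is an assumption about the PG HTML zip; a Latin-1 file that happens to be valid UTF-8 would be misread.

```python
import difflib
import zipfile


def html_members(archive):
    """Return (name, bytes) for every .htm/.html member, sorted by name."""
    with zipfile.ZipFile(archive) as z:
        names = [n for n in z.namelist() if n.lower().endswith((".htm", ".html"))]
        return [(n, z.read(n)) for n in sorted(names)]


def decode(data):
    """Decode as UTF-8, falling back to ISO-8859-1 (assumed legacy encoding)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("iso-8859-1")


def diff_text(old_bytes, new_bytes):
    """Line-by-line unified diff of two HTML files, ignoring their byte encoding."""
    old = decode(old_bytes).splitlines()
    new = decode(new_bytes).splitlines()
    return list(difflib.unified_diff(old, new, "html-zip", "epub", lineterm=""))
```

An empty list from diff_text means the two files differ at most in encoding; added "id" attributes, rewritten links, and stylesheet changes appear as ordinary diff lines.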
For the sake of completeness, open the first HTML file of each. Do the files RENDER the same on your browser when you actually TRY them to see if your thesis is correct?
No they do not RENDER the same.
First of all, it is your thesis not mine. I rarely, if ever, download files from PG; instead I get them from some other source where the quality of the files has more importance. But you are correct, with an unaltered archive they do /not/ render the same. However, if you delete the "pgepub.css" file, or delete its contents, they /do/ render the same with the exception of the centered tables of contents. If you delete all the odd numbered .css files, then they /do/ render identically. This is, of course, exactly why embedding style information inside an HTML file is a bad thing (you can't change the presentation without editing the HTML) and including a link to a generic stylesheet is a good thing (just find the stylesheet you like, copy it over the top of the generic one, and voilà, your book, your way). All of this can be accomplished by using a visual zip tool, and without ever having to edit a file (other than your zipper). Although we definitely need to talk Mr. Perathoner out of adding a link to a "center me" style sheet.
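The same stylesheet experiment can be done from a script instead of a visual zip tool. The sketch below is only an illustration: the function name is hypothetical, and the member name you pass (for example the path of "pgepub.css" inside a given archive) has to be read from the archive itself. Python's zipfile module cannot edit an archive in place, so the sketch copies every member to a new archive and writes one of them as an empty file.

```python
import zipfile


def blank_member(src, dst, member_name):
    """Copy the archive src to dst, replacing member_name with an empty file.

    Member order and per-member compression are preserved, so a 'mimetype'
    entry that was first and uncompressed stays that way.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            data = b"" if info.filename == member_name else zin.read(info.filename)
            zout.writestr(info.filename, data, compress_type=info.compress_type)
```

Opening the rewritten archive in a reader, or unpacking it and loading the first HTML file in a browser, then shows how the book renders without that stylesheet's rules.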
It is an interesting thesis that PG epub files are "just" a zipped version of the PG HTML files -- but it is an easily demonstrably false thesis.
I never said that /PG/ .epub files are just a zipped version of /PG/ HTML files; I said that technically conforming .epub files are just zipped versions of their source HTML files. It is certainly possible to take an HTML file, alter it, and make an .epub file from the newly altered file. Personally, I would view that as a flaw in the conversion software, though, and independent of the issue of .epub encapsulation.
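To make the "zip plus a few metadata files" description concrete, here is a rough Python sketch of wrapping untouched XHTML files in an EPUB-style container. It is not Mr. Perathoner's converter, and every name in it (pack_epub, build_opf, the "content.opf" path, the "doc0" ids) is an assumption for illustration. It also leaves out the NCX navigation file that EPUB 2 expects, so the output is a skeleton rather than a fully conforming package.

```python
import zipfile
from xml.sax.saxutils import escape, quoteattr

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_opf(title, identifier, html_names):
    """Build a bare package document listing the files in reading order."""
    manifest = "\n".join(
        f'    <item id="doc{i}" href={quoteattr(name)} media-type="application/xhtml+xml"/>'
        for i, name in enumerate(html_names)
    )
    spine = "\n".join(f'    <itemref idref="doc{i}"/>' for i in range(len(html_names)))
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{escape(title)}</dc:title>
    <dc:identifier id="bookid">{escape(identifier)}</dc:identifier>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
{manifest}
  </manifest>
  <spine>
{spine}
  </spine>
</package>
"""


def pack_epub(out, title, identifier, documents):
    """documents maps file name -> XHTML bytes; the bytes are stored unchanged."""
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as z:
        # The mimetype entry goes first and is stored without compression.
        z.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        z.writestr("META-INF/container.xml", CONTAINER_XML)
        z.writestr("content.opf", build_opf(title, identifier, list(documents)))
        for name, data in documents.items():
            z.writestr(name, data)
```

The point of the sketch is that the document bytes go in exactly as supplied; anything that rewrites them belongs to a particular converter, not to the container format.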
Marcello's epub software does more than "just" pack the HTML files into an epub package. Ask him for a copy of his converter software, and see what the conversion actually entails. And/or ask Marcello what conversions he actually does to move from the HTML version to the epub version.
True. Apparently, Mr. Perathoner's software extracts embedded CSS information and moves it to an external style sheet (as it should), creates a "<div class='c1'>" around the tables of contents and illustrations, with a corresponding style sheet that centers the contents (which it should not), and adds a link to a generic "pgepub" style sheet (as it should), in addition to altering names for navigation purposes. Now apparently, your complaint is not that PG HTML does not make good .epub files, or that including a generic stylesheet "breaks" the ".epub", but that you don't like the .epub generator that Mr. Perathoner wrote. That complaint, with which I sympathize, needs to be directed to him individually; it cannot, however, be generalized to /all/ .epub files, only those created by his software.
Thus again, I suggest that it would be a good idea to have a portable version of Marcello's epub conversion software that we could use for testing on our local machines. Given a portable version of the epub conversion software, going to mobi is easy using the same Amazon/Mobipocket-provided epub->mobi conversion software that Marcello is already using.

Lee Passey wrote:
creates a "<div class='c1'>" around the tables of contents and illustrations, with a corresponding style sheet that centers the contents (which it should not),
HTML Tidy does that. Direct your complaints to the w3c. -- Marcello Perathoner webmaster@gutenberg.org

On 4/20/2010 5:10 AM, Marcello Perathoner wrote:
Lee Passey wrote:
creates a "<div class='c1'>" around the tables of contents and illustrations, with a corresponding style sheet that centers the contents (which it should not),
HTML Tidy does that.
You are correct. There is apparently a disconnect between the official HTML specification for the "<center>" element and its implementation in all major browsers. For informational purposes, I have CC'ed this list with my message to the Tidy developers list on SourceForge. Until I get the matter resolved, I would recommend you /not/ use the --clean option with Tidy.
Direct your complaints to the w3c.
Why? They wouldn't and couldn't do anything about it. Tidy was developed by a member of the W3C, but he has long since abandoned any involvement with the project. Today, the Tidy project is an independent project based at http://www.sourceforge.net/projects/tidy. If you come across a bug in Tidy (or wish an enhancement), please log it at http://sourceforge.net/tracker/?group_id=27659&atid=390963.

Now apparently, your complaint is not that PG HTML does not make good .epub files, or that including a generic stylesheet "breaks" the ".epub", but that you don't like the .epub generator that Mr. Perathoner wrote. That complaint, with which I sympathize, needs to be directed to him individually; it cannot, however, be generalized to /all/ .epub files, only those created by his software.
First, it should be obvious to all that PG ePub is NOT simply HTML repackaged and compressed, in that PG ePub is offered in two flavors, with and without "illustrations", and if those "illustrations" are illuminated caps then that is going to have at least SOME impact on the ePub files generated and the enjoyment or lack thereof of the end reader! My *complaint* rather was that YOU said it was not necessary to have access to Marcello's converter because I could easily create my own ePub files to see what my HTML would look like as an ePub. Which was clearly false. My *suggestion*, after *others* at PG complained that DP keeps turning out HTML which breaks when turned into PG ePub files, was that maybe PG ought to offer Marcello's converter software in a portable form (I tried porting it but can't get it to work) so that DP authors (PP's) can actually TRY the ePub format as part of their content development process, and perhaps IF they saw for themselves that they were making choices in their HTML cutesiness that are causing the ebook reader's experience to fail THEN perhaps they would make better choices. BUT, currently the only way to see how the ePub or MOBI is going to turn out is to submit the completed HTML to PG for posting, at which point it's way too late to make more reasoned HTML design tradeoffs.

Summary of the situation (as it seems to me). DP is currently taking too long to produce texts that are either less (plain-text) or more (DP-style HTML) than the supply chain is able to convey to the end-readers to deliver the experience intended. Once DP delivers their content in one or both of those formats, it's for all purposes stuck at PG (nicely symmetrical with how it had previously been stuck at DP) because while DP had the raw materials but no finished goods, PG has the finished goods but no raw materials. So for whatever purpose (quality improvement, error correction, evolving requirements) PG's products grow stale. What can DP do in the reasonable short-term future that would be low risk and low effort? The first most obvious to me is to start getting serious about passing along the raw materials. Upload in as complete form as possible the matching image and text files so future modification and adaptation is possible. There's no loss to DP by doing so; and the risk is that over time they are quite capable of losing track of them.

The first most obvious to me is to start getting serious about passing along the raw materials. Upload in as complete form as possible the matching image and text files so future modification and adaptation is possible. There's no loss to DP by doing so; and the risk is that over time they are quite capable of losing track of them.
I suggest that it is helpful if possible for the HTML to be submitted with the linebreaks the same as the original book, and that PG retain those linebreaks rather than changing the line lengths by say running the HTML through "tidy." Or else at least retain the submitted HTML internally with the original linebreaks to make it easier to fix problems or make another pass through DP or some other process some day. Pgdiff can be used to recover the linebreaks, but it is less work if the linebreaks are never discarded in the first place.

Now that hardware replacements and follow up tasks are mostly complete for the DP production server, I'm taking a moment to at least partially respond to several comments and remarks recently brought up regarding archival materials for projects completed at DP.
don kretz wrote on 2010-04-20 at 15:35
Upload in as complete form as possible the matching image and text files so future modification and adaptation is possible. There's no loss to DP by doing so; and the risk is that over time they are quite capable of losing track of them.
I find your lack of faith disturbing. :) The DP test server is an Internet Archive machine and is backed up within their infrastructure. I also personally maintain a remote backup of all archived project files to dedicated storage here. Having said that, some of the earliest DP material (produced on the server in charlz's garage) is not archived in the OLS. This gap in the archives encompasses 422 known projects. At last word from charlz, he has this material, but unfortunately has not yet provided it to be incorporated into the archives.
bowerbird wrote on 2010-04-20 at 20:35
first, it looks like i was wrong when i said that d.p. had stopped maintaining the "ols", so of course my "reason" for their having stopped maintaining it was also incorrect. (or one could say it's _no_longer_ correct, but i do believe it was correct at one time.)
charlz was the original and sole maintainer of the DP archives. When he ended his active participation with DP, the archives were unmaintained until I made time to reconstruct the undocumented procedures for moving project files over and recording them in the database. Since then, they have been continuously maintained and the current process documented for the benefit of future caretakers.
don kretz wrote on 2010-04-20 at 22:12
Available Formats: Display of images from this source has not been permitted.
DP abides by the stated wishes of the image sources with respect to public redisplay of images from various sources. For sources which do not wish images from their efforts redistributed, the files from DP are retained in the archives, but are not 'made available' in accordance with these agreements.
Kevin Pulliam wrote on 2010-04-21 at 00:07
On the Open Library System, I note that high resolution gray-scale scans (at least for the one project I checked) are not archived, ... I also note that there is no 'bulk' download function to get a zip of all the files associated with a text.
The hires scans are archived, however the OLS code and UI are feature-poor. Availability of hires and zip file sets are among the desired features, but development of the OLS is not currently a priority item. David (donovan)