Amazon releases new KF8 and MOBI tools

Amazon has finally released their "new improved less crappy" tools, which support KF8 and MOBI as output formats. As input they accept HTML5 and CSS3, not that that will help us submitters, because PG refuses to accept HTML5. Hopefully PG will at least pick up the new version of kindlegen so they can turn back off some of the Marcello file rewriting.
The new version of kindlegen: www.amazon.com/kindleformat/KindleGen
Improved simulator software: www.amazon.com/kindleformat/KindlePreviewer

James Adcock said: "PG refuses to accept HTML5"
For the record, it was only this afternoon that it was made quite clear to him, by myself and Greg Newby, that PG will probably eventually accept HTML5 files, but not, repeat Not, repeat NOT (my emphasis), until it becomes a W3C Recommendation (it's currently a W3C Working Draft, subject to change), and the W3C validator accepts HTML5 files without errors or warnings.
Al
-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of James Adcock
Sent: Thursday, January 12, 2012 5:14 PM
To: 'Project Gutenberg Volunteer Discussion'
Subject: [gutvol-d] Amazon releases new KF8 and MOBI tools
Amazon has finally released their "new improved less crappy" tools which support as output format KF8 and MOBI. As input format they accept HTML5 and CSS3, not that that will help us submitters because PG refuses to accept HTML5. Hopefully PG will at least pick up the new version of kindlegen so they can turn back off some of the Marcello file rewriting.
The new version of kindlegen: www.amazon.com/kindleformat/KindleGen
Improved simulator software: www.amazon.com/kindleformat/KindlePreviewer

For the record, it was the day before that when I tried to submit HTML5 and Al Haines refused it, just as I said: PG refuses HTML5. So I don't see where the controversy is. And since, as I said previously, the new Amazon tools expect HTML5, they won't be helping us submitters any. They presumably could help Marcello, who can target HTML5 as the output of his Epubmaker HTML rewrite tools, where hopefully he can now turn off some of the other rewriting "workarounds" he has been doing to target MOBI. The new tools would make it a lot easier for submitters who are writing (sensible) HTML intended to target EPUB and MOBI to have a better chance of "just" targeting EPUB and then getting MOBI "for free." Or is Marcello going to insist on staying with the old crusty version of kindlegen as well? If someone, like me, now downgrades their HTML5 to 4 in order to get PG to accept it, then what? Do "we"/I keep a copy of my HTML5 version in hand, and then do a hypothetical merge on the then-current PG version when PG gets around to actually accepting HTML5? Or presumably PG will refuse to do that, and continue their current policy of defending to the death old crusty versions of HTML of otherwise great books, which, because they are old crusty versions, really don't do anyone any good any more? And presumably this means the "real" development path for people who actually care about creating HTML that real people can read on real machines is: write in HTML5, try that HTML5 out on real machines using EPUB and MOBI compilers (like kindlegen), then once we have working HTML5, hack it back down to 4 to submit it to PG, and then hack it back down yet again to turn it into PG txt72???
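The last step Jim describes, turning HTML into PG-style text wrapped at 72 columns, can be roughed out with Python's standard library. This is a minimal sketch, not PG's actual tooling: it assumes every paragraph is an explicitly closed <p> element, it ignores headings, tables and italics markup, and the names ParagraphText and html_to_txt72 are hypothetical.

```python
import textwrap
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the whitespace-normalized text of each closed <p> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._buf = None  # None means "not inside a <p>"

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buf = []  # assumption: <p> elements are not left unclosed

    def handle_endtag(self, tag):
        if tag == "p" and self._buf is not None:
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
            self._buf = None

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)


def html_to_txt72(html, width=72):
    """Wrap each paragraph to `width` columns, separated by blank lines."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    wrapped = [textwrap.fill(p, width=width) for p in parser.paragraphs]
    return "\n\n".join(wrapped) + "\n"
```

Text outside <p> elements is dropped entirely, which is one reason real conversions need hand checking.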
Al> For the record, it was only this afternoon that it was made quite clear to him, by myself and Greg Newby, that PG will probably eventually accept HTML5 files, but not, repeat Not, repeat NOT (my emphasis), until it becomes a W3C Recommendation (it's currently a W3C Working Draft, subject to change), and the W3C validator accepts HTML5 files without errors or warnings.
[The W3C validator issued warnings about old browser compatibility because I included a BOM as a file transfer courtesy to PG, since I have no idea what kind of computer or tools they process their files with. I would have thought that PG would just strip the BOM if they don't like BOMs.]
What Jim said> As input format they accept HTML5 and CSS3, not that that will help us submitters because PG refuses to accept HTML5. Hopefully PG will at least pick up the new version of kindlegen so they can turn back off some of the Marcello file rewriting.
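Stripping a UTF-8 BOM, as Jim suggests PG could have done, takes only a few lines. A minimal Python sketch; strip_utf8_bom and read_html_text are hypothetical helper names, while codecs.BOM_UTF8 and the "utf-8-sig" codec are standard-library features.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 byte order mark (EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def read_html_text(data: bytes) -> str:
    # "utf-8-sig" silently drops a leading BOM and otherwise
    # decodes exactly like plain "utf-8".
    return data.decode("utf-8-sig")
```

Either way the file content is unchanged apart from the three BOM bytes.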

On 1/12/2012 6:14 PM, James Adcock wrote:
As input format they accept HTML5 and CSS3, not that that will help us submitters because PG refuses to accept HTML5.
But what is the internal format for the Kindle and MOBI? Does the Kindle or the Fire support /any/ of the new features of HTML 5? We all know that internally the Kindle supports only a slight variation of HTML 3.2. It's fairly easy to write a program to "dumb-down" HTML 5 to HTML 3.2+, which is all the new KindleGen program is doing. Why should PG bother with HTML 5 as a Kindle input format when it's all going to get stripped away by KindleGen anyway?
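Lee's remark that a "dumb-down" converter is fairly easy to write can be illustrated with a minimal sketch using Python's html.parser. The specific rewrites (em to i, strong to b, centered paragraphs to center, class references left alone) are assumptions modeled on the KindleGen behavior Lee reports further down this thread; they are not KindleGen's actual rules, and TAG_MAP, DumbDown and dumb_down are hypothetical names.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical mapping from HTML 4 phrase tags to HTML 3.2 presentational tags.
TAG_MAP = {"em": "i", "strong": "b"}
RAW_TEXT = {"script", "style"}  # contents are passed through unescaped


def _is_centered(attrs):
    """True for <p align="center"> or an inline style with text-align:center."""
    d = dict(attrs)
    align = (d.get("align") or "").strip().lower()
    style = (d.get("style") or "").replace(" ", "").lower()
    return align == "center" or "text-align:center" in style


class DumbDown(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.p_stack = []  # tag emitted for each open <p>: "p" or "center"
        self.raw = False   # True while inside <style> or <script>

    def _emit_start(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            parts.append(name if value is None
                         else f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")  # keep the DOCTYPE

    def handle_starttag(self, tag, attrs):
        if tag in RAW_TEXT:
            self.raw = True
        if tag == "p":
            if _is_centered(attrs):
                # Replace the paragraph with <center>, dropping its attributes.
                self.p_stack.append("center")
                self.out.append("<center>")
                return
            self.p_stack.append("p")
        self._emit_start(TAG_MAP.get(tag, tag), attrs)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> become plain HTML 3.2 <br>.
        self._emit_start(TAG_MAP.get(tag, tag), attrs)

    def handle_endtag(self, tag):
        if tag in RAW_TEXT:
            self.raw = False
        if tag == "p" and self.p_stack:
            name = self.p_stack.pop()
        else:
            name = TAG_MAP.get(tag, tag)
        self.out.append(f"</{name}>")

    def handle_data(self, data):
        self.out.append(data if self.raw else escape(data, quote=False))


def dumb_down(html):
    parser = DumbDown()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, dumb_down('<p align="center">Hi <em>there</em></p>') yields '<center>Hi <i>there</i></center>'. A real converter would need a much larger tag map and would also have to repair unclosed <p> elements, which this sketch does not attempt.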

But what is the internal format for the Kindle and MOBI? Does the Kindle or the Fire support /any/ of the new features of HTML 5?
What I think "we" are talking about is heading towards XHTML5, which is compatible with KF8 (Mobi version 8) and which will be compatible with EPUB3. By writing code that the tools can accept "directly" and that displays correctly on current browsers, we can reduce the amount of rewriting that Marcello has to do, or that other tools have to do. And hopefully we can start heading in a direction where the EPUB and the MOBI that PG posts render much more like the HTML that PG posts, with the PG EPUB and the PG MOBI showing "broken parts" far less often than they do nowadays, which is more often than not. New HTML5 elements that seem useful to the PG cause and that Amazon says the new kindlegen (and the Kindle Fire) supports: wbr, section, article, aside, hgroup, header, footer, figure, figcaption, mark. I haven't done an "all points test" yet on the new kindlegen to see how it actually implements this stuff. See http://www.amazon.com/gp/feature.html?ie=UTF8&docId=1000729901
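Before an "all points test", it can help to know which of the listed HTML5 elements a given book actually uses. A minimal Python sketch; the element set is copied from the list above, and KF8_HTML5_ELEMENTS and html5_elements_used are hypothetical names. It only counts tags and says nothing about how kindlegen or a Kindle renders them.

```python
from collections import Counter
from html.parser import HTMLParser

# The HTML5 elements listed above as supported by the new kindlegen.
KF8_HTML5_ELEMENTS = frozenset({
    "wbr", "section", "article", "aside", "hgroup",
    "header", "footer", "figure", "figcaption", "mark",
})


class _TagCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Called for <tag> and, via the default handle_startendtag, for <tag/>.
        self.counts[tag] += 1


def html5_elements_used(html):
    """Map each listed HTML5 element found in `html` to its occurrence count."""
    parser = _TagCounter()
    parser.feed(html)
    parser.close()
    return {tag: n for tag, n in parser.counts.items()
            if tag in KF8_HTML5_ELEMENTS}
```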
We all know that internally the Kindle supports only a slight variation of HTML 3.2.
Not sure what "we all know." Are you saying Kindle Fire is only internally HTML 3.2? Are you saying KF8 is only internally HTML 3.2? Agreed that the new kindlegen *does not* seem to "magically" fix all the limitations of the older generations of Kindle. Most notably the paragraph top-margin/bottom-margin problem which keeps killing so many of the PG MOBI files remains on older Kindles even using the new kindlegen. See also: http://www.w3.org/TR/html5-diff/

On 1/13/2012 6:48 PM, James Adcock wrote:
We all know that internally the Kindle supports only a slight variation of HTML 3.2.
Not sure what "we all know." Are you saying Kindle Fire is only internally HTML 3.2? Are you saying KF8 is only internally HTML 3.2?
Absolutely not. In fact, I disclaim any knowledge at all about the software on the Kindle Fire or the KF8 file format. I'm sure the community over at MobileRead will soon have it figured out, but so far I'm not pursuing that knowledge. I will admit that the Fire hardware seems very attractive at its price point, and now that procedures for "rooting" the device are well-established I will probably be buying one in the near future, and hope that I will be able to gain more knowledge about the format as time goes on. So, what /do/ we know? First and foremost, we know that the limiting factor for the Kindles of all generations in displaying any particular format is /not/ the file format, but the installed software. /No/ improvement to KindleGen can or will create new capabilities that the installed reader software does not already support. We know that Amazon has published documentation for exactly what markup the pre-Fire Kindle reader supports (see http://kindlegen.s3.amazonaws.com/AmazonKindlePublishingGuidelines.pdf). Examination of that document reveals that the supported HTML tags for the pre-Fire devices are almost a perfect match for HTML 3.2 (especially interesting is http://www.mobipocket.com/dev/article.asp?BaseFolder=prcgen&File=TagRef_OEB.htm, which is referenced by the Kindle Publishing Guidelines). We know that the pre-Fire Kindle reader does not support HTML 4 tags or CSS. "But wait," you say. "The Kindle Publishing Guidelines specifically say that Kindles support some (not all) CSS and the <em> tag, which was not a part of HTML 3.2! How can you say you know what you claim to know?" Good question. (It's easy to ask good questions when someone else is putting words in your mouth. :-) I started with a fairly simple HTML file which nonetheless contained a CSS style sheet, inline styles, and some HTML 4 tags. I used KindleGen to create a .mobi file from that HTML.
I then used the prc2html program I wrote to extract the HTML back from the .mobi file, and examined the resulting HTML source. What I discovered was that all CSS styles that the Publishing Guidelines claimed are supported by the pre-Fire Kindle had been converted to HTML 3.2 tags (e.g. <p align="center"> and <p style="text-align:center"> were converted to <center>) and HTML 4 tags had been converted to their HTML 3.2 counterparts (e.g. <em> became <i>). Inline references to internal style sheets were ignored (e.g. <p class="center">). Just to be explicit, when I say converted I mean replaced; the new markup was inserted and the old markup removed. Interestingly, unsupported inline CSS styles and other attributes were left in place, as was the entire internal style sheet (the section between <style> and </style>). Another thing that is well known is that the .mobi format is little more than an HTML file in a MobiPocket wrapper. I have been universally successful when taking HTML files, converting them to PalmDOC, renaming the .pdb to .prc and opening them in the Kindle reader. I took the simple HTML file that I started the experiment with and converted it to .prc using the foregoing procedure. When I loaded my pseudo-Kindle file into the Kindle reader (for PC, as I don't own the hardware) I discovered that 3.2 markup was rendered, but 4.0 markup and styles were ignored. Interestingly, the Kindle DXG boasted a web browser, but it obviously used the old MobiPocket rendering engine, because even the web browser was limited to HTML 3.2 markup. Almost immediately upon release the DX web browser was widely panned and dismissed. Now I'm going to engage in some speculation.
I suspect that somewhere in the 2nd generation time frame Amazon engineers realized that the simple 3.2-based MobiPocket engine was obsolete, and began working on a new HTML display engine (it's possible, even likely, that when Amazon bought MobiPocket they bought it as a business unit, and the MobiPocket engineers are responsible for the ongoing Kindle reader development and maintenance). The 4th generation Kindles probably include this new, more powerful and compliant engine. This new engine probably supports most, if not all, of the HTML 4.01 specification. If I were writing new e-reader software for an Android device, I would probably adopt the WebKit engine for HTML rendering. I don't believe that is what the Amazon engineers did. I think they probably ported the latest generation Kindle reader software to Android. One of the most fundamental conclusions that should be apparent from all of this discussion is that the determining factor in what is supported, or not, is the e-book rendering software, not the e-book generating software. So the question is not really whether KindleGen supports HTML 5, but whether the Fire /rendering/ software supports HTML 5. It's possible that the software does /not/ support HTML 5, but KindleGen is smart enough to convert HTML 5 to HTML 4.01. It's also possible that the Fire (via the KF8 file format) supports HTML 5, but earlier versions of KindleGen were stripping those "unknown" tags out, and the new version is leaving them in. I also think it's possible, indeed likely, that with the Fire, Amazon has a method to upgrade the Kindle software "behind the scenes" without users even knowing that an upgrade occurred. In this case, older Fires may be in the process of upgrading to HTML 5, and some change to KindleGen was required to support the upgrade.
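Lee's renaming trick is plausible because PalmDOC (.pdb) and Mobipocket (.prc/.mobi) files share the same Palm Database container: a 78-byte header whose 4-byte type and creator fields sit at offsets 60 and 64 ("TEXt"/"REAd" for PalmDOC, "BOOK"/"MOBI" for Mobipocket), followed by a record list. Renaming changes only the file extension, not these fields. A minimal Python sketch for inspecting them; read_pdb_header and describe are hypothetical helpers, and whether a particular reader accepts a TEXt/REAd file under a .prc name is Lee's observation, not something this code checks.

```python
import struct

PDB_HEADER_SIZE = 78


def read_pdb_header(data: bytes):
    """Return (name, type, creator, record_count) from a Palm database header."""
    if len(data) < PDB_HEADER_SIZE:
        raise ValueError("too short to be a PDB file")
    name = data[:32].split(b"\x00", 1)[0].decode("latin-1")
    db_type = data[60:64].decode("latin-1")
    creator = data[64:68].decode("latin-1")
    (num_records,) = struct.unpack(">H", data[76:78])  # big-endian uint16
    return name, db_type, creator, num_records


def describe(db_type, creator):
    # Two well-known type/creator pairs; anything else is reported as unknown.
    known = {("TEXt", "REAd"): "PalmDOC", ("BOOK", "MOBI"): "Mobipocket"}
    return known.get((db_type, creator), "unknown")
```

Usage: read the first 78 bytes of a .pdb or .prc file and pass them to read_pdb_header; the extension plays no part in the result.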
Agreed that the new kindlegen *does not* seem to "magically" fix all the limitations of the older generations of Kindle. Most notably the paragraph top-margin/bottom-margin problem which keeps killing so many of the PG MOBI files remains on older Kindles even using the new kindlegen.
Yes, KindleGen doesn't fix the limitations of earlier software, and indeed cannot. It /can/ modify/convert the input in such a way that older software can deal with it, but KindleGen can't make a silk purse out of a sow's ear. I think that the answer is to try to save your files using whatever markup will give you the highest fidelity possible, and then downgrade from there if necessary. As to the issue of acceptance of files at Project Gutenberg, in my experience there is no such restriction on posting files at the Internet Archive. My advice would be to post the good stuff there, and then give PG just the impoverished text, with a transcriber's note that says better stuff is available at IA, with a URL.

Lee>We know that the pre-Fire Kindle reader does not support HTML 4 tags, or CSS. Sorry, but I don't know exactly what your prc2html program does. "We" know from mobi_unpack.py and copious net conversations that kf8 actually contains two separate file formats in one: a copy of the oldy moldy "mobi7" file format targeting "pre-current-generation" Kindles [*not* pre-Fire Kindles] and a "mobi8" file format targeting "current-generation" Kindles. Amazon's cloud server supposedly cuts this down on-the-fly and is smart enough to only send that part which is needed by a particular Kindle -- if the kf8 file gets served by Amazon from Amazon. [Don't know what happens if the kf8 gets served *through* Amazon by say a PG customer sending a hypothetical PG kf8 *through* Amazon using say the Amazon "Send to Kindle" desktop program.] Now of the "current-generation" Kindles the reality is only Kindle Fire *currently* has the software to render "mobi8" but Amazon claims they are busily churning away trying to get the updates out to allow "current-generation" Kindles to also render "mobi8" -- but Amazon has not been explicitly explicit about exactly what Kindle models they mean when they say "current-generation" Kindles. Does your prc2html software successfully extract *both* these file formats, or only the "mobi7" part? Now one problem for PG I would think with the "two, two, two formats in one" kf8 packing is that the currently already bloated mobi7 format (assuming one turns off compression, which I think is what PG is currently doing) becomes an even more bloated up (uncompressed) kf8.
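For context on the compression Jim mentions: MOBI and PalmDOC text records are usually stored either uncompressed (compression type 1 in the first record's header) or with PalmDOC compression (type 2), a simple LZ77 variant. A decompressor sketch shows how little is involved; palmdoc_decompress is a hypothetical name, it handles a single text record only, and it does not cover the separate HUFF/CDIC scheme.

```python
def palmdoc_decompress(data: bytes) -> bytes:
    """Decompress one PalmDOC-compressed (type 2) text record."""
    out = bytearray()
    i = 0
    while i < len(data):
        c = data[i]
        i += 1
        if 0x01 <= c <= 0x08:
            # Copy the next c bytes literally.
            out += data[i:i + c]
            i += c
        elif c <= 0x7F:
            # 0x00 and 0x09-0x7F stand for themselves.
            out.append(c)
        elif c >= 0xC0:
            # A space followed by the character c ^ 0x80.
            out.append(0x20)
            out.append(c ^ 0x80)
        else:
            # 0x80-0xBF: two-byte back-reference with 11 bits of
            # distance and 3 bits of (length - 3).
            pair = ((c << 8) | data[i]) & 0x3FFF
            i += 1
            distance, length = pair >> 3, (pair & 0x07) + 3
            if distance == 0 or distance > len(out):
                raise ValueError("bad back-reference")
            for _ in range(length):
                # Byte-by-byte copy so overlapping references work.
                out.append(out[-distance])
    return bytes(out)
```

For example, b"abc\x80\x18" decodes to b"abcabc": the pair 0x0018 means "go back 3 bytes, copy 3". Leaving records uncompressed skips this step at the cost of larger files.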
Yes, KindleGen doesn't fix the limitations of earlier software, and indeed cannot. It /can/ modify/convert the input in such a way that older software can deal with it, but KindleGen can't make a silk purse out of a sow's ear.
Well, without intended insult but just stating the reality, that's what on some level Marcello's rewrite software attempts to do. There are at least three sources of cruftiness here:
1) The oldy-moldy mobi7 format.
2) The oldy-moldy "html" files on PG that PG in practice makes impossible to update, even for their most popular books.
3) The current HTML-writing practice of many people and groups submitting HTML to PG, which continues to produce needlessly thoughtless HTML that displays poorly on a large portion of PG's customers' devices, even though they would only have to change a couple of statements in their CSS slightly to make the situation much, much better.

I am annoyed to see that Kindle Previewer is still not available for Linux, because the program is written in Java using SWT, so it should be trivial to create a Linux version. I end up running it under WINE, which works, but it runs very slowly. It really is only good for doing a quick check. Fortunately I have an actual Kindle for serious proofing.
James Simmons
On Thu, Jan 12, 2012 at 7:14 PM, James Adcock <jimad@msn.com> wrote:
Amazon has finally released their “new improved less crappy” tools which support as output format KF8 and MOBI.
As input format they accept HTML5 and CSS3, not that that will help us submitters because PG refuses to accept HTML5. Hopefully PG will at least pick up the new version of kindlegen so they can turn back off some of the Marcello file rewriting.
The new version of kindlegen:
www.amazon.com/kindleformat/KindleGen
Improved simulator software:
www.amazon.com/kindleformat/KindlePreviewer
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d
participants (5)
- Al Haines
- James Adcock
- James Simmons
- Jim Adcock
- Lee Passey