I vaguely remember a thread somewhere discussing the fact that some e-readers can't handle anything beyond Latin-1 characters, and that calibre therefore tried to deprecate certain code points. Might want to follow up on this as a bug in the calibre forums.

-----Original message-----

From: Jim Adcock <jimad@msn.com>
To: 'Project Gutenberg Volunteer Discussion' <gutvol-d@lists.pglaf.org>
Sent: Mon, Feb 7, 2011 17:11:07 GMT+00:00
Subject: [gutvol-d] Re: Calibre: Open Source Software for Managing eBook Collections

>I have found it works great if the input format has a fair amount of info to work with ... ie, mobi or epub. Failure tends to happen more often when the input format is text or pdf.

Sorry, let me show y'all what I am talking about. I put a bunch of example files up at:

http://www.freekindlebooks.org/Dev/compare

The file names are intended to be indicative of the format translations, and the tools used to transform them:

html2html.html is the null-op transformation of html to html using file copy -- ie this is the source file I am using.

khtml2mobi.mobi is using Kindlegen to transform html to mobi (Kindle format)

chtml2mobi.mobi is using Calibre to transform html to mobi

chtml2epub.epub is using Calibre to transform html to epub

cmobi2epub.epub is using Calibre to transform mobi to epub

Looking at html2html.html, khtml2mobi.mobi, and chtml2mobi.mobi on modern devices [Kindle 3rd Gen, Sony Pocket Reader] and/or desktop emulators shows that these files are "doing the right thing", ie they actually display most of the Unicode code points, which is what one would expect from a modern font implementation that covers most of Unicode.

However, displaying chtml2epub.epub and cmobi2epub.epub on the same devices and/or desktop emulators shows that they don't correctly display most of Unicode, but rather substitute the question mark char '?', NOT even [?], ie question-mark-in-a-box, which is the typical display of a missing glyph on Unicode-compatible devices.
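
For what it's worth, a literal '?' (as opposed to a missing-glyph box) is what you typically get when text is pushed through a narrower character encoding with "replace" error handling. I have NOT confirmed that this is what Calibre does internally; the Python sketch below only illustrates that kind of failure, and the helper name lossy_latin1 is made up for the example:

```python
def lossy_latin1(text: str) -> str:
    """Round-trip text through Latin-1, replacing anything that doesn't fit.

    Assumption: this only mimics the suspected failure mode; it is not
    taken from Calibre's source.
    """
    # errors="replace" makes the encoder emit b"?" for each unencodable character
    return text.encode("latin-1", errors="replace").decode("latin-1")


# e-acute fits in Latin-1; the Greek letters and the CJK character do not
sample = "caf\u00e9 \u03b1\u03b2\u03b3 \u4e2d"
print(lossy_latin1(sample))
```

Characters inside Latin-1 survive the round trip, while everything else collapses to a plain '?', which matches what the two epub files show.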

Conclusion: Calibre seems to break down at correctly outputting many, many [most?] Unicode code points when outputting in epub format. Outputting in mobi format, it seems to do better.
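
If anyone wants to check this without eyeballing a device, here is a rough Python sketch that counts the characters beyond Latin-1 in the markup of an epub, so the count can be compared against the source html. It assumes the epub's content documents are UTF-8 and end in .html, .xhtml or .htm; the function names are my own, not part of any tool:

```python
import html
import zipfile


def count_non_latin1(markup: str) -> int:
    """Count characters above U+00FF, after resolving entities like &#945;."""
    return sum(1 for ch in html.unescape(markup) if ord(ch) > 0xFF)


def epub_markup(epub) -> str:
    """Concatenate the (X)HTML content documents inside an epub.

    `epub` may be a file path or a file-like object; an epub is a zip archive.
    Assumption: the content documents are UTF-8 encoded.
    """
    parts = []
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".html", ".xhtml", ".htm")):
                parts.append(zf.read(name).decode("utf-8", errors="replace"))
    return "".join(parts)


# Example use, with the file names from the list above:
#   with open("html2html.html", encoding="utf-8") as f:
#       print("source:", count_non_latin1(f.read()))
#   print("epub:  ", count_non_latin1(epub_markup("chtml2epub.epub")))
```

A source count in the hundreds next to an epub count near zero would suggest that the characters are already lost in the file itself, rather than in the reader's font handling.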

_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/mailman/listinfo/gutvol-d