On Wed, February 8, 2012 1:03 pm, Jimmy O'Regan wrote:
That's not so much a problem with converting HTML to EPUB -- most EPUB files are just HTML in a zip file, plus some metadata -- the problem is inferring that metadata from the HTML. Inferring semantic information of any kind from presentation-level details is, at best, unreliable.
One of the first things Ms. Lofstrom suggested in response to Mr. Hutchinson's original proposal was updating the metadata associated with a text. I would think that any automated generation process should extract the metadata associated with a text, and if the metadata in the resulting file is incorrect, then obviously we need to improve the master metadata. This raises an interesting issue: not only do we need a master document format, we also need a master /metadata/ format.
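To make the "HTML in a zip file, plus metadata" point concrete, here is a minimal sketch of packaging an XHTML text as EPUB 3 where every metadata value comes from an explicit record rather than being inferred from the HTML. The record's keys (`identifier`, `title`, `creator`, `language`, `modified`) and the function names `build_opf`/`build_epub` are hypothetical stand-ins for whatever a master metadata format would provide. The file layout follows my reading of the EPUB 3 container and package conventions; it has not been checked with a validator such as epubcheck.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# OCF container file: tells a reading system where the package document lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Minimal EPUB 3 navigation document with a single table-of-contents entry.
NAV_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>Contents</title></head>
<body>
  <nav epub:type="toc"><ol><li><a href="text.xhtml">Text</a></li></ol></nav>
</body>
</html>
"""


def build_opf(meta: dict) -> str:
    """Render the OPF package document from an explicit metadata record.

    Nothing here is inferred from the HTML: every value comes from `meta`,
    a hypothetical "master metadata" record with the keys used below.
    """
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">{escape(meta["identifier"])}</dc:identifier>
    <dc:title>{escape(meta["title"])}</dc:title>
    <dc:creator>{escape(meta["creator"])}</dc:creator>
    <dc:language>{escape(meta["language"])}</dc:language>
    <meta property="dcterms:modified">{escape(meta["modified"])}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="text" href="text.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="text"/>
  </spine>
</package>
"""


def build_epub(xhtml: str, meta: dict) -> bytes:
    """Zip a well-formed XHTML document and its metadata into an EPUB container."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry must be first in the archive and stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        deflate = zipfile.ZIP_DEFLATED
        zf.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=deflate)
        zf.writestr("OEBPS/content.opf", build_opf(meta), compress_type=deflate)
        zf.writestr("OEBPS/nav.xhtml", NAV_XHTML, compress_type=deflate)
        zf.writestr("OEBPS/text.xhtml", xhtml, compress_type=deflate)
    return buf.getvalue()
```

Usage would be along the lines of `open("book.epub", "wb").write(build_epub(xhtml, record))`. The design choice is the point: since the title, author and so on are read from the record, a wrong value in the generated EPUB is fixed by correcting the master metadata, never by patching the output file.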