
There are conventions for identifying these artifacts in an HTML document, but nothing in HTML itself declares unambiguously that (for instance) an H2 with a class of "chapter-head" is the way a given document marks its chapters. Nor is there a built-in way to declare such a structure as metadata. All we do is choose one of the ways we saw someone else do it, or make up another way ourselves. But the convention doesn't inhere in HTML, nor is it particularly easier than any other form of representation. It's equivalent to saying "It's a new chapter when there are four blank lines, and the first paragraph is the chapter title" in a text document. And including images by reference with a URL is an explicit admission that the image stands outside the HTML structure. There's no assurance that the reference will even be available or legitimate if the document is copied elsewhere, and there's no way within the bounds of the (X)HTML standards to provide that assurance. Certainly we can't publish or even describe an API that would provide unambiguous text and metadata sufficient to construct a properly structured ebook in any representation other than the one in which it is stored. And among the storage formats we provide from which such information might be inferred, the plain-text format seems in most cases to be the most accessible. At least, whatever conventions there are seem to be more consistently adhered to there.
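
To make the plain-text comparison concrete, here is a minimal Python sketch of a reader for the hypothetical "four blank lines" convention quoted above. The function name `split_chapters` and the exact rules (four or more blank lines start a chapter, the first paragraph is the title) are assumptions made up for illustration; nothing in any text format specifies them, which is rather the point.

```python
import re

def split_chapters(text):
    """Split plain text into (title, body) pairs.

    Assumed convention (hypothetical, not a standard): a run of four or
    more blank lines starts a new chapter, and the first paragraph of
    each chapter is its title.
    """
    # Four blank lines means five consecutive line breaks; blank lines
    # may contain stray spaces or tabs.
    chunks = re.split(r"\n(?:[ \t]*\n){4,}", text.strip("\n"))
    chapters = []
    for chunk in chunks:
        chunk = chunk.strip("\n")
        if not chunk.strip():
            continue  # skip empty chunks
        # The first paragraph ends at the first blank line.
        parts = re.split(r"\n[ \t]*\n", chunk, maxsplit=1)
        title = " ".join(parts[0].split())  # collapse a wrapped title
        body = parts[1].strip("\n") if len(parts) > 1 else ""
        chapters.append((title, body))
    return chapters

if __name__ == "__main__":
    sample = (
        "Chapter One\n\nFirst paragraph.\n\nSecond paragraph."
        "\n\n\n\n\n"
        "Chapter Two\n\nAnother paragraph."
    )
    for title, body in split_chapters(sample):
        print(title, "->", len(body), "characters of body text")
```

An HTML reader built around `h2.chapter-head` would be about the same size and would rest on exactly the same kind of unwritten agreement between whoever prepared the file and whoever reads it.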