There are conventions for identifying these artifacts in an HTML document, but there is
nothing incorporated into HTML that declares unambiguously that (for instance) an
H2 with a class of "chapter-head" is the way a given document does it. Nor is there
a built-in way to declare such a structure as metadata.
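
To make that concrete, here is a minimal Python sketch of what a consumer of such a document ends up doing. The function name `find_chapter_titles` and its defaults are hypothetical, made up for illustration; the point is that the tag and class have to be supplied from outside, because nothing in the file says which convention it follows.

```python
from html.parser import HTMLParser


class ChapterHeadFinder(HTMLParser):
    """Collect the text of headings that match one document's convention."""

    def __init__(self, tag="h2", css_class="chapter-head"):
        super().__init__()
        self.tag = tag              # assumed convention, not declared by the HTML
        self.css_class = css_class  # likewise an assumption
        self.titles = []
        self._buffer = None         # holds text while inside a matching heading

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self._buffer is not None:
            self.titles.append("".join(self._buffer).strip())
            self._buffer = None


def find_chapter_titles(html, tag="h2", css_class="chapter-head"):
    """Return chapter titles, given the convention the caller believes is in use."""
    finder = ChapterHeadFinder(tag, css_class)
    finder.feed(html)
    finder.close()
    return finder.titles
```

A document that marks chapters with a different tag or class returns nothing until the caller guesses the right pair.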

All we do is choose one of the ways we saw someone else do it, or make up another
way ourselves. But that choice doesn't inhere in HTML, nor is it any easier than in any
other form of representation. It's equivalent to saying "It's a new chapter when there
are four blank lines, and the first paragraph is the chapter title" in a text document.
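
The text-document rule can be sketched just as briefly. This is only an illustration under the assumptions stated in the rule itself (four or more blank lines start a chapter, the first paragraph is the title); `split_chapters` is a hypothetical name, and lines containing only spaces are not treated as blank here.

```python
import re


def split_chapters(text, blank_lines=4):
    """Split text into (title, body) pairs using a blank-line convention."""
    text = text.replace("\r\n", "\n")
    # N blank lines between two paragraphs means N + 1 consecutive newlines.
    separator = re.compile(r"\n{%d,}" % (blank_lines + 1))
    chapters = []
    for chunk in separator.split(text):
        chunk = chunk.strip("\n")
        if not chunk:
            continue
        # The first paragraph (up to the first blank line) is taken as the title.
        title, _, body = chunk.partition("\n\n")
        chapters.append((title.strip(), body.strip()))
    return chapters
```

As with the HTML case, the number of blank lines is a parameter because the convention lives with the reader of the file, not in the file.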

And including images by reference with a URL is an explicit admission that the image
stands outside the HTML structure. There's no assurance that the reference is even
available or legitimate if the document is copied elsewhere.
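
A rough sketch of how one might at least list those outside references, again in Python with a hypothetical helper name (`classify_image_refs`). It only splits `src` values into those with a URL scheme and those without; it makes no attempt to check that anything is actually reachable.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ImageRefFinder(HTMLParser):
    """Collect the src attribute of every img element."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.refs.append(src)


def classify_image_refs(html):
    """Return (src, kind) pairs; kind is 'absolute' or 'relative'."""
    finder = ImageRefFinder()
    finder.feed(html)
    finder.close()
    result = []
    for src in finder.refs:
        # Anything with a scheme is treated as absolute; the rest depends
        # on where the document happens to be stored.
        kind = "absolute" if urlsplit(src).scheme else "relative"
        result.append((src, kind))
    return result
```

Either kind breaks once the document is moved: a relative reference as soon as the image files don't travel with it, an absolute one whenever the remote copy goes away.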

And there's no way within the bounds of (X)HTML standards to provide such a capability.

Certainly we can't publish or even describe an API that would provide unambiguous
text and metadata sufficient to construct a properly structured ebook in any other
representation than the one in which it is stored. And among the storage formats
we provide from which such information might be inferred, it seems to be in most
cases the plain-text format that is most accessible. At least, whatever conventions
there are seem to be more consistently adhered to in that format.

>