
On Fri, February 3, 2012 1:44 pm, don kretz wrote:
If you have floats, you can use inset page numbers with spans and an appropriate stylesheet.
If you have floats ...
But fundamentally what you're running into, and I don't know how you avoid it if you insist on XML/XHTML, is that books simply aren't well-formed in the way that XML defines and requires. You can't embed everything 100% in all its containers.
You've made this assertion before. I don't agree with it, and I've yet to see any examples or evidence that it's true. The eggheads who came up with TEI evidently think you can, and from what I've observed, even though they're eggheads they're not techies; they're more like linguists and English professors.
I believe in climate change. Not because this has been a particularly warm winter (it has been) but because virtually every climate scientist on the planet says it's happening. I believe in TEI as a text encoding standard. Not because I have fully tested or exercised it, but because some really smart and really educated people put it together. With all due respect, I don't think anyone here can come close to designing a system as good as what they developed; we don't have the expertise, and we haven't had the time.
Now about 5 years ago (my reports are in the list archives in the 2006-2007 range) I did some testing with TEI and XHTML. The results of that testing demonstrated that I could do "round-trip" conversions between TEI and XHTML; that is, I took a TEI file and programmatically converted it to HTML (HTML that displayed well in a browser without CSS) and back to TEI. Thus, I can conclude that there is nothing in TEI that cannot be encoded in XHTML.
While TEI is "best-of-breed" it is not ubiquitous. If a volunteer learns TEI to create texts for PG, that skill is not necessarily transferable; but if I know XHTML I can publish web pages, or blog, or do any other job for which XHTML is the standard. Thus, on the whole I think that appropriately constrained XHTML is the best practical choice, even if not the best technical choice.
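To make the "round-trip" idea concrete, here is a minimal sketch in Python. It is not the converter I used in that testing; it handles only one element, the TEI page break <pb n="..."/>, and the XHTML side of the mapping (an <a> with class "pagenum" and the page number in title) is an assumption chosen for illustration. Namespaces are left out to keep it short.

```python
import xml.etree.ElementTree as ET


def tei_to_xhtml(tei_text):
    """Map each TEI <pb n="..."/> to <a class="pagenum" title="..."/>.

    The class name and the use of title are illustrative assumptions.
    """
    root = ET.fromstring(tei_text)
    for el in root.iter():
        if el.tag == "pb":
            number = el.attrib.pop("n")
            el.tag = "a"
            el.set("class", "pagenum")
            el.set("title", number)
    return ET.tostring(root, encoding="unicode")


def xhtml_to_tei(xhtml_text):
    """Reverse the mapping: page-number anchors become <pb n="..."/> again."""
    root = ET.fromstring(xhtml_text)
    for el in root.iter():
        if el.tag == "a" and el.get("class") == "pagenum":
            number = el.get("title")
            el.attrib.clear()
            el.tag = "pb"
            el.set("n", number)
    return ET.tostring(root, encoding="unicode")


tei = '<p>first half of the sentence <pb n="217" />second half</p>'
xhtml = tei_to_xhtml(tei)
restored = xhtml_to_tei(xhtml)
```

If nothing is lost in either direction, restored serializes the same as the original fragment. A real converter would need a mapping like this for every TEI element it supports.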
Page numbers are a reflection of this problem, because conceptually they are boundaries between page elements. But page elements simply aren't well-formed, because their tops and bottoms can cut right through paragraphs (and everything in which paragraphs are embedded).
I fail to see how this example proves your point. For example, assume a paragraph which is split across a page break. You've started your <p> container and started throwing your phrasing content into it. Then, in the middle of your text, you encounter some metadata; this metadata just happens to be an indication that the current physical manifestation is changing, that the nature of the metadata is that it's a page number, and that the actual metadata is "217." Drop a hidden metadata object into the phrasing content at the point where it occurs (in this case, an <a> tag) and go on. Now, if you're trying to represent /pages/ as XML container objects then you do have a problem, because page boundaries and paragraph boundaries usually don't coincide. But the document structure (paragraphs) and the manifestation structure (paged book) are totally different paradigms, and can't be represented as the same structural object.
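As a rough illustration of the point above, the following Python sketch parses a paragraph that carries a page-number marker in the middle of its text. The attribute names on the <a> tag (class "pagenum", id "page217", title "217") are assumptions made up for this example, not a proposed standard. The fragment parses as well-formed XML, the page number can be pulled out, and the paragraph text is still intact.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A paragraph split across a page break. The empty <a> element is a
# milestone: it marks a point in the text and contains nothing.
fragment = (
    '<div xmlns="http://www.w3.org/1999/xhtml">'
    '<p>The paragraph begins on one page '
    '<a class="pagenum" id="page217" title="217"/>'
    'and continues on the next.</p>'
    '</div>'
)


def page_markers(xml_text):
    """Return the page numbers recorded by pagenum anchors, in order."""
    root = ET.fromstring(xml_text)
    return [
        a.get("title")
        for a in root.iter("{%s}a" % XHTML_NS)
        if a.get("class") == "pagenum"
    ]


def paragraph_text(xml_text):
    """Return the text of each paragraph, ignoring the markers inside it."""
    root = ET.fromstring(xml_text)
    return ["".join(p.itertext()) for p in root.iter("{%s}p" % XHTML_NS)]
```

Here the paragraph is the container and the page number is just a point inside it, so nothing has to overlap.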
I think what happened to some devices is that they had to decide between supporting HTML and XHTML, and since writers can't be constrained to create well-formed XML documents (nor should they be), the devices had to choose HTML.
I believe that virtually all devices (and perhaps even absolutely all devices) require XHTML. This is certainly a requirement of ePub. If you try to load SGML/HTML onto Adobe's Digital Editions it will blow chunks, and refuse to display anything. Because the Kindle is based on the old HTML 3.2 spec it may not /require/ XHTML but it certainly /accepts/ XHTML. The KindleGen program /may/ require XHTML; I don't know. Mr. Adcock is in a much better position than I am to evaluate that question. In any event, requiring XHTML as a master format will certainly have no adverse effects.
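For anyone who wants to check whether a file meets the well-formedness part of the XHTML requirement, here is a minimal sketch using Python's standard XML parser. It only tests well-formedness; it does not validate against the XHTML DTD, and it says nothing about how any particular reader or conversion tool behaves. The function name is_well_formed is my own.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# XHTML style: the empty element is explicitly closed.
xhtml_style = "<p>line one<br/>line two</p>"

# SGML/HTML style: <br> is never closed, so an XML parser rejects it.
html_style = "<p>line one<br>line two</p>"
```

With these two strings, is_well_formed(xhtml_style) is True and is_well_formed(html_style) is False.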