
Arguably the most valuable text (for content and metadata) is, at best, sitting on someone's PC somewhere. More likely, it has been discarded.
Just to state the obvious, it sure would be cool if the work could be saved somewhere at the point in time when it is still "reversible" -- i.e. at a stage where it could (in theory) be resubmitted to DP or a similar process for "another pass," or serve as a foundation for future work 10 years from now, when we might have a better, agreed-upon file format for representing books than HTML. Once line break and page break information has been thrown away, it is very difficult to go back and make another pass on a book -- although one ought to be able to write a tool that would re-insert line breaks and page breaks based on OCR alignment with the PG or DP text.

Here's a simple "real world example" of why one might care: I submit a work to PG, including a careful HTML representation of how the real author and/or publisher presented their work, with "correct" block quotes and poetry. A day later it shows up on a different site [which is fine], now in MOBI file format, but with the "correct" block quote and poetry representation trashed.

Why? Presumably because the person doing the file format translations at this other site is using a tool that doesn't know how to "correctly" deal with the HTML representation of block quotes and poetry. And WHY does that tool not know how to deal correctly with block quotes and poetry? -- Because there IS no format in HTML which says "This is a block quote" or "This is poetry," which in turn means it's a crap shoot whether a given translation tool will handle these issues "correctly" or not.

Why would a reader then choose this "inferior" MOBI version from another site? Because that site correctly fills in the "Spine" information that the PG version is missing. But if the user chooses the version with the correct "Spine" information, then the block quote and poetry formatting is trashed... 
...Of course, presumably there are people at PG who consider issues of block quotes and poetry "just formatting"....
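
For what it's worth, the "re-insert line breaks based on OCR alignment" idea mentioned above can be roughed out in a few lines. This is only a sketch under my own assumptions, not an existing PG or DP tool: the function name reinsert_line_breaks is made up, the OCR is assumed to arrive as a list of printed lines, and the alignment is plain word-level matching with Python's standard difflib. It only recovers a break when the word at the end of the printed line either matches the proofed text or is the last word of a simple substitution.

```python
import difflib


def reinsert_line_breaks(ocr_lines, clean_text):
    """Re-flow clean_text so its line breaks follow those of ocr_lines.

    Hypothetical helper: ocr_lines is raw OCR, one string per printed line;
    clean_text is the proofed text with the original line breaks lost.
    """
    ocr_words = []
    line_end_idx = set()  # indices of OCR words that end a printed line
    for line in ocr_lines:
        words = line.split()
        if not words:
            continue
        ocr_words.extend(words)
        line_end_idx.add(len(ocr_words) - 1)

    clean_words = clean_text.split()
    matcher = difflib.SequenceMatcher(a=ocr_words, b=clean_words, autojunk=False)

    # Indices of clean words after which a line break should be restored.
    break_after = set()
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag == "equal":
            # Words agree one-to-one, so line ends carry over directly.
            for k in range(a1 - a0):
                if a0 + k in line_end_idx:
                    break_after.add(b0 + k)
        elif tag == "replace" and (a1 - 1) in line_end_idx:
            # OCR error on the last word of a line: break after the
            # last clean word of the corrected span.
            break_after.add(b1 - 1)
        # Line ends buried inside insert/delete spans are not recovered.

    lines, current = [], []
    for i, word in enumerate(clean_words):
        current.append(word)
        if i in break_after:
            lines.append(" ".join(current))
            current = []
    if current:
        lines.append(" ".join(current))
    return lines


ocr = [
    "It was the bcst of times,",
    "it was the worst of tirnes,",
    "it was the age of wisdom",
]
clean = ("It was the best of times, it was the worst of times, "
         "it was the age of wisdom")
for restored_line in reinsert_line_breaks(ocr, clean):
    print(restored_line)
```

On that toy input the proofed text comes back split at the same three places as the OCR lines, with the corrected spellings kept. Page breaks could presumably be carried over the same way by tracking the last word of each page instead of each line, but real scans (hyphenated line ends, running headers, dropped lines) would need a good deal more care than this.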