
Quoting David Starner <prosfilaes@gmail.com>:
I don't see any value in that. Scans are a pain, and the only saving grace is that they accurately represent a physical printed edition.
Which is not true; many scans (or the compression techniques used to compress them to manageable size) lose relevant information. Infamous in PGDP circles are "despeckled periods." But I have also seen compression techniques based on OCR techniques mix up e and c, just as OCR software often does. However, scans are often the best we have (besides the paper original), and I would love to have them.
I would define "master format" as one that can be effectively used to derive other formats from.
As for universal, the image is not necessarily capable of encoding all formatting nuances. For properties of physical books, scans can't handle transparencies (which I've had in a book I was thinking about scanning)
Did that: just scan with a neutral background behind the page, and repeat for each layer of transparency; then you can use tricks in HTML to reproduce the effect. (Using the CSS :hover pseudo-class to show the transparent sheet superimposed over the original illustration did the trick for me.)
, paper changes, metallic inks (one illustration had an EETS book printed for it, but the metallic ink didn't reproduce well in my scans), fur (plenty of 20th/21st century examples for babies),
Often a problem with nice gold-embossed covers as well...
mirrors (ditto), holes (again, ditto), or pop-ups (there's a beautiful 19th century edition of Euclid with them, for example).
I once did a book on the female body, with a very nice pop-up (http://www.gutenberg.org/files/22868/22868-h/22868-h.htm#d0e1934). Just scan it in plenty of states, though describing it in the general case would require some 3D modelling language.
For properties of the text, it does not show line breaks at tops of pages (ambiguous in poetry) and it obscures spellings at end of lines (is it spelled to-night or tonight?). It's powerful, but not unlimited.
Common issue: look for other occurrences of to-?night and pick one. Hardly troublesome in most cases. If you really want to, you can capture these using entity encodings like ‐ &softhyphen; &dubioushyphen; in your version of TEI or whatever you are using.
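To make the to-?night check concrete, here is a minimal sketch in Python, assuming the transcribed text is available as one plain string. The function names are hypothetical, and the majority-vote rule is just one way to "pick one", not an established PGDP procedure. Only mid-line occurrences are counted, since a hyphen followed by a line break is exactly the ambiguous case.

```python
import re
from collections import Counter


def count_hyphen_variants(text: str, first: str, second: str) -> Counter:
    """Count mid-line occurrences of first+second, hyphenated vs. closed up.

    A hyphen followed by a line break ("to-\\nnight") does not match, so the
    ambiguous line-end cases are left out of the count.
    """
    pattern = re.compile(
        rf"\b{re.escape(first)}(-?){re.escape(second)}\b", re.IGNORECASE
    )
    counts = Counter()
    for match in pattern.finditer(text):
        counts["hyphenated" if match.group(1) else "closed"] += 1
    return counts


def resolve_line_end_hyphen(text: str, first: str, second: str):
    """Return the majority spelling, or None on a tie or with no evidence."""
    counts = count_hyphen_variants(text, first, second)
    if counts["hyphenated"] > counts["closed"]:
        return f"{first}-{second}"
    if counts["closed"] > counts["hyphenated"]:
        return f"{first}{second}"
    return None


sample = (
    "We meet to-night.\n"
    "She said to-night again, but tonight once. Then to-\n"
    "night at the line end."
)
print(resolve_line_end_hyphen(sample, "to", "night"))  # to-night
```

When the function returns None, the book gives no clear answer either way, and that is the case where a marker such as &dubioushyphen; would earn its keep.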
Anything that requires you to look at every word on every page, which italics and bold do, is time-consuming. And perhaps as important as italics is superscript; 10<sup>30</sup> changes quite a bit when it comes back as 1030.
That is why it is important to get those things right the first time, when we are directing PGDP volunteer eyeballs on it.

Jeroen.