
On 2012-10-04 21:12, Marcello Perathoner wrote:
On 10/03/2012 10:19 PM, Jeroen Hellingman wrote:
There are long discussions on the PGDP boards about post-processing for ePub, but basically, I refuse to dumb down my HTML for the sake of limited ePub readers,...
This is the very attitude that damages PG:
People who think that 'their' HTML is superior to everybody else's, and who don't understand that the value of a big collection (vs. a single text) lies in its homogeneity and device independence.
I understand very well that a big collection of texts in a single, well-thought-out master format adds considerably to its usefulness. However, I think HTML is a particularly bad choice for such a master format, as it is (if not by design, then by historical accident) far more a presentation format than a semantic format. The big funded university text projects mostly go for TEI for the very same reasons I've been using it for over 15 years now.
This attitude is much worse than the bogus copyrights that some publishers prefix to PD texts, insofar as you can easily remove those copyrights, but you cannot easily remove the complex vanity HTML that DP producers love oh so well.
That is nonsense: a fake copyright claim is extremely hard to remove, as you need to find a source that is obviously in the public domain and do a detailed comparison of the file with the falsely copyrighted one, while half an hour of Perl hacking is normally enough to tame even the most wildly decorated HTML mess (Word-generated HTML, anyone?).
With your TEI workflow you could easily submit simple and interoperable HTML to PG, and complex vanity HTML to your own site.
That is correct, and in fact, when generating ePub, I already dumb down the HTML that goes inside it. However, it makes no sense to dumb down a version that everybody is able to read with the full added value HTML can provide. Jeroen.