
(I have a hunch I'm going to be quoting this message a lot in the future...)

On Tue, January 24, 2012 3:08 pm, Joshua Hutchinson wrote:
I'd love to see the PG corpus redone as a "master format" system (and the current filesystem supports "old" format files in a subdirectory, so if someone wanted to get the old original hand-made files, they could). I'm not particularly wedded to any master format. Hell, if someone came up with a sufficiently constrained HTML vocabulary that could be easily used to "generate" the additional formats necessary, I'm good with that.
But before anyone will start doing this work, there needs to be a consensus from PG (I'm looking at you, Greg!) that the work will be acceptable. A half-assed "master format" system is no master format system at all.
On Tue, January 31, 2012 1:22 am, Greg Newby wrote:
The need I'm trying to address is reformatting or editing eBooks, not proofreading them.
Okay, we're on the same page so far...
What I'd like is (as someone else nicely put it) a continual improvement opportunity, provided to essentially anyone, for eBooks in the PG collection.
Still good...
This boils down to a handful of critical activities. It's mainly the third one (III) that involves crowdsourcing and new tools.
This is where we start to diverge...
I. making changes to the master file(s) [let's imagine that we retain the practice of every PG eBook having a small number of master files, in a small number of master formats]. The short list of master formats includes RST, HTML, TeX/TEI, and plain text (perhaps with light markup). Maybe this list will grow in the future; maybe it will shrink.
No, according to Mr. Hutchinson's proposal there can be only one...
The main feature here is that typos or fixes or additional master formats can be contributed.
The main feature here is that a single fix to the master file will automatically propagate to all derived formats; syncing between "masters" will not be required. [little snip]
II. from those master files, various other file formats can be [and are, currently] derived automatically.
Mr. Hutchinson's vision, which I am trying to follow, is that /all/ other file formats will be derived automatically from the /one/ master version. Caching is certainly advisable, but on-demand creation would be the first step.
Many challenges are technical, such as increased sophistication in dealing with text and HTML as master formats.
The primary technical challenge is in developing a tool chain which can produce quality instances of all derived formats, and in adopting/developing a master format with the richness necessary to support that tool chain.
Others need to be addressed by policy or social means, such as the ongoing tendency to use HTML for layout that is difficult to automatically convert.
Policy means include deciding on a master format, developing rules for the use of that format, widespread publication of those rules and, to the extent possible, automated means to detect violations of those rules. Social means primarily include getting buy-in from participants to the established rules, and attracting volunteers who are willing to work with them.
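As a rough illustration of what "automated means to detect violations" could look like if the master format were a constrained HTML vocabulary, here is a small Python sketch built on the standard library's `html.parser`. The allowed-tag list and the ban on inline `style` attributes are assumptions invented for this example, not rules anyone has agreed on; `ALLOWED_TAGS`, `VocabularyChecker` and `check_master` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List

# Hypothetical rule set: the only tags a master file may use.
ALLOWED_TAGS = {"html", "head", "title", "body", "h1", "h2",
                "p", "em", "blockquote"}


class VocabularyChecker(HTMLParser):
    """Collects rule violations while parsing a master file."""

    def __init__(self) -> None:
        super().__init__()
        self.violations: List[str] = []

    def handle_starttag(self, tag, attrs):
        line, _offset = self.getpos()
        if tag not in ALLOWED_TAGS:
            self.violations.append(
                f"line {line}: tag <{tag}> is not in the allowed vocabulary")
        # Assumed rule: layout belongs to the tool chain, not the master.
        for name, _value in attrs:
            if name == "style":
                self.violations.append(
                    f"line {line}: inline style on <{tag}>")


def check_master(text: str) -> List[str]:
    """Return a list of human-readable violations (empty if clean)."""
    checker = VocabularyChecker()
    checker.feed(text)
    checker.close()
    return checker.violations


if __name__ == "__main__":
    sample = '<p style="margin-left: 4em">Indented by hand.</p>'
    for problem in check_master(sample):
        print(problem)
```

A check like this could run whenever a master file is submitted, so that layout-heavy markup is flagged before it reaches the repository.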
III. from those master files, various other file formats that are created/contributed by individuals.
At this point we're not only not on the same page, we're not even in the same book. This suggestion is completely at odds with what Mr. Hutchinson proposed, which I support. [bigger snip]
If we accept that anyone could contribute such a new file (or set of files) for an existing PG eBook, then the main challenges I see are (a) how to help readers select among them, and (b) dealing with the fact that, over time, master formats will be fixed, but not these hand-crafted derivatives.
I'm not saying you shouldn't pursue this vision; I'm simply saying it's not mine, and I'm completely uninterested in pursuing it with you. My vision is to develop a system where existing PG works can be reworked into a single master format, from which all other formats can be automatically derived. Proofreading and upgrading the master files is certainly a desirable part of that vision, but it is secondary to the main goal. I'm beginning to think that Mr. Hutchinson's earlier question remains unresolved:
there needs to be a consensus from PG (I'm looking at you, Greg!) that the work will be acceptable. A half-assed "master format" system is no master format system at all.
So Mr. Newby, can we expect some support in building a repository of master format reworkings of existing PG works? Infrastructure support would be nice, but moral support is what is most needed. [big snip]
I hope this helps clarify my original suggestion a little better. There has been some great discussion on this and related topics.
Ditto.

Cheers,
Lee