
On Wed, February 8, 2012 12:05 pm, James Adcock wrote:
> Are *you* planning to re-tag the existing 40,000 texts?
Yup. If you go back and look, that was Mr. Hutchinson's original proposal, which I bought into:

1. Agree on a master format from which all other "user" formats can be reliably derived.
2. Develop tool chains that can reliably derive all other "user" formats.
3. Rework PG's existing corpus to comply with the agreed-upon master format.

Right now, new work is not on my radar. Creating a new user interface that Distributed Proofreaders could use to produce new works in the agreed-upon master format is a worthy goal, but it is not /my/ goal. Most of the value of Project Gutenberg /right now/ probably lies in the 40,000 existing works, and most likely within the first 5000 of them. Finding ways to significantly improve those works is what I want to do. In some cases, that may mean replacing the PG texts with new versions created from scratch. I would hope the improvement process could be made simple enough that volunteers from DP would join me. But an improvement process would be significantly different from what DP does now, so improvements to the DP workflow are of only tangential interest to me.
> If not, I suggest a better approach would be to come up with "suggestions" for tags which submitters can use moving forward to be more consistent and to make PG's life simpler.
I certainly believe that, as we move towards a consensus on a master format, existing practices should be carefully considered for adoption. If anyone is suggesting starting over, there should be a compelling reason to do so (that compelling reason may exist; I just haven't seen it yet). But mere "suggestions" are inadequate. As a programmer, I can't deal with suggestions; I can only deal with rules. I don't care what the rules are, I just need to know what they are.