
...So what is needed...

Yes, except I don't think it's as bad as you make it out to be. TEI and/or PG-TEI could be a good intermediate formal file format. DP markup [and conventions] could be a good preliminary editing markup format. Editing doesn't necessarily need to be WYSIWYG.

Input-format files don't have to be perfect, since they are living documents, as opposed to the current "write once" output-format files. Conversion from an input-format file to output rendering formats such as txt or html or the various other reflow formats doesn't have to be perfect either -- as long as the software that renders the input format to the output formats does more of the work than the current tools for the job do, which is basically none. You would probably also have to store CSS or some other representation of style choices, to help reconstruct how the original volunteers chose to render the input file format to the output rendering file format.

[I am assuming here that html is simply being used as an output rendering file format, so that we don't have to argue anymore about the "correct" semantic use of html -- we would say that the semantics are represented in the input file format, not in the html.]

Again, this is all trying to address at least three problems:

1) How do you represent the author's intention without deliberately throwing away information?

2) How do you make the files submitted by volunteers "living documents" rather than "write once" documents -- ones which other volunteers can pick up and improve on in the future without having to go back to the original scans and rework the work "from scratch"?

3) How do you support, as best as possible, the various output rendering file formats most appropriate for various reader devices? PG *already* "officially" recognizes literally about 80 different output file formats of differing complexities!
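
To make the input-format / output-format split a bit more concrete, here is a rough Python sketch. It is not anything PG or DP actually runs. The tiny XML sample is a made-up, heavily simplified stand-in for TEI (real TEI/PG-TEI is far richer), and the STYLE table and the function names to_txt and to_html are my own assumptions about how stored style choices might look. The point is only that one semantic input file plus a separate record of rendering choices can feed more than one output format.

```python
import textwrap
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, simplified TEI-like input. The semantics live here.
SOURCE = """<text>
  <head>Chapter I</head>
  <p>It was a <emph>very</emph> dark night.</p>
</text>"""

# Assumed layout for stored style choices, kept apart from the input file.
STYLE = {
    "txt": {"emph": ("_", "_"), "width": 60},
    "html": {"css": "em { font-style: italic; } h2 { text-align: center; }"},
}


def inline(elem, wrap_emph, esc=str):
    """Flatten mixed content, wrapping <emph> children with wrap_emph."""
    parts = [esc(elem.text or "")]
    for child in elem:
        body = inline(child, wrap_emph, esc)
        parts.append(wrap_emph(body) if child.tag == "emph" else body)
        parts.append(esc(child.tail or ""))
    return "".join(parts)


def to_txt(root, style):
    """Render to plain text: upper-case headings, wrapped paragraphs."""
    left, right = style["emph"]
    blocks = []
    for elem in root:
        line = inline(elem, lambda s: f"{left}{s}{right}")
        if elem.tag == "head":
            blocks.append(line.upper())
        else:
            blocks.append(textwrap.fill(line, style["width"]))
    return "\n\n".join(blocks) + "\n"


def to_html(root, style):
    """Render to html used purely as an output format, styled by stored CSS."""
    tags = {"head": "h2", "p": "p"}
    body = []
    for elem in root:
        tag = tags.get(elem.tag, "div")
        content = inline(elem, lambda s: f"<em>{s}</em>", escape)
        body.append(f"<{tag}>{content}</{tag}>")
    return f"<style>{style['css']}</style>\n" + "\n".join(body) + "\n"


root = ET.fromstring(SOURCE)
print(to_txt(root, STYLE["txt"]))
print(to_html(root, STYLE["html"]))
```

A later volunteer who wants a different text convention for emphasis, or a different line width, would change STYLE and re-run the conversion; the input file itself stays untouched and nothing has to be redone from the scans.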