
According to Bowerbird, Step 3 is possible by simply diffing a fresh OCR of an MS against the extant version: most of the differences flagged would be errors in the OCR, but the diff should also expose the majority of the errors in the extant text. If this is correct, DP does not need to be involved.
See http://www.freekindlebooks.org/Dev/HuckDiff.txt for an example of this kind of diff, performed on the end results after removing scannos. Note that this is an example of "cross-diffing," in that 76 and 32325 both have provenance and are acknowledged as coming from different editions. When one runs such a diff against a new scan, one finds not only these "real" differences but also a much larger set of scannos that need to be fixed. This nevertheless proceeds relatively quickly and easily [compared to DP] if one has created a PDF/A or DjVu file as part of the OCR process: one can then search on the text in question, which brings one directly to the section of the scan image that needs to be compared.

I think the most important take-away from this diff is that one should NOT assume that the golden-moldy texts are in good shape, nor should one assume that the passage of time, and having hundreds of thousands of people read a PG offering, results in the mistakes in the golden-moldy texts actually being corrected. It doesn't. On the contrary, the golden-moldies are more likely to be type-ins (subject to the vagaries of the human mind) and are more likely to come from texts of lower-quality provenance. And these golden-moldies are what customers download and read the most.

PS: Take a good hard look at http://www.gutenberg.org/files/76/76-h/76-h.htm if you don't believe these diffs represent "real errors!"
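
For anyone who wants to try the diff approach discussed above, here is a minimal sketch in Python using the standard-library difflib module. It is not the tool that produced HuckDiff.txt; the function name word_diff and the two sample strings are made up for illustration. It compares the texts word by word, so that re-wrapped lines do not show up as differences, and reports only the spans where the two versions disagree.

```python
import difflib


def word_diff(extant: str, fresh_ocr: str):
    """Return (tag, extant_words, ocr_words) for each span where the texts differ.

    Hypothetical helper for illustration. Splitting on whitespace means
    differences in line wrapping between the two files are ignored.
    """
    a = extant.split()
    b = fresh_ocr.split()
    # autojunk=False keeps very common words from being treated as junk
    # when the inputs are long.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Made-up sample: the "extant" text has a typing error ("teh"),
# and the "fresh OCR" has two scannos ("hrown", "clog").
extant_sample = "The quick brown fox jumps over teh lazy dog"
ocr_sample = "The quick hrown fox jumps over the lazy clog"

for tag, old, new in word_diff(extant_sample, ocr_sample):
    print(f"{tag}: extant={old!r} ocr={new!r}")
```

The diff itself cannot tell a scanno from an error in the extant text or from a genuine edition difference. Each reported span still has to be checked against the scan image, which is where a searchable PDF or DjVu file helps.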