
...
1. obtain the text (via scanners, and o.c.r. software)
2. clean the text (the first set of tasks for our tools)
3. turn the text into e-books (the second set of tasks)
...
more to the point, though, was to list the tasks there:
2a. do a spellcheck
2b. fix spacey punctuation
2c. restore styling, e.g., italics
...
it is this backdrop of knowing the tasks that need to be done which allows me to rail loudly about the inefficiencies of d.p.
If one is going to rail against the inefficiencies of DP, then one should at least be aware of the areas where DP is already, at least in theory, more efficient than what BB proposes. For one, DP is aware of the importance of capturing and retaining italics and bold during scanning. For another, DP understands, again at least in theory, the importance of getting a first-quality OCR in the first place, as opposed to simply grabbing an OCR off of archive.org.

Part and parcel of "efficiency" is not having to find and fix things that you needlessly broke earlier in the processing chain.
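
To make one of the quoted cleanup tasks concrete, here is a rough sketch of what "fix spacey punctuation" (task 2b) might look like. It is only an illustration under my own assumptions, not anything taken from DP's or BB's tools: the function name `fix_spacey_punctuation` is made up, and it handles only the simple case of stray spaces between a word and one of `, ; : ! ?`. Periods are deliberately left alone, since a spaced ellipsis (". . .") may be intentional.

```python
import re

# Hypothetical helper for illustration; not part of any DP or BB tool.
# Matches spaces or tabs that sit between a word character and one of , ; : ! ?
SPACE_BEFORE_PUNCT = re.compile(r"(?<=\w)[ \t]+([,;:!?])")


def fix_spacey_punctuation(line: str) -> str:
    """Remove stray whitespace that OCR left between a word and , ; : ! ?

    Periods are not touched, so spaced ellipses survive unchanged.
    """
    return SPACE_BEFORE_PUNCT.sub(r"\1", line)


if __name__ == "__main__":
    print(fix_spacey_punctuation("Hello , world ; this is fine !"))
```

Even a fix this small has to guess at what the printed page looked like, which is the point about efficiency: the cleaner the OCR is to begin with, the fewer such guesses are needed later in the chain.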