Re: [gutvol-d] a review of some digitization tools -- 002

20 Nov 2011

      ...
...
   1.  obtain the text (via scanners, and o.c.r. software)
   2.  clean the text (the first set of tasks for our tools)
   3.  turn the text into e-books (the second set of tasks)
...
   more to the point, though, was to list the tasks there:
   2a.  do a spellcheck
   2b.  fix spacey punctuation
   2c.  restore styling, e.g., italics
...
   it is this backdrop of knowing the tasks that need to be done
   which allows me to rail loudly about the inefficiencies of d.p.
If one is going to rail against the inefficiencies of DP, then one should at
least be aware of the areas where DP is already (in theory, anyway) more
efficient than what BB proposes.  Namely, DP recognizes the importance of
capturing and retaining italics and bold during scanning and OCR.  And, at
least in theory, DP understands the importance of getting a first-quality
OCR in the first place, as opposed to simply grabbing an OCR off of
archive.org.

Part and parcel of "efficiency" is not having to find and fix things that
you have broken needlessly earlier in the processing chain.
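For readers who have not met the term, task 2b in the quoted list ("fix spacey punctuation") presumably refers to the stray whitespace OCR tends to leave before marks such as commas and semicolons ("word , word"). What follows is only a minimal sketch of that kind of downstream cleanup step, in Python. The function name fix_spacey_punctuation and the set of marks it handles are assumptions for illustration, not part of any DP or BB tool.

```python
import re

# Hypothetical helper (illustrative only): matches one or more spaces/tabs
# immediately followed by one of the ASCII marks , ; : ! ? .
# Assumption: quotes, dashes, and non-ASCII punctuation are not handled.
_SPACE_BEFORE_PUNCT = re.compile(r"[ \t]+([,;:!?.])")


def fix_spacey_punctuation(line: str) -> str:
    """Remove whitespace that sits directly before common punctuation."""
    # Replace "whitespace + mark" with just the captured mark.
    return _SPACE_BEFORE_PUNCT.sub(r"\1", line)


if __name__ == "__main__":
    print(fix_spacey_punctuation("Hello , world ; yes !"))
```

Note that this naive rule also closes up a spaced ellipsis ("wait . . . now" becomes "wait... now"), which may not be what a given text wants.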

Jim Adcock