juliet said:
>   Virtually none of the other
>   image archives provide corrected text.
>   It is simply cost-prohibitive to do so.
>   What DP does as a volunteer effort
>   would be extremely costly to replicate.
>   For this reason alone, I believe that
>   Google will be using raw OCR behind their scans.
>   On new material, raw OCR from a good program
>   can be very close to 100% correct.
>   It is the older material that causes problems.

provided careful scans at the right resolution,
from the right scanner, the right o.c.r. program
combined with the right post-o.c.r. software can
yield accuracy that approaches error-free results,
even on "older material".  to say that it is
"extremely costly" to get this is simply not true.
it might have been very true three years ago or so.
might've even been true last year.  it is untrue now.

to see how, and to issue real-world challenges,
visit my blog regularly over the upcoming weeks.
e-mail me for the address if you are interested...

-bowerbird