Re: [gutvol-d] Heebee Jeebees on Gutenberg

jon noring said:
abbyy7 + your-sooper-dooper-tools = no-more-need-for-DP
well, let's do the equation right, ok?

    opticbook3600 (most other scanners will start you off wrong)
  + careful scanning (note that this is _oh_ so very important)
  + image correction (deskewing and zoning regularization)
  + abbyy v7 (and using the old-book version whenever needed)
  + super-duper-tools, used wisely, for about 4 hours/book
  = an error rate of 1 error every 10 pages, good enough for
  + continuous proofreading (with scans available for viewing)
  + freshly-informed-and-motivated end-users looking for errors
  + a comprehensive error-detection system
  + a comprehensive error-reporting system
  + a comprehensive error-correcting system
  + a comprehensive system designed to foster community
  = a steady march toward absolute perfection in the e-texts

you can sprinkle in as much or as little d.p. as you want. they are the _cooks_, and not an ingredient in the recipe. and they aren't _required_ for a meal, but they sure help! a ton of dedicated people with expertise and experience! responsible for _half_ of p.g. by the time it hits #20,000! without doubt, _the_ dynamic force in digitization today! more uplifting than google, and million-books, and kahle! (well, kahle is a large factor in their success, but still.)
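Read as a pipeline, the recipe above chains capture, image correction, OCR, and post-processing, with proofreading downstream. A minimal sketch in Python, where every function is a hypothetical stand-in for the tool named in the recipe (none of this is bowerbird's actual code):

    from typing import Callable, List

    Stage = Callable[[str], str]

    def deskew(page: str) -> str:
        # stand-in for image correction; here it just trims whitespace
        return page.strip()

    def regularize_zones(page: str) -> str:
        # stand-in for zoning regularization
        return page

    def ocr(page: str) -> str:
        # stand-in for abbyy v7
        return page

    def digitize(pages: List[str]) -> List[str]:
        # run every page through each stage, in order
        stages: List[Stage] = [deskew, regularize_zones, ocr]
        out = pages
        for stage in stages:
            out = [stage(p) for p in out]
        return out  # then on to continuous proofreading and error reporting

    print(digitize(["  CHAPTER I.  ", "It was a dark and stormy night."]))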
but it is *equally* important to understand, and mark up, the *structures* associated with all portions of the content. This is something that *has* to be done by human beings. It cannot be done automagically, at least with anything near acceptable accuracy (as shown by a few trivial examples I posted here a while back.)
blah blah blah. your examples are bull. you didn't send them here, you sent 'em to ockerbloom's list, and he would not allow my response to go through, since it destroyed your positions so thoroughly. (it was _complete_; he took it as _mean_.) repost your examples if you have the courage.

given the normal degree of consideration for consistent formatting, my apps can determine the structure of a text in a matter of seconds, even on my "legacy" mac, in less time than it takes the acrobat splash screen to come up... :+)

-bowerbird
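A toy version of that claim: when a book's formatting is consistent, structure can be recovered with simple pattern rules. The regex below encodes one hypothetical heading convention (a sketch, not bowerbird's actual rules):

    import re

    # one common plain-text convention: "CHAPTER IV." on a line by itself
    HEADING = re.compile(r'^(CHAPTER|BOOK|PART)\s+[IVXLC\d]+\.?\s*$')

    def label_lines(text: str):
        # tag each line as heading, blank, or body text
        for line in text.splitlines():
            if HEADING.match(line.strip()):
                yield ('heading', line)
            elif not line.strip():
                yield ('blank', line)
            else:
                yield ('body', line)

    sample = "CHAPTER I.\n\nIt was a dark and stormy night."
    print(list(label_lines(sample)))
    # [('heading', 'CHAPTER I.'), ('blank', ''),
    #  ('body', 'It was a dark and stormy night.')]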

Bowerbird wrote:
jon noring said:
but it is *equally* important to understand, and mark up, the *structures* associated with all portions of the content. This is something that *has* to be done by human beings. It cannot be done automagically, at least with anything near acceptable accuracy (as shown by a few trivial examples I posted here a while back.)
blah blah blah. your examples are bull. you didn't send them here, you sent 'em to ockerbloom's list, and he would not allow my response to go through, since it destroyed your positions so thoroughly. (it was _complete_; he took it as _mean_.) repost your examples if you have the courage.
Yes, you are correct. I thought I had sent the examples to this list. Courage? I'll append the whole message below! It is interesting that John Mark Ockerbloom disallowed your reply to it. So you could not argue rationally with what I wrote? Fascinating. I assume you will simply repost your reply. I look forward to seeing it, though I'm not sure the others here want to, knowing that Ockerbloom disallowed it on his Book People list. He's a pretty tough moderator, but I've known him to be fair.
given the normal degree of consideration for consistent formatting, my apps can determine the structure of a text in a matter of seconds, even on my "legacy" mac, in less time than it takes the acrobat splash screen to come up... :+)
<laugh type="rotfl"/>

Jon Noring

***************************************************************************

(O.k., here's what I posted to the Book People list on 27-Dec-2004. Of course, I hope that others here besides Bowerbird will rationally reply to the examples and general argument, whether it be criticism, support, or both.)

Bowerbird wrote:
Jon Noring said:
Anyway, no application can discern, with any high degree of accuracy, the underlying structure of texts -- only human eyeballs can do that.
this is even more silly. i've said over and over again that typography is the translation of structure into presentation.
True, but how it's been done in the real world is hit or miss, and requires the reader to also discern the context of the words surrounding the structure (example later.) And just because one can do structure --> typography does not mean the reverse process, typography --> structure, is just as easy (to the point where it can be done by machine), because oftentimes the same typographic construct is used to represent different structures and semantics.

For an example of this "multiple uses of the same typography", the venerable "italics" is a good one. Italics are used for:

    linguistic emphasis
    literal emphasis (not the same as linguistic emphasis!)
    names of ships
    titles of certain types of books and documents
    a word used as a word
    foreign phrases
    sometimes headers
    etc., etc., etc.

Here we map all kinds of different textual structures/semantics onto one output typographic construct -- italics -- since we know the reader will be able to untangle it all by understanding the textual context where the italics appear (the textual context being the actual meaning of the flow of words).

(This is an example only, since some would say to just duplicate the italics and let humans figure it out in the final product -- which those in the accessibility community will rightfully disagree with -- but it illustrates how the same typography is used to represent different structures/semantics, which only a contextual reading can discern -- see the example below.)
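The asymmetry can be put in miniature: structure --> typography is a function, but typography --> structure is not, because several structures share one rendering. A small illustration (the labels are hypothetical):

    # mapping semantics onto typography loses information
    SEMANTICS_TO_TYPOGRAPHY = {
        'linguistic emphasis': 'italic',
        'ship name':           'italic',
        'book title':          'italic',
        'word as word':        'italic',
        'foreign phrase':      'italic',
    }

    def invert(mapping):
        # group semantics by the typography they render to
        inv = {}
        for semantic, typography in mapping.items():
            inv.setdefault(typography, []).append(semantic)
        return inv

    print(invert(SEMANTICS_TO_TYPOGRAPHY)['italic'])
    # five candidate structures for one cue -- only context can pick one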
in a nutshell, that is the very _job_ that a typographer does! if you closely study the presentation of a well-prepared book, via its typography, you can discern its structure very accurately.
True (about the typographer), but again, some of the discerning of structure has to do with context. In addition, there is no one standard that persists across decades and centuries, and from country to country; even today there is a lot of experimentation with new ways to "signal" structure to the reader. I've seen some *really* odd stuff done in the last few years (such as chapter headers which run vertically within the left margin! -- do they represent a header, or a sidebar of some sort?). In most cases the reader, by knowing the textual context surrounding the typographic layout, can infer what the underlying document structure is. I've seen some pretty oddball stuff done which no program in the world is yet able to discern, but humans can figure it out in a few moments after reading a part of the text.

Here's one very small example to illustrate what I'm saying:

(Exhibit 1 -- I made this up, inspired by Sherlock Holmes)

**********************************************************************

... she walked up to the door, and on the door was a small sign with a message in stark, bold black letters which read:

NO SOLICITORS OR SALES PEOPLE

Ignoring the sign as if it wasn't there, she knocked on the door, intending to make the sale...

**********************************************************************

(Exhibit 2 -- adapted from an Encyclopaedia Britannica article)

**********************************************************************

...Government weakness allowed the mutiny to spread; and although order was eventually restored in Istanbul and more quickly elsewhere, a force from Macedonia (the Action Army) led by Mahmud Sevket Pasa marched on Istanbul and occupied the city (April 24).

DISSOLUTION OF THE EMPIRE

Abdulhamid was deposed and replaced by Sultan Mehmed V (ruled 1909-18), son of Abdulmecid. The constitution was amended to transfer real power to the Parliament...

**********************************************************************

In both exhibits we have a phrase all in capitals, centered between what look like two paragraphs. Without reading and understanding the context of the capitalized phrase and the surrounding text, what are they in a document-structural sense? Are they title headings to a new chapter or section? If not, what are they?

For Exhibit 1 it is obvious, by reading and understanding the context, that the capitalized line is not a header to a new section -- it is actually a snippet of text acting almost like a facsimile image of the sign on the door. (I notice this structural construct used a lot in the Sherlock Holmes works, as an example.) In Exhibit 2, the phrase in all capitals is truly a header to a new section of the article, like a new chapter in a book.

What if these exhibits were written in some unknown language using a strange non-Latin character set? Could anyone who doesn't understand the language know what the capitalized lines represent? They could be anything, really, when one does not know the meaning of what is being said...

Now, how do you write a program to accurately distinguish a header to a new chapter/section from some other sort of construct? And how do you do so not only in English, written left-to-right, but also in Arabic or Hebrew (Yiddish), written right-to-left, or traditional Han, written vertically? (One wonders whether those languages and scripts even use the same typographic conventions we use in the West to differentiate document structures? I doubt it.)
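To make the problem concrete in code: a layout-only rule -- "a short all-caps line set off between paragraphs is a section header" -- accepts both exhibits, even though only Exhibit 2 is a header. A hypothetical sketch of such a rule:

    def looks_like_header(line: str) -> bool:
        # layout-only test: non-empty, all capitals, no closing period
        s = line.strip()
        return bool(s) and s == s.upper() and not s.endswith('.')

    exhibit_1 = "NO SOLICITORS OR SALES PEOPLE"   # really a sign in the story
    exhibit_2 = "DISSOLUTION OF THE EMPIRE"       # really a section header

    print(looks_like_header(exhibit_1))  # True -- a false positive
    print(looks_like_header(exhibit_2))  # True -- correct
    # the rule cannot tell them apart; only reading the context can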
The all-caps construct above is only one of hundreds of different kinds of typographic constructs used for multiple purposes to represent document structure, where it is assumed a human being can tell what they mean from the textual context (the italicized-text example given previously illustrates this nicely.) So, in order for an automated system to truly understand the structure of documents, it has to understand the underlying meaning of the text -- it has to become a sentient reader. There is NO computer program yet with the required level of artificial intelligence to do this. That's not to say there won't be in the future, but billions of dollars have been pumped into AI research since the 1950s, and we are not much closer to true AI than we were 30 years ago. (Some believe the human brain is a quantum computer, and that quantum computing is a necessary prerequisite to true intelligence and sentience -- we'll see...)

One can certainly continue to write and refine a computer program (to "train" it, so to speak) to discern the structure of text documents from OCR with higher and higher accuracy. And over time the code gets to be unbelievably complex, to handle the thousands of "exceptions". But it will still take a human being to peruse the results and correct the places where the program got it wrong, just as OCR doesn't always get the characters right and so requires a human being to go through and proof the text for scanning errors.

I'm not sure what Bowerbird is trying to argue, but I assume he wants a system where perfectly structured (and repurposeable) books can be obtained today by some fancy push-button OCR program not requiring any human being to tweak the output at all, so we can get zillions of high-quality books and documents online by simply pressing a button. I'd love to see this too, but I don't believe it will be possible for several decades. Books are written, and typeset, by people for people using imperfect systems, and until programs reach the level of intelligence and sentience of human beings, we will not be able to bypass the final human proofing/hand-structuring stage and still get very high quality results -- all we can do is reduce the amount of time needed for human proofing and hand-structuring, by improving OCR programs and the post-processing programs that try to determine overall structure.

*****

The other point I want to make is that I really don't think it matters how good or how bad the Internet Archive's or Google's OCR package is, because the OCRing can be done at *any time*, by *anyone*, using *any kind* of OCR package, so long as the scans are available (which IA will make available.) There may be only one chance to scan a certain book (it is quite an effort to scan a book, and it may not be available for rescanning), but once the scans are made and put online, they can be duplicated and mirrored all around the world. At any time in the future someone (like Bowerbird) can grab any of these scans and OCR them using *their* nifty OCR engine with "sooper-dooper-intelligence".

To me the question is not the quality of Brewster's OCR package -- I don't really give a rip, to be honest, because the scans can be re-OCR'd at any time -- the real question is the quality of the scans themselves. Will they be of sufficient resolution, contrast, cleanness, and linearity (no funny curvature) to make it easier to get high-quality OCR results?
The complaints over OCR are a non-issue, and why Bowerbird even cares about the quality of Brewster's OCRing is beyond me. Ten years from now, if Bowerbird is correct, we'll have wonderful programs which will take the page scans and spit out perfectly structured, 100% proofed digital texts. No human being will be needed to tweak the final result. As noted above, *I'd like to see this*, but I'm not holding my breath that we will see it in 10 years, or even 50. But let's get the scans done right, and ready for that time when all we have to do is push a button and we get perfect rice every time. In the meanwhile, we'll have a nice pool of source material to use for human-driven proofing endeavors such as Distributed Proofreaders.

Jon Noring

FWIW, I am not envisioning coming up with anything that can replace human eyeballs. As has been mentioned by others (probably Jon, but I'm too lazy to go back and find out), I'm looking for tools to help those eyeballs work better. Geoff

opticbook3600 (most other scanners will start you off wrong)
After soliciting reviews of it on the DP forums and getting several positive ones, this is my mixed review of the opticbook.

(1) Scanning into the gutter: The opticbook's main selling point is the position of its glass. At the very edge of the device, you can open a book only 90 degrees and flatten half of it against the glass, in principle eliminating gutter shadow. In practice, though, it doesn't catch the outer half-centimeter of the glass anyway. While there are books you can scan with the opticbook that you couldn't scan with an ordinary flatbed, there are still books with gutters too narrow even for the opticbook.

(2) Book handling: With an ordinary flatbed, the book is pretty much held in place by its own weight, although you can get better results by applying some pressure from above. And you only have to reposition the book for every other page. With the opticbook, you have to turn the book every page, and, for the half of the book where the heavy side is hanging off the end, you have to hold the book in position more physically than in the usual flatbed configuration. While I have never destroyed a book with normal flatbed scanning, the second book I scanned with the opticbook did not survive the process. (To be fair, it wouldn't have survived the process with an ordinary flatbed either, but with an ordinary flatbed, I would not have even tried ...)

(3) Speed: This is where the opticbook really shines. Even doing only one page at a time, I have reached scanning speeds of about 300 octavo pages per hour. And if you have a book where you can use the opticbook like an ordinary flatbed, scanning two pages at a time, well, I'll leave the numbers as an exercise to the reader.

If you have the money and want another tool in your scanner arsenal, it's not a waste. But it's not a magic bullet, either. The most demanding scanning project I've done to date, the large-format illustrations in Robert Hooke's Micrographia, required using scans from both my opticbook and my HP scanjet 4600 (the thin, transparent series), followed by about 12 hours of image manipulation, to get the results I got. Neither scanner alone could have delivered those images. -- RS
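(Working that exercise under one stated assumption -- that repositioning the book, not the scan itself, dominates the cycle time -- 300 pages/hour at one page per placement is 300 placements/hour, so two pages per placement would approach 2 x 300 = 600 pages per hour.)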
participants (4)

- Bowerbird@aol.com
- Geoff Horton
- Jon Noring
- Robert Shimmin