Re: [gutvol-d] Fwd: Re: epubeditor.sourceforge.net

26 Oct 2011


      On 24 October 2011 23:53,  <Bowerbird@aol.com> wrote:
...
...
> http://www.archive.org/stream/artofbook00holm#page/n12/mode/1up
> i think we can agree that that's a whole lot of mud
> that we need to scrape off a page that has 2 words.
Sure. If you're only interested in the text content, it's quite
useless. That data is useful for OCR research, though, so I'm glad
they provide it. It's not as useful as corrected text, granted, but I
think the clearest example of the value of such data is reCAPTCHA,
which (in part of its operation) compares the output of two OCR
systems and extracts images from the coordinates where they disagree.
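
To make the "coordinates where they disagree" idea concrete, here is a
rough Python sketch. It is not reCAPTCHA's actual code, and it does not
read archive.org's file format. It assumes each OCR system's output has
already been reduced to a list of (word, bounding box) pairs, and the
function name disagreement_boxes is made up for illustration.

```python
from difflib import SequenceMatcher


def disagreement_boxes(ocr_a, ocr_b):
    """Return one bounding box per region where two OCR outputs differ.

    ocr_a and ocr_b are lists of (word, (left, top, right, bottom)).
    This input shape is an assumption for the sketch, not a real format.
    """
    words_a = [word for word, _ in ocr_a]
    words_b = [word for word, _ in ocr_b]

    # Align the two word sequences so that a word inserted or dropped by
    # one system does not throw off every comparison after it.
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)

    boxes = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag == "equal":
            continue  # both systems agree on these words
        # Collect the coordinates both systems reported for this stretch.
        region = [box for _, box in ocr_a[a_start:a_end]]
        region += [box for _, box in ocr_b[b_start:b_end]]
        # The union of those boxes is the image area worth cropping.
        boxes.append((
            min(box[0] for box in region),
            min(box[1] for box in region),
            max(box[2] for box in region),
            max(box[3] for box in region),
        ))
    return boxes
```

Each returned box could then be cropped out of the page scan and shown
to a human. Without the coordinates, you would know that the two systems
disagree but not where on the page to look.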


-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you

Jimmy O'Regan