Re: [gutvol-d] Fwd: Re: epubeditor.sourceforge.net

27 Oct 2011

      On 27 October 2011 18:13,  <Bowerbird@aol.com> wrote:

[snip nod + uhuh]
...
...
   The OCR project I'm loosely affiliated with, Tesseract,
   is also a Google product. I was aware of that, and
   I was told a few things about it, but as I can't remember
   which I was asked not to repeat, I'd just prefer to
   stay on the safe side and not mention anything about it :)
i'll save you the trouble.  tesseract is a piece of crap.    :+)
I'll concede that it has had its problems... it's still the only OCR
software in existence that comes with all the tools required to
support an entirely new language, which is where my interest began.
(They might be available for Finereader, but the limb and/or offspring
price is somewhat out of my range).
...
but its lousy performance will mean that google has to
do a lot of very hard work to write the routines that can
correct the lousy output turned out by that piece of crap.
Not quite; they did quite a lot of work to bring the engine itself up
to modern standards, and have been relatively successful: apparently,
someone somewhere did a test, and found that tesseract outperforms
Finereader Mobile, so we've had Abbyy employees on our mailing list
trying to convince people otherwise.
...
which will turn out to be a _great_ thing in the long run.
but hey, i would _love_ an open-source o.c.r. program.
love love love...  so if you can turn it into a worthy app,
i'm sure a lot of people like me will heap praise on you.
I'm not an 'app' guy, I'm a language technology guy. The best I can do
at visual design is to discern when something is ugly, and my attempts
generally fall into that category. So I'll stick to what I'm good at.
...
ok, yes, i admit it, i haven't actually evaluated tesseract
in the last year or so, so maybe it improved immensely.
_maybe_.  but i really doubt it.  it needed a lot of work.
but hey, let me know if i'm wrong, ok?, and i'll review it.
I'd say hold off for a while. It got a huge shot in the arm when the
Android guys took notice (the handful of people who usually work on it
in the Google Books building were mostly occupied with adding new
language support), but since then Google Docs and Google Image Search
have integrated it, so there should be a definite improvement in the
next version (which, I think, is due to be released around
Thanksgiving? I'm not American, so I only have a vague idea of when
that is :)
...
...
   Now that I didn't know. Seems counterproductive.
if my enemy buys my friend, my friend is no longer my friend.
...
   The XML data is quite poor for that purpose, as
   it lacks word coordinates, so I chose not to mention it.
   (I think they made public the scripts they use to
   convert the data into a usable form for that purpose,
   but I'm not sure).
hmm...  i guess i'll have to take another look at that.
as soon as i work up the nerve; i'm allergic to x.m.l.
Makes you break out in angle-bracket-shaped hives, eh? There's a
script out there somewhere to convert it into the djvused format; I'll
see if I can find it.

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you

Jimmy O'Regan