jim said:
> As one who has actually worked professionally
> on a number of different recognition systems
> I respectfully disagree with your prediction of
> the future. No OCR system will ever be "error free."
i didn't say the o.c.r. would be error-free, jim...
i said their _text_ would eventually be error-free.
as i have proven here, exhaustively, there are _plenty_
of post-o.c.r. fixes that are programmatically applied,
first and foremost among these many measures being
comparison of separate digitizations of the same book.
(and google sometimes has half-a-dozen digitizations.)
if i had their corpus, i could create hundreds of fixes...
plus i expect google has invented a good number more
of these effective techniques than i could ever dream of.
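to make the comparison idea concrete, here is a minimal sketch in python.
it is an illustration only, not the code behind any tool mentioned here:
the function name and the sample strings are invented, and it assumes the
two digitizations have already been reduced to plain text. it lines up the
two texts word by word and flags every spot where they disagree, since
those disagreements are where an o.c.r. error is most likely hiding.

```python
import difflib

def flag_disagreements(text_a, text_b):
    """Return (words_from_a, words_from_b) pairs where two digitizations differ."""
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False keeps common words from being treated as junk on long texts
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
    flagged = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag != "equal":
            # the two scans disagree here, so at least one of them is wrong
            flagged.append((words_a[a1:a2], words_b[b1:b2]))
    return flagged

# invented sample lines standing in for two scans of the same sentence
scan_one = "the quick brown fox jumps over the 1azy dog"
scan_two = "the quick brown fox jumps over the lazy dog"
print(flag_disagreements(scan_one, scan_two))
```

a comparison like this only locates the suspect words. deciding which
reading is right still takes a further step, such as a dictionary check,
a third digitization to break the tie, or a look at the scan.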
> "Demos" often remain "demos" forever, because
> turning "demos" into "real world products
> accepted by real world users" is so danged hard.
well, yes, sometimes that charge has merit...
and sometimes it's nothing but a cheap shot.
which is it in this case? well, jim, you might
have wanted to check the subject-header here.
check out the rest of this reply, to benjamin...
***
benjamin said:
> And would your lordship stoop to provide
cut the cutesy crap, ok? :+)
i've done far more than enough grunt work here
to prove that i don't consider myself anything
more than a grunt. no matter how loudly i yell...
the only people i am superior to here are the stupid,
who refuse to learn, who stick their heads in the sand.
(unfortunately, there are quite a few of 'em here, but
the good news is most have put me in their kill-files,
so they don't know i told the truth about them again.)
> And would your lordship stoop to provide
> the location (as in URI) of these demos to
> a relative newcomer with the best of intentions?
here's the standard set of things that i point to:
> http://z-m-l.com/go/myant/myantp123.html
> http://z-m-l.com/go/mabie/mabiep123.html
> http://z-m-l.com/go/sgfhb/sgfhbp123.html
these are all page 123 from their respective books,
nestled in the set of all pages from each book,
shown with text on one side, scan on the other.
here are the master files that generated the sets:
> http://z-m-l.com/go/myant/myant.zml
> http://z-m-l.com/go/mabie/mabie.zml
> http://z-m-l.com/go/sgfhb/sgfhb.zml
each of the pages in the set contains a form
at the bottom where errors can be reported...
go ahead and make a sample report on a page;
you'll find the report is appended to that page,
and also collected on a separate page with all
the reports that have been made for that book.
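the bookkeeping described above can be sketched in a few lines of python.
this is a hypothetical illustration, not the code running on the site:
the class name, method name, and sample report text are all assumptions.
it shows only the two-way filing, where each report is attached to its
page and also gathered into one list for the whole book.

```python
from collections import defaultdict

class ReportLog:
    """Hypothetical store: files each error report per page and per book."""

    def __init__(self):
        self.by_page = defaultdict(list)  # (book, page) -> list of reports
        self.by_book = defaultdict(list)  # book -> list of (page, report)

    def submit(self, book, page, report):
        # append to the page the report was made on...
        self.by_page[(book, page)].append(report)
        # ...and collect it with every other report for that book
        self.by_book[book].append((page, report))

log = ReportLog()
# invented sample report, for illustration only
log.submit("myant", 123, "line 4: 'tbe' should be 'the'")
print(log.by_page[("myant", 123)])
print(log.by_book["myant"])
```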
as i said, though, error-correction is quite easy.
the hard part -- or so people here seem to think --
is turning the text into a respectable e-book, but
i've got that base covered quite thoroughly as well,
as you'll see if you go to the main page at z-m-l.com
and follow the link for examples. text-to-html, and
the next step (.mobi and .epub) follows from there...
> I always thought that that project was for
> independent producers producing ebooks
> on their own for PG, as opposed to the
> general public proposing fixes to PG volunteers.
well, that whole sentence shows you're very confused,
but that's to be expected in "a relative newcomer", eh?
i have suggested all kinds of methodologies, ranging from
independent producers to collaborative methods (e.g., d.p.)
to encouraging and using feedback from the general public.
but slicing up the world that way isn't particularly productive.
the tools i've created can be used by independent producers,
or in collaborative workflows, or after-the-fact by the public.
they can exist offline or online, and the behavior is the same.
the tools don't know (or care) how the humans split the tasks.
if you want more demos, i've got a ton. i'm also willing to
program new ones, quite specific to exactly what you want,
providing you agree to host them online for people to use...
same offer goes for you, jim. note that i'm calling your bluff.
-bowerbird