Re: precisely why i called the bluff

jim adcock said:
Not exactly sure what would be pulled by who where.
well, no, jim, you're "not exactly sure", because you do not have the type of vision or clout michael has, unfortunately.
which is not a slam, since few people do.
certainly none of the "business as usual" people who are running the show now. they're all just a bunch of sheep. their redeeming quality is that most have good intentions. they're not "bad" people; they just like their familiar ways.
it's not even clear that michael still has his old chops, but i'm _certainly_ not going to count him out just quite yet... especially as i dearly hope he'll shake their complacency.
But Google is making progress towards "reasonable" 100% human-free OCR translation of the body text of books, not including "the hard parts" such as TOC, Index, etc. and then only on relatively recent pubs such as circa 1900s.
google will eventually have error-free text throughout. which is what i've said all along. it is fairly easy to see. and the 50,000-item p.g. library will merely be "quaint," a testament to an outdated workflow that wouldn't scale.
One could imagine a system that accepted such "good enough" OCRs and put them online immediately, and which allowed real-world readers to propose fixes as they read the book. You could still require PG volunteers to review the proposed fixes before accepting them.
why would you "imagine" such a system, when you could find it described perfectly, with proof-of-concept demos, in the posts on this very listserve going back many years?
i mean, seriously. a lack of imagination is _one_ thing... but a willful ignorance of the past is quite another thing.
yet even this scenario is mired in the dated viewpoint that correction of o.c.r. is the main problem that needs solving. d.p. has tons of almost-fully-corrected books languishing...
-bowerbird
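A minimal sketch, in Python, of the propose-and-review loop described in the quoted paragraph above: readers flag OCR errors as they read, and nothing is applied until a PG volunteer reviews the proposal. Every class and function name here is hypothetical and stands in for no existing PG or DP code.

    # Sketch only: hypothetical names, not any existing PG or DP code.
    from dataclasses import dataclass, field

    @dataclass
    class Fix:
        page: int
        old_text: str
        new_text: str
        proposed_by: str
        status: str = "pending"          # pending -> accepted | rejected

    @dataclass
    class BookText:
        pages: dict[int, str]            # page number -> "good enough" OCR text
        queue: list[Fix] = field(default_factory=list)

        def propose_fix(self, fix: Fix) -> None:
            """A reader proposes a correction; the text itself is untouched."""
            self.queue.append(fix)

        def review(self, fix: Fix, accept: bool) -> None:
            """A PG volunteer accepts or rejects a proposed correction."""
            if accept and fix.old_text in self.pages.get(fix.page, ""):
                self.pages[fix.page] = self.pages[fix.page].replace(
                    fix.old_text, fix.new_text, 1)
                fix.status = "accepted"
            else:
                fix.status = "rejected"

    # Example: a reader spots a scanno and a volunteer approves the fix.
    book = BookText(pages={1: "the quick hrown fox"})
    fix = Fix(page=1, old_text="hrown", new_text="brown", proposed_by="reader42")
    book.propose_fix(fix)
    book.review(fix, accept=True)
    assert book.pages[1] == "the quick brown fox"

The point of splitting propose_fix from review is exactly what the quoted paragraph asks for: a reader's proposal never changes the published text directly; only a volunteer's review applies it.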

And would your lordship stoop to provide the location (as in URI) of these demos to a relative newcomer with the best of intentions? I always thought that project was for independent producers producing ebooks on their own for PG, as opposed to the general public proposing fixes to PG volunteers.
-- b

google will eventually have error-free text throughout. which is what i've said all along. it is fairly easy to see. and the 50,000-item p.g. library will merely be "quaint," a testament to an outdated workflow that wouldn't scale.
As one who has actually worked professionally on a number of different recognition systems, I respectfully disagree with your prediction of the future. No OCR system will ever be "error free."
why would you "imagine" such a system, when you could find it described perfectly, with proof-of-concept demos, in the posts on this very listserve going back many years?
Proof-of-concept demos are a form of "imagining" rather than a reality, as anyone who has actually implemented such demos well knows. "Demos" often remain "demos" forever, because turning "demos" into "real world products accepted by real world users" is so danged hard.
It would be curious to see which actually gets more reads: PG's "quaint" collection of 50,000 items as distributed by PG and hundreds of other sites, or Google's "Millions" of OCR texts. I read both - but personally I read PG more. And I am an agnostic omnivore, not a "PG-centric", when it comes to reading - I even read some of the crap Murdoch publishes.
participants (3)
- Benjamin Klein
- Bowerbird@aol.com
- James Adcock