Re: Internet Archive makes more than 1 million books available online to blind, dyslexic

aaron said:
LOL. So much for "_all_ these e-books are also available as text, and have always been available in that format."
aaron, i don't have experience with the daisy format. are you telling us that one can't get text out of that? and are you claiming archive.org doesn't _start_ with the o.c.r. output when it creates the daisy-format file?
if they give you daisy, and not text, then that must be part of the d.r.m. they wrap around the file to prevent the books from being "pirated" out to the seeing world.
what i said _is_ true about the public-domain material. perhaps none of the public-domain material is counted in their announcement of "more than 1 million books"...
I suspect that the quality of the public domain books will be roughly the same as the copyrighted texts.
so we agree on that.
As to whether or not they will spend the time and money to improve the quality of their books, I doubt it.
so we agree on that.
The cost would otherwise be prohibitive, unless the improvement could be made via software.
well, we cannot know that unless we categorize the errors. which is precisely why i am undergoing such categorization. but yes, i've done more than enough research along the way to say unequivocally that _many_ improvements can be made, via software, with -- at most -- minimal human input needed.
which pinpoints some of the ridiculousness around this issue.
on the one hand, there are apologists who try to tell us that "even bad o.c.r. is better than no o.c.r. at all." well, that's true. but if we're really willing to settle for that, then why do those people over at distributed proofreaders even bother working? obviously, some of us put a value on correcting bad o.c.r.
on the other side of the tightrope, we have those same people at distributed proofreaders who try to tell us that their job of correcting o.c.r. is horrendously difficult and time-consuming. and this is equally ludicrous. it takes them a long time to do it because they're doing it in a way that takes a long time to do it.
i'm trying to walk the middle-ground, which seems to me like it should be a very broad path that's obviously the best to take, where we spend a sensible amount of time on a book (an hour), and take it to a place where it meets almost all of our needs...
we can decide to ignore quality and focus on quantity alone, and we can have millions of books; and all of 'em are flawed.
we can decide to ignore quantity and focus on quality alone, and we can have 23,456 books, like project gutenberg does...
we can even do _both_, and have 23,456 high-quality books and millions of flawed ones. so, is that what we want to do?
or we can spend one hour per book, and have half-a-million pretty-darn-good books, with all their obvious flaws corrected.
i don't know why it's so hard to see this middle path is the best.
-bowerbird
participants (1)
-
Bowerbird@aol.com