on the will to scan books and digitize their text

i said:
however, if the paper-book is in fairly good shape, and its text is rather straightforward, and a best-of-breed scanner is used, and the scans are done carefully, and then properly treated (e.g., deskewed and regularized), and o.c.r. is done with a best-of-breed program, then auto-clean-up tools combined with normal spell-check will produce quite accurate text, thank you very much...

...and...

so it's 600,000 scanned versus 10,000 proofed... meaning that any people who want one of those 590,000 that have been scanned but not proofed will need to do the o.c.r. and proofing themselves, providing they can locate the scan-set online... sounds fair to me.
of course, whether or not people will deem it necessary or even desirable to _do_the_work_ to get digital text is another question entirely.

branko ran a poll at the teleread site and in the forums at distributed proofreaders, and the results indicate that people have little interest in digitizing their home library, the books they have sitting as paper-copies in their homes.

over _half_ say they'd digitize them only if it could be done with _less_than_one_hour_per_book_. over one-quarter say they'd do it only if it took _less_than_ten_minutes_per_book_. a not-insignificant number want it to happen almost _magically_, having it take _less_than_one_minute_per_book_. (telepathy?)

since teleread specializes in creating unrealistic expectations, it would be tempting to chalk these poll results up to that, but alas, some respondents are people who actually digitize books. (but yes, the teleread respondents are even more out of touch.)

it's somewhat shocking to understand that even people from distributed proofreaders say this, some of whom have likely spent more than ten minutes proofing _a_couple_pages_, so they have to know that time-frame is completely unrealistic. so this isn't just massive ignorance about the time required.

the results tell us that _if_ they have a paper copy of a book, people seem to feel little need for a digital copy of the text. i know that i often tend to think from a mindset that posits that digital text has many advantages over the printed page, but people seem not to consider those advantages important. at least not enough to merit a non-trivial amount of their time.

it seems only natural to extend the results, that if people have the scan-set of a book, they'd have little need for digital text...

***

meanwhile, an article by kevin kelly in the new york times:
http://www.nytimes.com/2006/05/12/us/12vote.html?ex=1305086400&en=5b3554a76aad524a&ei=5090&partner=rssuserland&emc=rss
informs us a company in china has scanned 1.3 million unique titles in chinese, which it estimates is about half of the books published in the chinese language since 1949.
that's right: _1.3_million_. already. and still going strong...

while we americans can't even get to a paperless office, and publishers sue the daylights out of the one and only company in this country that is willing to scan our libraries, china is moving quickly toward becoming a paperless country...

so michael, in spite of the flak that people want to give you, it looks like you've been _undercounting_, by a wide margin... and maybe just maybe you're holding your "world e-book fair" on the wrong side of the globe...

-bowerbird

Bowerbird@aol.com writes:
of course, whether or not people will deem it necessary or even desirable to _do_the_work_ to get digital text is another question entirely.
Well, I think this really depends on the answers to questions like "Do I have an electronic reader that I would prefer to a paperback?", and "Am I likely to reread this book often enough that spending an additional couple of hours converting it to something usable on my reader is worth my time?" At least with producing texts for Project Gutenberg, even if you never read it again, presumably others will.

"Bruce" == Bruce Albrecht <bruce@zuhause.org> writes:
Bruce> Bowerbird@aol.com writes:
>> of course, whether or not people will deem it necessary or even
>> desirable to _do_the_work_ to get digital text is another
>> question entirely.
Bruce> Well, I think this really depends on the answers to
Bruce> questions like "Do I have an electronic reader that I would
Bruce> prefer to a paperback?", and "Am I likely to
Bruce> reread this book often enough that spending an additional
Bruce> couple of hours converting it to something usable on my reader
Bruce> is worth my time?" At least with producing texts for Project
Bruce> Gutenberg, even if you never read it again, presumably
Bruce> others will.

As the question was posed, the answer also depends on how many books you own. With several thousand books, even one minute per book adds up to a huge amount of work, especially if you are not going to read many of them and are not allowed to share them for copyright reasons (and even the copying might be illegal).
From what I understood, however, the point of the poll was to measure the difference in results between DP and TeleRead users.
Carlo
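
The scale argument above can be made concrete with a rough calculation. This is only a sketch: the library size of 3,000 books is a hypothetical figure (the thread says only "several thousand"), and the three per-book limits are the thresholds quoted from the poll.

```python
# Per-book time limits mentioned in the poll, in minutes.
POLL_LIMITS_MINUTES = {
    "less than one hour": 60,
    "less than ten minutes": 10,
    "less than one minute": 1,
}


def total_hours(book_count, minutes_per_book):
    """Total time, in hours, to digitize book_count books."""
    return book_count * minutes_per_book / 60


# Hypothetical library size; the thread says only "several thousand".
LIBRARY_SIZE = 3000

for label, minutes in POLL_LIMITS_MINUTES.items():
    hours = total_hours(LIBRARY_SIZE, minutes)
    print(f"{label} per book: up to {hours:,.0f} hours in total")
```

Under these assumptions, even the one-minute limit comes to 50 hours for the whole library, and the one-hour limit to 3,000 hours.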
participants (3)
- Bowerbird@aol.com
- Bruce Albrecht
- Carlo Traverso