re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries

michael said:
I seem to recall an earlier report from someone who did lots of searches for Google books and determined that 88% of them were published after 1922.
i've posted this before, taken from lorcan dempsey's weblog, summarizing an article in d-lib. i always find it again easily by searching his site for "anatomy".
http://orweblog.oclc.org/archives/000800.html
The anatomy of an aggregate collection
September 17, 2005

Approximately half of the print books in the combined Google 5 collection were published after 1974. Almost three-quarters were published after the Second World War. Using the year 1923 as a rough break-off point between materials that are out of copyright and materials that are in copyright [16], more than 80 percent of the materials in the Google 5 collections are still in copyright (this is of course an upper bound).
if google has scanned roughly 100,000 pre-1923 items, and they were taking books off the shelves randomly, then we could assume they scanned 400,000 post-1923. but if we assume they were doing the pre-1923 items first, 100,000 pre-1923 scanned means 100,000 total scanned. seems to me assuming things does us absolutely no good.

but google is _going_ to scan 10+ million books, eventually, so i'm not sure what difference it makes _how_many_ they've done "so far". are we really questioning their _resolve_ here? seems to me that they've proven they are dedicated to this... so attempts to figure out "how many books so far?" are silly.

especially since we know that many of the post-1923 items did not have their copyrights renewed -- except that we do _not_ know what percentage, and thus cannot even _assume_ the answer to that important question, not with any certainty.

taking the post-1923 items as 80% of the collection:

if we say that half of the post-1923 books were not renewed, then that means that 60% (20% and 40%) are not in copyright.
if we say that 1/3 of the post-1923 books were not renewed, then that means that 47% (20% and 27%) are not in copyright.
if we say that 2/3 of the post-1923 books were not renewed, then that means that 73% (20% and 53%) are not in copyright.

not that the answer would matter any, because due to the litigious arena into which we have allowed the project to be thrown, there's probably no way google would take the risk of showing _any_ of the orphaned material. so we're back to the original 20% that is pre-1923 and clear.

of course, the answer to this is to give google an immunity, to let them serve as the "test-bed" that will act to bring out any claims of copyrighted material that might be lurking... in other words, let google show each book, in full, _until_ some _proof_ of copyright is rendered by another party. (and i do mean proof, and not just some bullshit claim...)

-bowerbird
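
for anyone who wants to rerun the arithmetic above with other guesses, here is a minimal python sketch. it assumes the rough 20% pre-1923 / 80% post-1923 split from the d-lib summary (itself an upper bound on the in-copyright share), and it applies the unrenewed fraction only to the post-1923 portion, not to the whole collection. the function names and the constant are made up for illustration, and the renewal fractions are pure guesses, not data.

```python
from fractions import Fraction

# assumption: roughly 20% of the collection is pre-1923, per the
# "more than 80 percent ... still in copyright" figure quoted above.
PRE_1923_SHARE = Fraction(20, 100)


def share_not_in_copyright(unrenewed_fraction, pre_1923_share=PRE_1923_SHARE):
    """Share of the whole collection that is out of copyright, if the given
    fraction of post-1923 books never had their copyrights renewed."""
    post_1923_share = 1 - pre_1923_share
    return pre_1923_share + post_1923_share * unrenewed_fraction


def implied_post_1923_scans(pre_1923_scans, pre_1923_share=PRE_1923_SHARE):
    """Post-1923 scans implied by a pre-1923 count, if books were pulled off
    the shelves at random (so scans follow the collection's proportions)."""
    return pre_1923_scans * (1 - pre_1923_share) / pre_1923_share


# hypothetical renewal scenarios; nobody knows the real fraction
for label, frac in [("1/3", Fraction(1, 3)),
                    ("1/2", Fraction(1, 2)),
                    ("2/3", Fraction(2, 3))]:
    total = share_not_in_copyright(frac)
    print(f"{label} not renewed -> {float(total):.1%} not in copyright")

# prints:
# 1/3 not renewed -> 46.7% not in copyright
# 1/2 not renewed -> 60.0% not in copyright
# 2/3 not renewed -> 73.3% not in copyright

print(int(implied_post_1923_scans(100_000)))  # prints 400000
```

changing PRE_1923_SHARE (say, to 12% to match the "88% after 1922" recollection) shows how sensitive the totals are to that one starting assumption.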