
it was a mere 3-6 months ago that i was informing people here that their scraping of the google books would cause google to become too conservative in displaying scans. it's annoying when people tune out warnings. but it is even _more_ annoying when they act _surprised_ when the consequences show up!
um, yes, bruce, google is being overly cautious. your scan-scraper script is one main reason why. and your catalog is _another_ main reason why... not quite of the magnitude of publisher suits, granted, but big enough to be "main" reasons. so take a look in the mirror, buddy. and we cannot ignore the fact that _scrapers_ are the ones currently stoking publisher fear. their "hackers-will-just-grab-the-whole-book" nightmare is tinged with reality when they see you. so you can act all surprised if you want, upon discovering that your actions have consequences. but all the people who have been reading this list know that _i_told_you_so_. and you didn't listen... -bowerbird

On 3/24/06, Bowerbird@aol.com <Bowerbird@aol.com> wrote:
it was a mere 3-6 months ago that i was informing people here that their scraping of the google books would cause google to become too conservative in displaying scans.
it's annoying when people tune out warnings.
but it is even _more_ annoying when they act _surprised_ when the consequences show up!
I don't believe anyone has been mass-downloading books from Google's archive based on Bruce's index of their content, using Bruce's scraper or otherwise. Around a dozen DPers have claimed books to download, according to the list available at http://homepage.ntlworld.com/jenjonliz/jon/tia/google.html (note that the list isn't currently being maintained, because there's little demand at DP for new content scraped from any site at the moment, and several of us are in the initial stages of working on a more general database-driven system for claiming books from image providers).
The number of claimed books is in the low hundreds, and most of these have not been downloaded, either because research has indicated that the books are already in PG, or because there was no need for them on DP until now, due to the current glut of content working its way through the DP system. I'd be very surprised if DPers have been responsible for scraping even a hundred complete texts from Google's archive -- a tiny amount compared to the more than 35000 texts listed in Bruce's current index.
As far as I can tell, Google is still allowing me to view every work it has let me view since their site was set up, so I don't see any evidence that they have become more conservative, at least in content displayed to people in the UK.
On the other hand, their policy of restricting access to books published before 1864 *does* exclude a lot of books which are public domain in the UK from being viewed in the UK -- and, oddly, they aren't moving the barrier forward each year, as they should (unlike the US, the public domain isn't frozen here, so new material is entering every year). It is just another example of US-based companies only dealing with non-US issues as a poorly considered afterthought, so it's not all that surprising :).
-- Jon Ingram

Bowerbird@aol.com writes:
but it is even _more_ annoying when they act _surprised_ when the consequences show up!
um, yes, bruce, google is being overly cautious. your scan-scraper script is one main reason why. and your catalog is _another_ main reason why...
I don't think I ever expressed surprise. Annoyance, perhaps. I don't believe that Google's decision to classify books as PD only when there's an explicit copyright, as opposed to including books with an explicit publishing date, has anything to do with my (or other people's) scan scraper, or my catalog. Furthermore, if they were so concerned about them, do you think they would have put about another 25,000 books online in the PD status, and added a select box to search for PD-only books? The PD-only search seems to miss some things, but I'm quibbling.
BTW, Google is aware of my catalog, and the Google Books program manager mentioned my catalog as "Bruce Albrecht's catalog" (or something close to it) at a conference. They've never attempted to contact me. Interpret that as you will.
participants (3): Bowerbird@aol.com, Bruce Albrecht, Jon Ingram