
On Tue, 28 Dec 2004 03:49:41 -0800 (PST), Michael Hart <hart@pglaf.org> wrote:
> How many of you have tried Google Print?
> Have you noticed that the initial offering of eBooks strongly resembles the Project Gutenberg catalogue???
This, and some of the responses, made me think that some people suspect Google is using our texts in some way, so I checked it out. I figure the answer is no, to both the explicit and the implied questions.
I started by searching for quotes from 20 etexts chosen at random from etext99, as follows:
    book "cardinals, abbots, councillors, legates, bishops, princes"
    book "indeed we be no fatted bullocks, we two"
    book "Est-ce que je ne connais pas mon filleul?"
    book "Suchet's head-quarters at that time was the old palace of the"
    book "She always has this man of letters of hers on her"
    book "Afterwards," he answered quickly. "A cursed gutta serena."
    book "himself with the people, he partially recognizes the truth of his words."
    book "Epistles are spurious, as that the Republic, the Timaeus, and the Laws"
    book "You may recall that our mutual and dear friend, old Allan Quatermain,"
    book "Where rose the husbandman's abode,"
    book "the felicity of his fellow beings, and sit down darkling"
    book "by a tub, artesian cold, and a loud and joyous singing of"
    book "As desires of waking hours are answered in sleep,"
    book "Even while speaking at random, perhaps the better to hide"
    book "Calm and proud, Tartarin of Tarascon marched on in the night"
    book "Another fallacy is produced which turns on the absoluteness of"
    book "The evidence for the steadily growing danger of secession"
    book "Morose-minded people may complain of this; for myself I regard it"
    book "THAT old bell, presage of a train, had just"
All of them returned normal search results, including a few from PG, but only the second (Jungle Book 2) offered a Google Print link. (Incidentally, for those who want to try it: I find that preceding your search term with "book" will often produce a Google Print link when the bare search term doesn't.)
A search for "book Tarzan" yielded, in the Print results:
    Tarzan of the Apes - by Edgar Rice Burroughs - 320 pages
    Human Computer Interaction - edited by Julie Jacko, Constantine Stephanidis - 1348 pages
    C Primer Plus - by Stephen Arata, Stephen Prata, Kathleen Prata - 970 pages
Not what I'd consider a typical PG search result! :-) "book barsoom" and "book mars" did even less well: no sign of the ERB series.
Erewhon, Alice, Little Women, Oliver Twist, Tom Sawyer, Huck Finn, Zenda, Decline and Fall, at least some Sherlock Holmes, Last of the Mohicans, several works of Plato and at least most of Shakespeare are present. Richard Feverel is there, but Shagpat is nowhere. Tom Swift is AWOL. Tartarin of Tarascon can't be found. John Carter is once again mysteriously missing. Kai Lung has effaced himself into invisibility. And in the process of searching for these, I turned up about twice as many modern book titles as pre-1923 ones. The page images I looked at all come from modern reprints, with "Copyrighted Material" tags down their sides. I imagine the publishers would insist on this, which goes a long way toward explaining why Google wants to work with library collections of PD books.
This pattern is, I think, consistent with what book publishers might be willing to provide. Any list of books drawn up by English speakers is going to include the most popular classics. An awful lot of the search results I found were Penguin Classics, so it may well be that Google simply has the whole Penguin Classics range. If so, a significant overlap with PG is inevitable. And the Google Print entries seem to include far more modern books than classics.
Hmmm. Interesting. The only Tarzan link in Google Print is "Tarzan of the Apes", and the only Tarzan search result at the Penguin Classics site is, guess what? "Tarzan of the Apes". And Penguin Classics does not publish the Barsoom series. "Coincidence? I think not!"
Interesting: the searches book "she could have seen through a pair of stove-lids just as well." and book "A robber is more high-toned" both find Tom Sawyer in Google Print, and book "Christmas won't be Christmas without any presents," finds Little Women, but book "Papa was a pickle bottle" doesn't. The search book Little Women pickle does find the book, but with the word "pickle" much further into the book. Hmmm, I see: the text in the Google Print image reads "pa was a pickle-bottle" instead. So much for any thought of them using our text.
The bigger reason they can't be using our text is that their search results point to page images, with the search term highlighted in yellow. You really couldn't do that unless you had mapped your text to the dimensions and position of the image, and it would be vastly easier to do that programmatically from the OCR output than from an outside text.
> We'd love to hear your experiences with Google Print.
It will be handy, though probably not as handy as Amazon, for confirming doubtful corrections in some older texts. They've made some effort to protect their page images from downloading by the casual browser, but it's easy to bypass that. The more significant restriction is the number of pages any one session is allowed to download. This seems to me a reasonable compromise for genuinely copyrighted books, though an annoyance for these reprints, where the main story is in the PD and only the front and back matter are in copyright. It'll be interesting to see what they do with guaranteed 100% pre-1923 content.
jim