
michael said:
Perhaps the way to think about this is to consider just how many more or fewer readers we would get if the file sizes were that much larger or smaller.
there are something like 100,000 books available at google. d.p. digitizes about 2,000 books a year. they can't keep up.
In the end, I think we should provide both.
in the end, users will turn exclusively to "digital reprints" -- digital text that mimics the scans so accurately that there's really no good reason to consult the scans at all. after 10 or 20 years of nobody downloading the scans, we'll be able to feel comfortable taking them offline...
Some operations deliberately do not put their high-resolution scans online for downloading; instead, an automated process reduces the resolution, so the posted scans are no longer suitable for OCR.
yeah, that's sad. but what are you gonna do about it?
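For concreteness, here is a minimal sketch of the kind of automated downsampling described above. The `downsample` function and the dpi figures are illustrative assumptions, not any scanning operation's actual pipeline; the point is just that sampling a 600 dpi page down to 150 dpi drops it below the roughly 300 dpi that OCR engines generally want.

```python
def downsample(page, src_dpi, dst_dpi):
    """Reduce a grayscale page (a list of rows of pixel values) from
    src_dpi to dst_dpi by keeping every Nth pixel (nearest-neighbor)."""
    step = src_dpi // dst_dpi
    return [row[::step] for row in page[::step]]

# a toy 8x8 "page" standing in for a 600 dpi scan
page = [[x + 10 * y for x in range(8)] for y in range(8)]

small = downsample(page, 600, 150)  # step = 4, so only every 4th pixel survives
print(len(small), len(small[0]))    # 2 2 -- far too little detail left for OCR
```

The reduced image is fine for on-screen reading but has lost the fine strokes an OCR engine relies on, which is exactly the effect such operations are after.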
The odds of being able to create a complete eBook, using the scans that are usually made available, are perhaps about 1 in 4 to 1 in 3, based on the reports you have probably already seen.
yeah, that's sad too. but that's a quality-control issue that i suspect the scanning operations will solve soon...
Once you go through the effort of scanning missing pages, rescanning the pages that did not work with your OCR programs, etc., it often seems worth the effort simply to rescan the entire book at higher resolution and post those scans for others to use.
i don't think -- for most books -- that will be the case. but perhaps that's because i don't see much use for high-resolution scans. i am _not_ in love with scans. like i said above, they will eventually be left behind. the important point _today_, though, is that we have a shitload of scan-sets, more than we can process now, and it's silly to ignore them when we _could_ offer them for people to _read_ now, even if they aren't digitized...
Do raw scans qualify as eBooks?
does it matter? they are what they are. no more, no less. and almost everyone sees them for exactly what they are.
This is the "quick and dirty" approach, and it doesn't cost much in terms of time, effort, or money.
um, scanning does indeed take time, effort, and money, at least if you're doing it on a scale of millions of books...
I suppose the real question comes down to our purposes for making eBooks.
i'm not sure of that. we make e-books for people to read, and so their text can be searched and easily repurposed... scans get us part of the way. digital text gets us the rest...
The various university projects still seem a great deal concerned with keeping their eBooks out of the hands of the public, as does Google, though the Google philosophy may be in the process of changing.
the michigan librarian pledged that all public-domain books scanned from their library will be made available to the public. i assume he meant the scan-sets. but from them, we will soon be able to automatically get digital text, so there's no difference.
Right now it's hard to tell what Google has chosen as their goal; will they really try to do millions of books in the next 54 months, after scanning perhaps 0.1 million in the first 18 months?
they most certainly will.
Will Google change their philosophy on letting people download scans,
if we open up negotiations with them, _maybe_. we can hope.
and/or on downloading their full-text search database?
they'll never make their text-database public, as that's the competitive edge for which they are paying many millions... do you really think they're gonna hand it over to microsoft?
Until Google decides to actually proofread eBooks,
if you mean "ensure that their digital text is highly accurate" -- which can be completely orthogonal to "proofreading" -- then you can be certain that they will "decide" to take that step. inaccurate text gives bad search results; google won't tolerate that.
My own goal has always been for the public to have their own home eLibraries, just as they have their own home computers.
that's the goal for a lot of us. -bowerbird