Re: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries

2 Jun 2006


      Page scans are not eBooks.  They are not universally searchable,
readable and editable, and they can all too easily become proprietary,
i.e. owned or copy-protected.

Maitri

On 6/2/06, Michael Hart <hart@pglaf.org> wrote:
...
On Thu, 1 Jun 2006 Bowerbird@aol.com wrote:
...
michael said:
...
Perhaps the way to think about this is to consider
   just how many more or less readers we would get if
   the file sizes were that much larger or smaller.
there are something like 100,000 books available at google.
d.p. digitizes about 2,000 books a year.   they can't keep up.
We work with all possible sources to get eBooks.
...
...
In the end, I think we should provide both.
in the end, users will turn exclusively to "digital reprints"
-- digital text that mimics the scans so accurately that
there's really no good reason to consult the scans at all.
I seem to get plenty of messages from scholarly types who
think source scans will always be in high demand, at the
ivory tower level, at least.
...
after 10 or 20 years of nobody downloading the scans,
we'll be able to feel comfortable taking them offline...
after 10-20 years the actual hardware requirements will
appear so drastically reduced that the load will be nil.
...
...
Some operations deliberately do not put their high
  resolution scans online for downloading, rather an
  automated process reduces the resolution, so these
  scans are no longer suitable for OCRing.
yeah, that's sad.   but what are you gonna do about it?
Once you provide a better alternative, you force those
who should have done it originally to do it better too.
...
...
The odds of being able to create a complete eBook,
  using those scans that are usually made available,
  perhaps about 1/4 to 1/3, based on the reports you
  have probably already seen.
yeah, that's sad too.   but that's a quality-control issue
that i suspect the scanning operations will solve soon...
I was under the impression that much of this low quality
was intentional, so I don't think those will be improving,
at least until someone provides a better mousetrap.
...
...
Once you go through the effort of scanning missing
  pages, rescanning the pages that did not work with
  your OCR programs, etc., it often might seem worth
  the effort simply to scan the entire book with the
  higher resolution scans that you can then post for
  others to use.
i don't think -- for most books -- that will be the case.
All depends on how much effort it is for the particular person
in question. . .if it's a lot of effort to get the materials,
but low effort to do the scanning, you may as well replace the
entire file with your better examples of what should be done.
...
but perhaps that's because i don't see much use for
high-resolution scans.   i am _not_ in love with scans.
like i said above, they will eventually be left behind.
1.  Makes for better OCR
2.  The scholarly types, as above.
...
the important point _today_, though, is that we have
a load of scan-sets, more than we can process now,
and it's silly to ignore them when we _could_ offer them
for people to _read_ now, even if they aren't digitized...
Yes, and we should.
...
...
Do raw scans qualify as eBooks?
does it matter?   they are what they are.   no more, no less.
and almost everyone sees them for exactly what they are.
It matters to the integrity of the eBook world.
...
...
This is the "quick and dirty approach" and doesn't
   cost much in terms of time, effort or money
um, scanning does indeed take time, effort, and money,
at least if you're doing it on a scale of millions of books...
_I_ have no intention of quitting until I can give away a million books,
and I have about the same intention of spending any real money on it.
It will be interesting to see who can put a million eBooks online first,
and how good they are.
...
...
I suppose the real question comes down to
   purposes for making eBooks.
i'm not sure of that.   we make e-books for people to read,
and so their text can be searched and easily repurposed...
This is obviously NOT the goal of many.
...
scans get us part of the way.   digital text gets us the rest...
Yep. . .scans are just one step, I say it's the easiest.
...
...
The various university projects still seem to be a
  great deal concerned with keeping their eBooks out of
  the hands of the public, as has Google, though the
  Google philosophy may be in the process of change.
the michigan librarian pledged that all public-domain books
scanned from their library will be made available to the public.
i assume he meant the scan-sets.   but from them, we will soon
be able to automatically get digital text, so there's no difference.
I can only hope he meant something more worthwhile to the masses
than what most of the current scan-sets provide and that he will
be able to find some way to keep the ball rolling.
...
...
Right now it's hard to tell what Google has chosen
  as their goal; will they really try to do millions
  of books in the next 54 months after perhaps stats
  of .1 million in the first 18 months?
they most certainly will.
We'll see, and I am taking bets.
...
...
Will Google change their philosophy per downloading scans,
if we open up negotiations with them, _maybe_.   we can hope.
What is it that St. Augustine was quoted as saying?  A bit like:
"Work as though everything depends on you,
Pray as though everything depends on God."
I think we should work as though it all depends on us,
and hope that Google will get somewhere.
...
...
and/or downloading their full text searching database?
they'll never make their text-database public, as that's the
competitive edge for which they are paying many millions...
They claim all those millions are spent on scanning, not OCR.
...
do you really think they're gonna hand it over to microsoft?
Or to the world at large?
...
...
Until Google decides to actually proofread eBooks,
if you mean "ensure that their digital text is highly accurate"
-- which can be completely orthogonal to "proofreading" --
then you can be certain that they will "decide" to take that step.
inaccurate text gives bad search results; google won't tolerate that.
Actually, you have it backwards there. . .think about it. . . .
Google's monster speciality is SEARCH ENGINES!!!
They are MUCH more interested in writing a search engine that will
read fuzzy OCR text than in increasing the accuracy of the text.
...
...
My own goal has always been for the public to have their own
  home eLibraries, just as they have their own home computers.
that's the goal for a lot of us.
!!!
...
-bowerbird
Thanks!!!
Give the world eBooks in 2006!!!
Michael S. Hart
Founder
Project Gutenberg
Blog at http://hart.pglaf.org

maitri venkat-ramani