Re: [gutvol-d] !@!Re: E-DOCS: Google Print Questions [J. Roland]

24 Dec 2004


      Michael Hart wrote:
...
Project Gutenberg has already produced and distributed nearly 15,000 eBooks,
with a budget that has yet to reach a significant total for all 33+ years,
and is projected to reach a million eBooks without undue expense or effort.
PG produces books at a lower cost only if you neglect the cost of 
volunteer work. I'm sure a big organized corporation like Google can 
create eBooks way cheaper than a loosely organized group of volunteers 
like PG.
...
We'll just have to wait and see if either Google Print, or any of the various
"Million eBook Projects" will ever come up with even 1% of a million eBooks
that you can carry with you on a one inch stack of plain homemade DVDs.
Whereas PG already has reached 1.5% of a million books with 98.5% still 
to go.
...
If it hasn't been proofread, and if you can't take it with you, it is only
of limited value. . .sort of like reading over someone's shoulder.
Depends on what you want to do with the book. If you only want to cite 
some work, a page scan (which you cannot take with you but is error-free) 
is much better than a proofread eBook (which may still contain OCR errors).
...
With Project Gutenberg eBooks, you OWN them. . .forever. . .and can save them
in your own favorite formats, fonts, margination, pagination, or whatever,
and you can search, quote, print, and do all the normal eBook functions.
Yours forever ... until new copyright laws separate you.
...
I would say that an eBook has to be at least 99.9% accurate, and that it
should then be a process as people read the eBooks, to send in corrections.
That is about 2 errors per page, assuming 55 characters per line and 
40 lines per page (roughly 2,000 characters).
...
Most of the Project Gutenberg and Distributed Proofreaders would say it has
to be over 99.99% and perhaps even over 99.999%.
That is approx. one error every 5 pages or every 50 pages. Still not 
very good.
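
For anyone who wants to check these figures, here is a minimal sketch of 
the arithmetic. It assumes the page size used above (55 characters per 
line, 40 lines per page) and that accuracy is measured per character; the 
function names are made up for illustration. With 2,200 characters per 
page it gives roughly 2.2 errors per page at 99.9%, about one error every 
4.5 pages at 99.99%, and about one every 45 pages at 99.999%, in line 
with the rounded numbers above.

```python
# Assumed page geometry, taken from the estimate above.
CHARS_PER_LINE = 55
LINES_PER_PAGE = 40
CHARS_PER_PAGE = CHARS_PER_LINE * LINES_PER_PAGE  # 2200 characters


def errors_per_page(accuracy_percent, chars_per_page=CHARS_PER_PAGE):
    """Expected number of wrong characters on one page."""
    error_rate = (100.0 - accuracy_percent) / 100.0
    return error_rate * chars_per_page


def pages_per_error(accuracy_percent, chars_per_page=CHARS_PER_PAGE):
    """Expected number of pages read before hitting one error."""
    return 1.0 / errors_per_page(accuracy_percent, chars_per_page)


if __name__ == "__main__":
    for accuracy in (99.9, 99.99, 99.999):
        print(
            f"{accuracy}% accurate: "
            f"{errors_per_page(accuracy):.2f} errors/page, "
            f"one error every {pages_per_error(accuracy):.1f} pages"
        )
```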
...
Not only that, but, viewing the entire eBook effort as a 50 year process,
of which I have walked 33+ years, I must state for the record that I think
OCR, spellcheckers, grammarcheckers, etc. will be so much better a decade
from now that doing the proofreading on the more obscure works will require
so much less effort than it does today, that it will be a great trade-off.
Which raises the question: isn't Google's approach, to just scan the books 
today and wait, better suited to reaching the 1 million target? Every 
advance in OCR technology automatically "proof-reads" all the books Google 
has scanned.


-- 
Marcello Perathoner
webmaster@gutenberg.org