
On Sat, Sep 22, 2012 at 03:02:45PM -0700, don kretz wrote:
Jon,
The longer I think about the job we're trying to do here, and the collection of books we're assembling, the more I conclude that the most important piece of the foundation upon which it's built needs to be your Step #1 - collect the page scans. Once the text has been contributed, it's just about unassailable in its original form unless the images of the source text are preserved and are accessible.
For those who are not aware: the scans from all eBooks produced by Distributed Proofreaders are saved at pgdp.net or backup sites, but very few scan sets have been included with the eBooks. We have had a procedure/convention for including original scans with our eBooks since around 2004. Look for "page-images" subdirectories in our eBooks. I see around 7000, though not all are necessarily complete scan sets, and there is variation in image quality. But at least they match. Frequent producers like Al Haines (well, there's really nobody LIKE him!) save their scans, and could likely be convinced to share them. When we get errata reports, a very frequent first step is to try to find scans from Google Books or another source (Internet Archive, Gallica, etc.). -- Greg
Without the images, I don't see anything useful coming from further work (and I also think even the existing procedures for text refinement are problematic).
So, paring your proposal (for which I will add myself to the line of prospects) back to its most limited validation of concept, can we identify and acquire page scans for the top 10 eBooks?
(Also, the definition of "top 10" (or top 100, top 1000, to whatever exponent) apparently requires some examination. Direct downloads from PG may not reflect actual demand well. For instance, the Feedbooks top 10, presumably (but not necessarily) attributable or potentially attributable to PG, includes:
1. The Art of War by Sun Tzu
2. Alice's Adventures in Wonderland by Lewis Carroll
3. The Adventures of Sherlock Holmes by Arthur Conan Doyle
4. Pride and Prejudice by Jane Austen
5. The Curious Case of Benjamin Button (part of PG's eBook "Tales of the Jazz Age") by F. Scott Fitzgerald
6. The Count of Monte Cristo by Alexandre Dumas
7. Grimm's Fairy Tales by Jacob Ludwig Karl Grimm & Wilhelm Karl Grimm
8. The Picture of Dorian Gray by Oscar Wilde
9. War and Peace by Lev Nikolayevich Tolstoy
10. The Divine Comedy by Dante Alighieri)
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/mailman/listinfo/gutvol-d