
Jon,

The longer I think about the job we're trying to do here, and the collection of books we're assembling, the more I conclude that the most important part of the foundation it rests on has to be your Step #1: collect the page scans. Once the text has been contributed, it is just about unassailable in its original form unless the images of the source text are preserved and accessible, because without them nobody can check the text against the original. Without the images, I don't see anything useful coming from further work (and I also think even the existing procedures for text refinement are problematic). So, paring your proposal (for which I will add myself to the line of prospects) back to its most limited proof of concept, can we identify and acquire page scans for the top 10 ebooks?

Also, the definition of "top 10" (to whatever exponent) apparently needs some examination. Direct downloads from PG may not reflect actual demand well. For instance, the Feedbooks top 10, which is presumably, but not necessarily, attributable or potentially attributable to PG, includes:

1. The Art of War, Sun Tzu
2. Alice's Adventures in Wonderland, Lewis Carroll
3. The Adventures of Sherlock Holmes, Arthur Conan Doyle
4. Pride and Prejudice, Jane Austen
5. The Curious Case of Benjamin Button (part of PG's ebook "Tales of the Jazz Age"), F. Scott Fitzgerald
6. The Count of Monte Cristo, Alexandre Dumas
7. Grimm's Fairy Tales, Jacob Ludwig Karl Grimm & Wilhelm Karl Grimm
8. The Picture of Dorian Gray, Oscar Wilde
9. War and Peace, Lev Nikolayevich Tolstoy
10. The Divine Comedy, Dante Alighieri