Jon,
The longer I think about the job we're trying to
do here, and the collection of books we're
assembling, the more I conclude that the most
important piece of the foundation upon which
it's built needs to be your Step #1 - collect
the page scans. Once the text has been
contributed, it's just about unassailable in
its original form unless the images of the
source text are preserved and are accessible.
Without the images, I don't see anything
useful coming from further work (and I also
think even the existing procedures for text
refinement are problematic).
So, paring your proposal (for which I will
add myself to the line of prospects) back to
its most limited proof of concept,
can we identify and acquire page scans
for the top 10 ebooks?
(Also, the definition of "top 10," raised to
whatever exponent, apparently requires some
examination. Direct downloads from PG
may not reflect actual demand well.
For instance, the top 10 from Feedbooks,
presumably but not necessarily attributable
to PG, includes:
1. The Art of War
Sun Tzu
2. Alice's Adventures in Wonderland
Lewis Carroll
3. The Adventures of Sherlock Holmes
Arthur Conan Doyle
4. Pride and Prejudice
Jane Austen
5. The Curious Case of Benjamin Button
(part of PG's ebook "Tales of the Jazz Age")
F. Scott Fitzgerald
6. The Count of Monte Cristo
Alexandre Dumas
7. Grimm's Fairy Tales
Jacob Ludwig Karl Grimm & Wilhelm Karl Grimm
8. The Picture of Dorian Gray
Oscar Wilde
9. War and Peace
Lev Nikolayevich Tolstoy
10. The Divine Comedy
Dante Alighieri)