
On 6/25/06, Greg Newby <gbnewby@pglaf.org> wrote:
Because we don't have a lot of subject cataloging, one value of this is that it does a good job of identifying children's eBooks (they tend to be "easy").
If the problem is that we don't have a lot of subject cataloging, provide more subject cataloging. We could copy the LoC cataloging for most of the catalog without too much work. If we're going to a Wiki-type thing, lists of children's books, mysteries, sci-fi, etc. will be made, and they will be superior to this.
This is also usable for people seeking to develop literacy or provide literacy instruction, by providing a way of reading something "harder" or "easier" as desired.
If the problem is literacy instruction, then we should work on a list of books for literacy, not rely on some tool that can't tell the difference between a 17th-century children's book and a 20th-century one, or detect how much dialect is used. Again, a Wiki-tool is perfect for this.
If you have feedback on the results, or my idea for adding these scores as an element of the catalog search results, please chime in!
I think that these are somewhat interesting, but they are far from the most interesting factoids. I've been drooling over Amazon's Statistically Improbable Phrases, personally. I surely wouldn't make them as prominent as putting them on the search page; I don't think it's the most important thing that most people look at.
0.281 6 3 6 4 0 0 22 2 8 5 19 6 Mary Olivier: a Life (etext9366)
This is surely a mistake; the second sentence in the book is "When old Jenny shook it the wooden rings rattled on the pole and grey men with pointed heads and squat, bulging bodies came out of the folds on to the flat green ground." The numbers are too hard to decipher in this form to really try to understand why. I also wonder about "profainwordsPerWords"? The profanity of words has little to do with readability; they're just adjectives and nouns from that perspective.
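
For anyone who wants to sanity-check a single sentence by hand, here is a rough sketch. I don't know which formula produced the scores above, so this swaps in the standard Flesch Reading Ease formula as an assumption, and the syllable counter is a crude vowel-group heuristic of my own (the names count_syllables and flesch_reading_ease are made up for illustration). Treat the output as a ballpark figure, not as a reproduction of the catalog numbers.

```python
import re


def count_syllables(word):
    # Crude heuristic (an assumption, not a real syllabifier):
    # count runs of consecutive vowels, with a minimum of one per word.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    # Standard Flesch Reading Ease:
    #   206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    # Higher scores mean easier text.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


sample = (
    "When old Jenny shook it the wooden rings rattled on the pole and grey "
    "men with pointed heads and squat, bulging bodies came out of the folds "
    "on to the flat green ground."
)
print(round(flesch_reading_ease(sample), 1))
```

Note that a formula like this only sees sentence length and syllable counts, so it has no way to notice archaic vocabulary or dialect, which is the limitation I was complaining about above.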