Re: [gutvol-d] Automated readability scores for PG eBooks

26 Jun 2006

On 6/25/06, Greg Newby <gbnewby@pglaf.org> wrote:
...
> Because we don't have a lot of subject cataloging, one
> value of this is that it does a good job of identifying
> children's eBooks (they tend to be "easy").

If the problem is that we don't have a lot of subject cataloging,
provide more subject cataloging. We could copy the LoC cataloging for
most of the catalog without too much work. If we're going to a
Wiki-type thing, lists of children's books, mysteries, sci-fi, etc.
will be made, and will be superior to this.
...
> This is also usable for people seeking to develop
> literacy or provide literacy instruction, by providing
> a way of reading something "harder" or "easier" as desired.

If the problem is literacy instruction, then we should work on a list
of books for literacy, not rely on some tool that can't tell the
difference between a 17th-century children's book and a 20th-century
one, or how much dialect is used. Again, a Wiki-type tool is perfect
for this.
...
> If you have feedback on the results, or my idea for
> adding these scores as an element of the catalog search
> results, please chime in!

I think that these are somewhat interesting, but they are far from the
most interesting factoids. I've been drooling over Amazon's
Statistically Improbable Phrases, personally. I certainly wouldn't
make them as prominent as putting them on the search page; I don't
think it's the most important thing that most people look at.
...
> 0.281  6  3  6  4  0  0 22  2  8  5 19  6 Mary Olivier: a Life (etext9366)

This is surely a mistake; the second sentence in the book is "When old
Jenny shook it the wooden rings rattled on the pole and grey men with
pointed heads and squat, bulging bodies came out of the folds on to
the flat green ground." The numbers are too hard to decipher in this
form to really try to understand why.
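
To make the objection concrete, here is a rough sketch that scores that one
sentence by itself. It assumes the Flesch-Kincaid grade-level formula, which
is a standard readability measure but not necessarily what the script behind
these numbers computes; the function names and the vowel-group syllable
counter are my own crude stand-ins, so treat the result as indicative only.

```python
import re

def count_syllables(word):
    # Crude heuristic (an assumption, not a real syllabifier):
    # count runs of vowels, and give every word at least one syllable.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    # Split into sentences on terminal punctuation, and into words on letters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    # Standard Flesch-Kincaid grade-level formula.
    grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return grade, len(words), len(sentences)

SENTENCE = (
    "When old Jenny shook it the wooden rings rattled on the pole and grey "
    "men with pointed heads and squat, bulging bodies came out of the folds "
    "on to the flat green ground."
)

grade, n_words, n_sentences = flesch_kincaid_grade(SENTENCE)
print(f"{n_words} words in {n_sentences} sentence(s), grade ~ {grade:.1f}")
```

A single 33-word sentence pushes the grade level up on sentence length alone,
whatever the syllable counter does, which is why an "easy" score for the
whole book looks suspicious.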

I also wonder about "profainwordsPerWords"? The profanity of words has
little to do with the readability; they're just adjectives and nouns
from that perspective.

David Starner