david said:
> We could start with the results of stripping the header
> and the "footer", where most of the legalese is these days.

does anyone here know the best way to strip both of them?
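One rough approach is to keep only what lies between the "*** START OF THIS PROJECT GUTENBERG EBOOK" and "*** END OF THIS PROJECT GUTENBERG EBOOK" marker lines. This is a sketch under an assumption: the patterns below cover the modern marker wording only, and older files phrase their header and small print differently, so some texts will slip through. The name strip_pg_boilerplate is made up for illustration.

```python
import re

# Assumed marker forms ("THIS"/"THE", "EBOOK"/"E-BOOK"); older Project
# Gutenberg files use other wording, so these patterns are a guess.
START_RE = re.compile(
    r"^\*{3}\s*START OF (?:THIS|THE) PROJECT GUTENBERG E-?BOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)
END_RE = re.compile(
    r"^\*{3}\s*END OF (?:THIS|THE) PROJECT GUTENBERG E-?BOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_pg_boilerplate(text: str) -> str:
    """Return the text between the START and END marker lines.

    If a marker is missing, that end of the text is left untouched.
    """
    start = START_RE.search(text)
    body_start = start.end() if start else 0
    # Only look for the END marker after the START marker.
    end = END_RE.search(text, body_start)
    body_end = end.start() if end else len(text)
    return text[body_start:body_end].strip()
```

When a marker isn't found the function leaves that end alone rather than guessing, so output from older-format files still needs a quick look.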


> Also, the ten or twelve most common words in the book after
> stripping the ten or twelve most common words in the English language.
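A minimal sketch of that count, assuming a simple tokenizer (lowercased runs of letters, keeping internal apostrophes as in "don't") and a stopword list supplied by the caller. top_words is a hypothetical helper name, not an existing tool.

```python
import re
from collections import Counter

# Simplifying assumption: a "word" is a run of letters, optionally
# joined by internal apostrophes ("don't", "o'clock").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def top_words(text, stopwords, k=12):
    """Return the k most common words in text, skipping stopwords.

    Matching is case-insensitive; ties keep first-seen order.
    """
    stop = {w.lower() for w in stopwords}
    words = WORD_RE.findall(text.lower())
    counts = Counter(w for w in words if w not in stop)
    return counts.most_common(k)
```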

you'd need to strip more than a dozen.  below is a list from wiktionary.
there's a strong power-law (zipf's law) in word usage.  unless you strip
200-500 common words, the top of the book's list will still be mostly
generic words rather than anything particular to the book...
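To put rough numbers on the power-law point, here is a sketch under an idealized Zipf model: the frequency of rank r is taken as proportional to 1/r^s, with s = 1 and a hypothetical 50,000-word vocabulary. Real texts only approximate this, so the figures are illustrative, not measured.

```python
def zipf_coverage(top_n, vocab_size, s=1.0):
    """Fraction of all tokens covered by ranks 1..top_n under an
    idealized Zipf law where frequency of rank r is proportional to r**-s."""
    weights = [r ** -s for r in range(1, vocab_size + 1)]
    return sum(weights[:top_n]) / sum(weights)


# Compare stripping a dozen words against stripping a few hundred.
for n in (12, 100, 500):
    print(n, round(zipf_coverage(n, 50_000), 3))
```

Under those assumptions the top 12 ranks cover roughly 27% of all tokens, the top 100 roughly 45%, and the top 500 roughly 60%. Each additional stripped word removes less text than the one before, which is why a dozen barely dents the generic vocabulary.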

-bowerbird

>   http://en.wiktionary.org/wiki/Wiktionary:Frequency_lists

Here are the top 100 words (from Project Gutenberg texts) in alphabetical order:

a
about
after
all
an
and
any
are
as
at
be
been
before
but
by
can
could
did
do
down
first
for
from
good
great
had
has
have
he
her
him
his
I
if
in
into
is
it
its
know
like
little
made
man
may
me
men
more
mr
much
must
my
no
not
now
of
on
one
only
or
other
our
out
over
said
see
she
should
so
some
such
than
that
the
their
them
then
there
these
they
this
time
to
two
up
upon
us
very
was
we
were
what
when
which
who
will
with
would
you
your
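To check how much of a particular book this list actually accounts for, here is a small sketch: the 100 words above as a Python set, plus a coverage function using the same simplified tokenizer assumption (lowercased letter runs with internal apostrophes). The names PG_TOP_100 and top100_coverage are hypothetical.

```python
import re

# The 100 words listed above, lowercased ("I" becomes "i").
PG_TOP_100 = frozenset("""
a about after all an and any are as at be been before but by can could did do
down first for from good great had has have he her him his i if in into is it
its know like little made man may me men more mr much must my no not now of on
one only or other our out over said see she should so some such than that the
their them then there these they this time to two up upon us very was we were
what when which who will with would you your
""".split())

WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def top100_coverage(text):
    """Return the fraction of word tokens in text that are in PG_TOP_100."""
    words = WORD_RE.findall(text.lower())
    if not words:
        return 0.0
    return sum(w in PG_TOP_100 for w in words) / len(words)
```

Running this on a stripped book body gives a quick sense of how much text the list removes before anything book-specific shows up.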