Hi everybody,
   I am doing some text analysis on a large subset of Project Gutenberg
   etexts in US-ASCII plain text format. I need to extract only the actual
   etext body from each etext file; in other words, I need to be able to
   cut off any legal fine print, notices to potential volunteers and
   information about donations. I looked at several files at random, and it
   looks like *** START OF THE PROJECT GUTENBERG EBOOK*** and END OF
   PROJECT GUTENBERG EBOOK delimit the parts I need. My question is: can I
   count on these markers appearing in each and every text? Or are there
   other delimiters and/or tags?
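
   For concreteness, the extraction in question could be sketched roughly
   as below. This is only a minimal sketch: the two patterns are
   assumptions based on the few files I looked at, the optional
   "THE"/"THIS" and the optional leading asterisks are guesses, and the
   name extract_body is just illustrative. It returns None when either
   marker is missing, so files that do not fit can be set aside and
   checked by hand.

```python
import re

# Assumed marker patterns, based only on a handful of files;
# the exact wording in other files may well differ.
START_RE = re.compile(
    r"^\*\*\*\s*START OF (?:THE |THIS )?PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)
END_RE = re.compile(
    r"^(?:\*\*\*\s*)?END OF (?:THE |THIS )?PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_body(text):
    """Return the text between the two marker lines, or None if either is missing."""
    start = START_RE.search(text)
    if start is None:
        return None
    # Look for the end marker only after the start marker line.
    end = END_RE.search(text, start.end())
    if end is None:
        return None
    return text[start.end():end.start()].strip()


if __name__ == "__main__":
    sample = (
        "Legal header\n"
        "*** START OF THE PROJECT GUTENBERG EBOOK***\n"
        "Chapter 1\n"
        "It was a dark night.\n"
        "END OF PROJECT GUTENBERG EBOOK\n"
        "Donation info\n"
    )
    print(extract_body(sample))
```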
   Thanks to anyone who can help.
   Regards,
   Anna