a concrete example might help...

here's the table of contents from
"free culture" by lawrence lessig,
in zen markup language format,
generated automatically from a
simple straightforward analysis
in about one-half of a second...

even though there are 3 levels
of headers, they are very clear,
indicated by varying indentation
(which represents, in the text itself,
a varying number of blank lines
preceding each header, of course).

text-structures even more complex
than the one shown in this outline
can be communicated easily by the
number of preceding blank lines --
_if_ the rule is followed _consistently_
-- and grokked by routines consisting
of just a few lines of dirt-simple code...
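to make that concrete, here's a minimal python sketch. the specific rule it uses (4 blank lines before a part-header, 3 before a chapter-header, 2 before a section-header, 1 between ordinary paragraphs) is a hypothetical one picked for illustration, not necessarily the exact z.m.l. rule. the point is only that the header level falls straight out of a count of blank lines. `outline` and `render` are made-up names; `render` indents 5 spaces per level to mimic the outline at the end of this post.

```python
# hypothetical rule, for illustration only:
# number of preceding blank lines -> header level
HYPOTHETICAL_LEVELS = {4: 1, 3: 2, 2: 3}


def outline(text: str, levels: dict[int, int] = HYPOTHETICAL_LEVELS) -> list[tuple[int, str]]:
    """Return (level, header) pairs; a line counts as a header only if
    the run of blank lines right before it maps to a level."""
    result = []
    blank_run = 0
    for line in text.splitlines():
        if not line.strip():
            blank_run += 1  # keep counting consecutive blank lines
            continue
        level = levels.get(blank_run)
        if level is not None:
            result.append((level, line.strip()))
        blank_run = 0  # any non-blank line ends the run
    return result


def render(entries: list[tuple[int, str]]) -> str:
    """Indent each header by 5 spaces per level."""
    return "\n".join("     " * level + title for level, title in entries)


sample = "\n".join([
    "", "", "", "", "'Piracy'",
    "", "", "", "Chapter 1: Creators",
    "", "ordinary body text...",
    "", "", "Film",
    "", "more body text...",
])
print(render(outline(sample)))
```

the body-text lines have only 1 blank line before them, so they never show up as headers.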

by the way, just to say something "obvious"
that lee probably had not considered before,
one of the many ways my routines determine
the headers in a digitized text is to look for a
"table of contents" section -- usually toward
the start of the file, and usually marked with
"contents" or "table of contents" as a header --
and then examine that section quite carefully.
it turns out this does a very good job of telling you
which specific phrases "might be" header-lines.
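a rough python sketch of that table-of-contents trick, under some hypothetical assumptions: look for a line reading "contents" or "table of contents" within the first few hundred lines, collect the non-blank lines after it until a run of 3 blank lines ends the section, and then treat any later line matching one of those phrases as a possible header. the scan limit, the end-of-section rule, and the function names are all assumptions, not a description of any particular routine.

```python
import re

# matches a header line such as "Contents" or "TABLE OF CONTENTS"
CONTENTS_HEADER = re.compile(r"^\s*(table of )?contents\s*$", re.IGNORECASE)


def toc_candidates(lines: list[str], max_scan: int = 500, end_gap: int = 3) -> list[str]:
    """Collect the entries of the first "contents" section found within
    the first `max_scan` lines; a run of `end_gap` blank lines ends it."""
    for start, line in enumerate(lines[:max_scan]):
        if CONTENTS_HEADER.match(line):
            break
    else:
        return []  # no contents section near the start of the file
    candidates: list[str] = []
    blank_run = 0
    for line in lines[start + 1:]:
        if not line.strip():
            blank_run += 1
            if candidates and blank_run >= end_gap:
                break  # long gap after some entries: section is over
            continue
        blank_run = 0
        candidates.append(line.strip())
    return candidates


def might_be_header(line: str, candidates: list[str]) -> bool:
    """Case-insensitive check of a line against the collected phrases."""
    return line.strip().lower() in {c.lower() for c in candidates}


book = ["Free Culture", "", "Contents", "", "Preface", "Introduction",
        "Chapter 1: Creators", "", "", "", "Preface", "", "body text..."]
cands = toc_candidates(book)
print(cands)
print(might_be_header("PREFACE", cands))
```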

and if you're cleaning up the o.c.r. of a p-book,
for instance, there are usually _page-numbers_
there too, telling what _page_ each header is on.
pretty handy, eh?  indeed, in the .pdf of this book,
which you can download at http://www.lessig.org,
you will see that the page-numbers _are_ there, and
chapter 11, chimera, for instance, starts on page 177.
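pulling those page-numbers out of the o.c.r. is simple too. this python sketch assumes, hypothetically, that each contents entry ends with its page-number, separated from the title by dot-leaders or by at least two spaces; real scans vary, so treat the regular expression as a starting point rather than a finished rule.

```python
import re

# hypothetical o.c.r. layout: title, then dot-leaders or 2+ spaces, then page-number
TOC_ENTRY = re.compile(r"^\s*(?P<title>\S.*?)[ .]{2,}(?P<page>\d+)\s*$")


def toc_pages(entries: list[str]) -> dict[str, int]:
    """Map each entry's title to its page-number; entries without one are skipped."""
    pages: dict[str, int] = {}
    for entry in entries:
        m = TOC_ENTRY.match(entry)
        if m:
            pages[m.group("title")] = int(m.group("page"))
    return pages


print(toc_pages(["Puzzles", "Chapter 11: Chimera .......... 177"]))
```

requiring two or more separator characters keeps a title like "Chapter 11" from being misread as "Chapter" on page 11.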

like i said, if you know what a header is likely to be,
and on what page it is located, it's fairly easy to find.
indeed, people have been using the "table of contents"
for precisely that reason for several hundred years now.
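here's a sketch of that lookup in python. it assumes the o.c.r. is available as a list of per-page strings, and that printed page p sits at list index p - 1 plus a hypothetical `offset` for unnumbered front matter. since that guess is often a little off, it also checks a couple of neighboring pages. the comparison ignores case and extra whitespace; all names here are made up for illustration.

```python
from typing import Optional


def normalize(s: str) -> str:
    """Lower-case and collapse whitespace, so o.c.r. spacing doesn't matter."""
    return " ".join(s.lower().split())


def find_header(pages: list[str], title: str, page_no: int,
                offset: int = 0, slack: int = 2) -> Optional[tuple[int, int]]:
    """Look for `title` on printed page `page_no`, then on nearby pages.

    pages[k] is the o.c.r. text of scan k; printed page p is assumed to be
    scan p - 1 + offset. Returns (scan index, line index) or None.
    """
    target = normalize(title)
    deltas = [0]
    for d in range(1, slack + 1):
        deltas += [d, -d]  # expected page first, then its neighbors
    for d in deltas:
        scan = page_no - 1 + offset + d
        if 0 <= scan < len(pages):
            for line_no, line in enumerate(pages[scan].splitlines()):
                if normalize(line) == target:
                    return scan, line_no
    return None


# made-up scans: 176 empty pages, then the chapter opening
scans = [""] * 176 + ["Chimera\nsome body text..."] + [""] * 3
print(find_header(scans, "chimera", 177))
```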

this is just one of the reasons why it ain't that hard
to write routines to ascertain the headers in a book.

like i said, it sounds very obvious when you hear it.
but have you ever heard anyone say it here before?

-bowerbird

---------------------------------------------


TABLE OF CONTENTS


     Free Culture
     Table of Contents
     License
     Publisher Page
     Library of Congress Cataloging
     Dedication
     Preface
     Introduction

     'Piracy'
          Chapter 1: Creators
          Chapter 2: "Mere Copyists"
          Chapter 3: Catalogs
          Chapter 4: "Pirates"
               Film
               Recorded Music
               Radio
               Cable TV
          Chapter 5: "Piracy"
               Piracy I
               Piracy II

     'Property'
          Chapter 6: Founders
          Chapter 7: Recorders
          Chapter 8: Transformers
          Chapter 9: Collectors
          Chapter 10: "Property"
               Why Hollywood Is Right
               Beginnings
               Law: Duration
               Law: Scope
               Law and Architecture: Reach
               Architecture and Law: Force
               Market: Concentration
               Together

     Puzzles
          Chapter 11: Chimera
          Chapter 12: Harms
               Constraining Creators
               Constraining Innovators
               Corrupting Citizens

     Balances
          Chapter 13: Eldred I
          Chapter 14: Eldred II

     Conclusion

     Afterword
          Us, Now
               Rebuilding Freedoms Previously Presumed: Examples
               Rebuilding Free Culture: One Idea
          Them, Soon
               More Formalities
               Shorter Terms
               Free Use Vs. Fair Use
               Liberate the Music -- Again
               Fire Lots of Lawyers

     Footnotes
     Hyperlinks
     Acknowledgments
     Index
     About the Author
     Jacket
     Typos Corrected
     Permissions
     The Dead-Tree Hardback Version of this Work
     zero markup language -- z.m.l. -- the future of electronic-books

---------------------------------------------

p.s.  extra points for everyone who realized that
the lines in the table of contents section are
not to be rewrapped -- and that's the reason
all of them are prefaced with at least one leading space...
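that leading-space rule is easy to apply mechanically. a minimal python sketch, assuming blank lines separate paragraphs: ordinary lines get joined and rewrapped with the standard `textwrap` module, any line starting with a space is passed through untouched, and every blank line is kept so the blank-line counts that mark headers survive. `rewrap` is a made-up name.

```python
import textwrap


def rewrap(text: str, width: int = 60) -> str:
    """Rewrap ordinary paragraphs to `width`, but pass through untouched
    any line that begins with a space (e.g. table-of-contents entries)."""
    out: list[str] = []
    para: list[str] = []

    def flush() -> None:
        # join the pending paragraph lines and wrap them to the target width
        if para:
            out.extend(textwrap.wrap(" ".join(para), width=width))
            para.clear()

    for line in text.splitlines():
        if not line.strip():
            flush()
            out.append("")      # keep every blank line as-is
        elif line.startswith(" "):
            flush()
            out.append(line)    # leading space: do not rewrap
        else:
            para.append(line.strip())
    flush()
    return "\n".join(out)


text = "\n".join([
    "this is a short",
    "paragraph that",
    "should be joined.",
    "",
    "     Chapter 1: Creators",
    '     Chapter 2: "Mere Copyists"',
])
print(rewrap(text))
```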