
Bowerbird wrote:
It's been one month since my post about "detecting headers", in response to jon noring's "challenge" in that specific regard.
in case you've forgotten, here's a quick recap:
big and bold. that's what headers look like. conspicuous. real hard to miss. easy to find.
as i said last month, i have developed a 30-item checklist. that's how many ways a header can make itself conspicuous. but the main way -- by far -- is simply to be big and/or bold.
so it's time now for part 2.
but first, any questions? don't be shy, step right up, because headers are the first step toward detecting all types of things. (which is why we need to discuss them in some more detail.)
There's enough variation in how headers can be formatted in print, along with other structures that look like headers but are not, that it is not possible to auto-determine with 100% reliability that something is a header. Differences across languages, countries, and time periods confuse matters further. And even when something is correctly identified as a header, it can be difficult to autodetect its level, which is usually important.

It is simply not yet possible to reliably auto-determine the structure of books and documents. This is the big problem with PDF-to-whatever converters, since (unstructured) PDF does not preserve structural information -- it simply lays out the content according to visual typesetting conventions (which, of course, vary by country, language, time era, and the whims of the author/publisher).

Now, if the goal is to try to auto-determine a document's structure, knowing that it won't always get it right, as part of a human proofing process (e.g., Distributed Proofreaders), then that is another matter. But it is hard to tell from Bowerbird's comments whether he intends his methodology and tools to be part of a human proofing process or to replace it entirely. I think he will find more acceptance of his methodology and tools by making clear that it is the former.

Jon Noring
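
To make the "suggest, then proof" idea concrete, here is a minimal Python sketch of the "big and/or bold" heuristic. It is not Bowerbird's 30-item checklist, which is not reproduced in this thread. The `Line` record, the `header_candidates` function, and the `size_ratio` and `max_words` thresholds are all hypothetical, and the sketch assumes font size and weight have already been extracted for each line. It only flags candidates for a human proofer and makes no attempt to assign header levels.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Line:
    """One line of text with the typographic facts the heuristic needs (assumed inputs)."""
    text: str
    font_size: float
    bold: bool


def header_candidates(lines, size_ratio=1.2, max_words=12):
    """Return (index, text, reasons) for lines that look like headers.

    A line is a candidate if it is short and either noticeably larger than
    the typical (median) font size or set in bold. The thresholds are
    illustrative guesses, not tuned values.
    """
    if not lines:
        return []
    body_size = median(line.font_size for line in lines)
    candidates = []
    for index, line in enumerate(lines):
        words = line.text.split()
        # Skip blank lines and long lines, which are unlikely to be headers.
        if not words or len(words) > max_words:
            continue
        big = line.font_size >= size_ratio * body_size
        if big or line.bold:
            reasons = [name for name, hit in (("big", big), ("bold", line.bold)) if hit]
            candidates.append((index, line.text, reasons))
    return candidates


sample = [
    Line("Chapter One", 18.0, True),
    Line("It was a dark night", 10.0, False),
    Line("and the rain fell", 10.0, False),
    Line("A Bold Aside", 10.0, True),
    Line("more body text here", 10.0, False),
]

for index, text, reasons in header_candidates(sample):
    print(index, text, reasons)
```

Note that "A Bold Aside" is flagged even though it may not be a header at all, which is exactly the kind of false positive a proofer would need to catch.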