
At 01:02 AM 10/21/2004 -0700, you wrote:
As you say, the information is in the source file, but currently inaccessible to you. One way to solve this problem is to switch to a relatively standard master document format, such as TEI, combined with flexible tools that could convert the source to other editions such as HTML or plain text, while allowing us to choose how much of the preserved information to include and how that information is encoded. You could then easily generate for yourself a 'with page numbers' text edition of the document you're interested in.
So, does this mean that I now have to download not only the master XML file, but also the CSS and a set of conversion tools? You must be kidding, right? If it came to that, I would rather have the plain text and forget the page numbers. It is already inconvenient to use "lynx -dump -nolist filename.htm". Why in the world would I want to run the file through a conversion tool and still have to do that anyway? OK, so a plain text file can be output directly from the XML. I still have to go through at least one extra conversion step that I wouldn't have to otherwise. I had a look at SGML just to see how hard it would be to get plain text. What a royal pain! I gave up when the tools kept complaining about a missing file while I was using their samples.
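
For reference, the "extra conversion step" under discussion might look roughly like the sketch below. It is only an illustration, not anyone's actual tool: it assumes a simplified, namespace-free TEI-like file in which page breaks are marked with <pb n="..."/> inside <p> elements, and the names SAMPLE, tei_to_text and keep_page_numbers are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified TEI-like master file (no namespace, no header).
SAMPLE = """<TEI><text><body>
<p>First paragraph ends here.<pb n="2"/>Text after the page break.</p>
<p>Second paragraph.</p>
</body></text></TEI>"""


def tei_to_text(xml_source, keep_page_numbers=True):
    """Flatten <p> elements to plain text, optionally keeping page markers.

    Assumption: <pb/> elements are direct children of <p>. A <pb/> nested
    inside other inline markup is silently dropped by this sketch.
    """
    root = ET.fromstring(xml_source)
    paragraphs = []
    for p in root.iter("p"):
        parts = [p.text or ""]
        for child in p:
            if child.tag == "pb":
                # Either render the page break as a marker or drop it.
                if keep_page_numbers:
                    parts.append(" [p. %s] " % child.get("n", "?"))
                else:
                    parts.append(" ")
            else:
                # Any other inline element contributes only its text.
                parts.append("".join(child.itertext()))
            parts.append(child.tail or "")
        # Collapse runs of whitespace left over from the markup.
        paragraphs.append(" ".join("".join(parts).split()))
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    print(tei_to_text(SAMPLE, keep_page_numbers=True))
```

Calling tei_to_text(SAMPLE, keep_page_numbers=False) gives the plain edition with the page markers left out, so one master file can serve both kinds of reader. It is still one more step than downloading a ready-made text file.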