
Hi James,

Still do not know who you are citing! Though, you are basically describing what any parser should do (common knowledge). You also have a few conceptual errors in your description.

What you fail to mention is that practically all systems that take input and create some kind of output parse! Even a simple find is parsing. True enough, I would not necessarily call that a parser.

You just describe a syntactic parser. Parsers can also be designed to give information on semantic and pragmatic usage (take a look at Xcode). Of course, the amount of useful information will depend on the design of the parser.

Finally, and not least, any error message is only meaningful to those who know what they are doing and understand the system they are using. In other words, a JADU will not make heads or tails of an error message, no matter how much information you offer (see many MS error messages).

Please do not mention troff and the like; it gives me shivers and, besides, makes me feel old. I do not know (and never knew) anybody who liked troff.

Having said all that, what is the actual point you are trying to make, and in what context? It is completely lost on me. Sorry.

[BB, please no sly remarks ;-)) ]

regards
Keith.

On 08.12.2011 at 22:10, James Adcock wrote:
Italics are bounded by <i>...</i> or <em>...</em>
That which transforms something which looks like plain text with markup into something more complicated, which a machine can further process in interesting ways, is typically called a parser. Parsers have known problems that depend on the design of the markup language they are fed. Two of the most central problems (plus a third, hidden problem to be mentioned later) are error detection and error recovery. Error detection means that, if the writer makes a mistake in the input, it is generally recognized as better to detect that error and report it than to continue silently. Error recovery means that, even when the input has an error, giving up immediately, rather than trying to parse the rest of the input file and report all the other errors, is probably also bad design. If the user makes 12 errors in their input file, you would like to report all 12 errors, or at least most of them, so the user doesn't have to make 12 separate submissions to the parser to detect all the input errors.
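As a rough illustration of error detection plus error recovery, here is a minimal Python sketch. It only handles <i> and </i>, and the function name check_italics and the recovery rule (after reporting an error, treat the most recent tag as authoritative and keep scanning) are assumptions made for this example, not taken from any particular parser.

```python
import re

TAG = re.compile(r"</?i>")

def check_italics(text):
    """Return (line_number, message) for every italic-tag error found."""
    errors = []
    open_line = None  # line on which the currently open <i> appeared, if any
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in TAG.finditer(line):
            if match.group() == "<i>":
                if open_line is not None:
                    # Detection: an open-symbol followed by an open-symbol.
                    errors.append((lineno, f"<i> while the <i> from line {open_line} is still open"))
                # Recovery: carry on as if this <i> is the one that is open.
                open_line = lineno
            else:
                if open_line is None:
                    errors.append((lineno, "</i> with no opening <i>"))
                # Recovery: either way, nothing is open from here on.
                open_line = None
    if open_line is not None:
        errors.append((open_line, "<i> never closed before end of input"))
    return errors

sample = "a <i>b <i>c</i>\n</i> d\n<i>e"
for lineno, message in check_italics(sample):
    print(f"line {lineno}: {message}")
```

The point of the sketch is only that one pass reports every problem it can find, so the writer does not need one submission per mistake.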
One simple input language design aspect, which is well-known, is to design your open-symbols to be distinct from your close-symbols. Then an open-symbol followed by an open-symbol is detectable as an error -- there should have been a close-symbol in there:
<i> .... <i>
Error: two italic <i> symbols with no closing </i> symbol.
Compare to say using underbar to mean both the open and close italic symbol:
_ ... _
Is this an error, or is it correct?
One cannot diagnose the situation: it could be correct, or it could be an error.
The desire to be able to detect input errors is thus one reason why SGML-like markup languages have gained in preference over K&R troff-like markup languages during the last 40 years.
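To make the comparison concrete, here is a small Python sketch of both styles. The helper names tag_spans and underscore_spans are hypothetical, and the underscore rule (first underscore opens, second closes, and so on) is an assumed convention chosen for illustration. The same mistake, opening italics twice without closing, is made in both inputs.

```python
import re

def tag_spans(text):
    """Italic spans for <i>...</i> markup; raises ValueError on a detectable error."""
    spans, start = [], None
    for m in re.finditer(r"</?i>", text):
        if m.group() == "<i>":
            if start is not None:
                raise ValueError("two <i> symbols with no closing </i> between them")
            start = m.end()
        else:
            if start is None:
                raise ValueError("</i> with no opening <i>")
            spans.append(text[start:m.start()])
            start = None
    if start is not None:
        raise ValueError("<i> never closed")
    return spans

def underscore_spans(text):
    """Italic spans when '_' both opens and closes; only an odd count is detectable."""
    parts = text.split("_")
    if len(parts) % 2 == 0:
        raise ValueError("odd number of underscores")
    return parts[1::2]  # the pieces that sit between an opening and a closing '_'

try:
    tag_spans("<i>one two <i>three")
except ValueError as err:
    print("tags:", err)

print("underscores:", underscore_spans("_one two _three"))
```

With distinct symbols the second open is rejected; with underscores the second "_" is silently taken as a close, so "one two " comes out italic and "three" does not, with no error reported.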
What's the third problem? The input language OUGHT to be designed such that something which is an error LOOKS like an error. When the parser points out an error to the user the user ought to say "Oh yes that is an error, I recognize that I made an error."
Consider instead an input language which relies on a lot of hidden, unspecified meanings and implied design rules. Now one of two things happens on "input error":
1) What most likely happens is that the output is silently generated but looks ugly and fails to "match" the input submitted for reasons that remain unexplained and undiagnosed. The user is left confused, frustrated, and bewildered, and has to try making changes to the input "at random" to try to fix the problem -- or to give up and accept the erroneous output as a given.
2) Better would be if the parser issues errors even though superficially to the user the input "looks good." Then the user goes back, looks at their input and says: "What do you mean that's an error -- that input looks absolutely beautiful to me!" The end result is still the same: the user is left angry, frustrated, and confused, with no good prospects for making forward progress.
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/mailman/listinfo/gutvol-d