
How to change or add a new custom parser error message

rybern edited this page Apr 14, 2020 · 4 revisions

Background

Stanc3 uses Menhir's Incremental API to provide custom error messages when the parser detects an error. Menhir and the Incremental API are documented in the reference manual. We write the custom error messages in the src/frontend/parser.messages file, which follows Menhir's messages format (documented here). Those messages are automatically integrated into the parser by the build rules defined in src/frontend/dune.

Each rule in parser.messages indicates the error state it corresponds to by specifying a stack of tokens that results in that error state. Many different stacks of tokens can lead to the same error state, and Menhir will point out if two rules are defined for the same error.
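To make the relationship between rules and error states concrete, here is a minimal Python sketch (not part of stanc3 or Menhir) that groups the sentences of a messages file by the state number in their auto-generated "## Ends in an error in state: N." comment and reports states named by more than one rule. It assumes those comments are present and up to date. Menhir itself recomputes the state from each sentence, so treat this only as a quick approximation of its collision check. The function names and the sample text are hypothetical.

```python
import re
from collections import defaultdict

# Matches the comment Menhir generates for each sentence in a .messages file.
STATE_RE = re.compile(r"^## Ends in an error in state: (\d+)\.")


def sentences_by_state(text):
    """Group sentences by the state number named in their generated comment."""
    by_state = defaultdict(list)
    last_plain = None  # most recent non-blank, non-comment line
    for line in text.splitlines():
        match = STATE_RE.match(line)
        if match:
            # The sentence is the plain line directly above the comment block.
            by_state[int(match.group(1))].append(last_plain)
        elif line.strip() and not line.startswith("#"):
            last_plain = line.strip()
    return dict(by_state)


def duplicate_states(text):
    """Return only the states that more than one sentence claims."""
    return {s: v for s, v in sentences_by_state(text).items() if len(v) > 1}


# Hypothetical excerpt: two rules whose comments both name state 685.
SAMPLE = "\n".join([
    "program: MODELBLOCK WHILE",
    "##",
    "## Ends in an error in state: 685.",
    "##",
    "",
    'Expected "{" after "model".',
    "",
    "program: MODELBLOCK REAL",
    "##",
    "## Ends in an error in state: 685.",
    "##",
    "",
    "My new message.",
    "",
])

print(duplicate_states(SAMPLE))
```

A newly added rule has no generated comment yet, so this sketch only sees collisions after the comments have been regenerated.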

Modifying - Typical process

  1. Find the existing error message for the parser error in question by either:
    • Searching the message file for the current error message text,
    • Compiling a Stan program that triggers the error message with the --debug-parse flag, then searching for the error state number, or
    • Adding a new rule to parser.messages corresponding to any stack of tokens which could trigger the error you're interested in, then letting Menhir tell you the line number of the existing rule the new rule collides with
  2. Update the message of the existing rule

Example

Suppose I want to change the error message for when the program is missing a '{' after the 'model' keyword.

I write an example Stan program that has the error:

parameters {
  real x;
}
model
  x ~ normal(0, 1);
}

Now I compile this program with stanc3 using the --debug-parse option. The result:

...
Expected "{" after "model".
(Parse error state 685)

The parse error state I'm looking for is 685. I search the parser.messages file for "685" (perhaps with the command grep -n 685 src/frontend/parser.messages), and I find:

program: MODELBLOCK WHILE
##
## Ends in an error in state: 685.
##
## model_block -> MODELBLOCK . LBRACE list(vardecl_or_statement) RBRACE [ GENERATEDQUANTITIESBLOCK EOF ]
##
## The known suffix of the stack is as follows:
## MODELBLOCK
##

Expected "{" after "model".

I can now replace this message with my new version. If I had located the rule by adding a temporary `MODELBLOCK REAL` rule (the third option above), I would also remove that now-redundant rule.
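The lookup by state number can also be scripted. The following is a minimal Python sketch, assuming the entry layout shown above (a sentence line, a block of "##" comments, a blank line, then the message). Unlike a plain grep for "685", it matches the whole comment line, so it will not also hit other numbers that merely contain those digits. The function name find_entry and the sample text are hypothetical, not part of stanc3.

```python
def find_entry(text, state):
    """Return (line_number, sentence, message) for the entry naming `state`.

    Returns None when no generated comment mentions that state.
    """
    lines = text.splitlines()
    marker = f"## Ends in an error in state: {state}."
    for i, line in enumerate(lines):
        if line.strip() != marker:
            continue
        # Walk up through the comment block to the sentence line.
        j = i
        while j >= 0 and lines[j].startswith("#"):
            j -= 1
        if j < 0 or not lines[j].strip():
            return None  # malformed entry: no sentence above the comments
        # Walk down past comments and blank lines to the message text.
        k = i
        while k < len(lines) and (lines[k].startswith("#") or not lines[k].strip()):
            k += 1
        message = []
        while k < len(lines) and lines[k].strip():
            message.append(lines[k])
            k += 1
        return j + 1, lines[j].strip(), "\n".join(message)
    return None


# The entry from the example above, reproduced as sample input.
SAMPLE = "\n".join([
    "program: MODELBLOCK WHILE",
    "##",
    "## Ends in an error in state: 685.",
    "##",
    "## The known suffix of the stack is as follows:",
    "## MODELBLOCK",
    "##",
    "",
    'Expected "{" after "model".',
    "",
])

print(find_entry(SAMPLE, 685))
```

To run it against the real file, read src/frontend/parser.messages into a string and pass it as `text`; the returned line number is 1-based, like the output of grep -n.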

Adding new parser states

Suppose I actually wanted to add a new message that would only show when a more specific error is made; maybe I want a special message for when `model` is followed by a declaration. To do this I need to change the parse grammar in src/frontend/parser.mly to split off a new error state for the situation I want to catch. However, I need to make sure not to change the behavior of the parser, so I should guarantee that the new rule can never be built successfully by terminating it with a token that can never occur. For example, I could change `model = MODELBLOCK LBRACE …` to `model = MODELBLOCK LBRACE … | MODELBLOCK REAL UNREACHABLE`. This is enough to create a new parser state with a different error message.

After creating a new parser state, you should run dune build @update_messages in the src/frontend directory to check the new entry and generate the state comment (which includes the parser error state number).

Checking for completeness of parser.messages

Check completeness by running the following commands from the src/frontend folder:

menhir --list-errors parser.mly > parser_new.messages ;
menhir parser.mly --compare-errors parser_new.messages --compare-errors parser.messages 

This first generates a fresh file, parser_new.messages, which includes dummy error messages for all error states. Next, it checks that parser.messages has at least as much coverage as parser_new.messages (hence 100% coverage). If certain error states are missed, it will complain about that. For example, if states 68 and 336 had been missed in parser.messages, it would say

Read 461 sample input sentences and 461 error messages.
Read 459 sample input sentences and 459 error messages.
File "parser_new.messages", line 6648, characters 0-58:
Error: this sentence leads to an error in state 68.
No sentence that leads to this state exists in "parser.messages".
File "parser_new.messages", line 13297, characters 0-63:
Error: this sentence leads to an error in state 336.
No sentence that leads to this state exists in "parser.messages".

The thing to do then is to copy the entries for states 68 and 336 from parser_new.messages to parser.messages and to insert informative error messages. In this case, the entries are

program: TRANSFORMEDDATABLOCK LBRACE TRUNCATE TIMESASSIGN WHILE
##
## Ends in an error in state: 336.
##
## atomic_statement -> lhs TIMESASSIGN . lhs SEMICOLON [ WHILE VOID VECTOR UNITVECTOR TRUNCATE TARGET SIMPLEX SEMICOLON ROWVECTOR RETURN REJECT REALNUMERAL REAL RBRACE PRINT POSITIVEORDERED PLUS ORDERED MINUS MATRIX LPAREN LBRACK LBRACE INTNUMERAL INT INCREMENTLOGPROB IF IDENTIFIER GETLP FOR ELSE COVMATRIX CORRMATRIX CONTINUE CHOLESKYFACTORCOV CHOLESKYFACTORCORR BREAK BANG ]
## atomic_statement -> lhs TIMESASSIGN . non_lhs SEMICOLON [ WHILE VOID VECTOR UNITVECTOR TRUNCATE TARGET SIMPLEX SEMICOLON ROWVECTOR RETURN REJECT REALNUMERAL REAL RBRACE PRINT POSITIVEORDERED PLUS ORDERED MINUS MATRIX LPAREN LBRACK LBRACE INTNUMERAL INT INCREMENTLOGPROB IF IDENTIFIER GETLP FOR ELSE COVMATRIX CORRMATRIX CONTINUE CHOLESKYFACTORCOV CHOLESKYFACTORCORR BREAK BANG ]
##
## The known suffix of the stack is as follows:
## lhs TIMESASSIGN
##

<YOUR SYNTAX ERROR MESSAGE HERE>

and

program: TRANSFORMEDDATABLOCK LBRACE REALNUMERAL HAT WHILE
##
## Ends in an error in state: 68.
##
## non_lhs -> non_lhs HAT . lhs [ TRANSPOSE TIMES TILDE SEMICOLON RPAREN RBRACK RBRACE RABRACK QMARK PLUS OR NEQUALS MODULO MINUS LEQ LDIVIDE LBRACK LABRACK HAT GEQ EQUALS ELTTIMES ELTDIVIDE DIVIDE COMMA COLON BAR AND ]
## non_lhs -> non_lhs HAT . non_lhs [ TRANSPOSE TIMES TILDE SEMICOLON RPAREN RBRACK RBRACE RABRACK QMARK PLUS OR NEQUALS MODULO MINUS LEQ LDIVIDE LBRACK LABRACK HAT GEQ EQUALS ELTTIMES ELTDIVIDE DIVIDE COMMA COLON BAR AND ]
##
## The known suffix of the stack is as follows:
## non_lhs HAT
##

<YOUR SYNTAX ERROR MESSAGE HERE>

Finally, parser_new.messages can be deleted.
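Before deleting parser_new.messages, the list of states to copy over can be approximated with a short script. This is a minimal Python sketch, not a replacement for --compare-errors: it only compares the state numbers found in the auto-generated "## Ends in an error in state: N." comments of the two files, and so assumes the comments in parser.messages are current. The function names and the sample strings are hypothetical.

```python
import re

# Matches the generated state comment on any line of a .messages file.
STATE_RE = re.compile(r"^## Ends in an error in state: (\d+)\.", re.MULTILINE)


def states_in(text):
    """Collect every state number named in a generated comment."""
    return {int(n) for n in STATE_RE.findall(text)}


def missing_states(new_text, current_text):
    """States covered by the fresh file but absent from the current one."""
    return sorted(states_in(new_text) - states_in(current_text))


# Hypothetical stand-ins for parser_new.messages and parser.messages.
NEW = "\n".join([
    "program: TRANSFORMEDDATABLOCK LBRACE REALNUMERAL HAT WHILE",
    "## Ends in an error in state: 68.",
    "",
    "<YOUR SYNTAX ERROR MESSAGE HERE>",
    "",
    "program: TRANSFORMEDDATABLOCK LBRACE TRUNCATE TIMESASSIGN WHILE",
    "## Ends in an error in state: 336.",
    "",
    "<YOUR SYNTAX ERROR MESSAGE HERE>",
    "",
    "program: MODELBLOCK WHILE",
    "## Ends in an error in state: 685.",
    "",
    "<YOUR SYNTAX ERROR MESSAGE HERE>",
    "",
])
CURRENT = "\n".join([
    "program: MODELBLOCK WHILE",
    "## Ends in an error in state: 685.",
    "",
    'Expected "{" after "model".',
    "",
])

print(missing_states(NEW, CURRENT))  # prints [68, 336]
```

With the real files, the two strings would come from reading parser_new.messages and parser.messages in the src/frontend folder.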

Updating messages when the grammar has changed significantly

The --update-errors command is useful if the grammar has changed dramatically. You use it by running

menhir parser.mly --update-errors parser.messages > parser_updated.messages

This will try to repurpose the error messages of parser.messages for the updated grammar which may have totally different state numbering. It will put the resulting messages file (which will presumably not be complete) in parser_updated.messages. To make it complete, you can again follow the steps above.