Parser Debugging and Diagnostics
- Diagnostic switches
- Use setName(), setDebug(), and traceParseAction to monitor parsing behavior
- Use ParseException.explain() to get more details
- Use runTests() to run multiple tests and see where parsers fail to parse
- Use the - operator instead of + in selected places in your parser to improve parse error locations
Pyparsing contains the following Diagnostics switches:
- Diagnostics.warn_multiple_tokens_in_named_alternation
- Diagnostics.warn_ungrouped_named_tokens_in_collection
- Diagnostics.warn_name_set_on_empty_Forward
- Diagnostics.warn_on_multiple_string_args_to_oneof
- Diagnostics.enable_debug_on_named_expressions
All are disabled by default, but you can selectively enable them to get some warnings if your parser uses techniques that may not give you desired results.
To enable a switch, add code similar to the following to your parser code:
import pyparsing as pp
pp.enable_diag(pp.Diagnostics.warn_ungrouped_named_tokens_in_collection)
You can also enable all warnings by:
- calling pp.enable_all_warnings()
- running Python with the -Wd or -Wd:::pyparsing switch
- running Python with the PYPARSINGENABLEALLWARNINGS environment variable set to a non-empty value
You can suppress all warnings by:
- running Python with the -Wi:::pyparsing switch
Diagnostics.warn_multiple_tokens_in_named_alternation enables warnings when a results name is defined on a MatchFirst or Or expression with one or more And subexpressions (only warns if __compat__.collect_all_And_tokens is False; this flag is set to True in pyparsing 3.0.0, and the compatibility behavior is no longer supported).
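As a rough illustration of the kind of expression this switch is about, the sketch below sets a results name on a MatchFirst whose first alternative is an And. The names pair, single and item are made up for this example. Whether a warning is actually emitted may depend on the pyparsing version and on the __compat__ setting described above, so the sketch only collects and prints whatever warnings appear.

```python
import warnings

import pyparsing as pp

pp.enable_diag(pp.Diagnostics.warn_multiple_tokens_in_named_alternation)

# hypothetical expressions: one alternative is an And (two tokens),
# the other matches a single token
pair = pp.Word(pp.alphas) + pp.Word(pp.nums)
single = pp.Word(pp.nums)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # results name set on a MatchFirst that contains an And subexpression
    item = (pair | single)("item")

# keep only messages that mention this switch (the list may be empty)
alternation_warnings = [
    str(w.message)
    for w in caught
    if "warn_multiple_tokens_in_named_alternation" in str(w.message)
]
for message in alternation_warnings:
    print(message)

# the expression still parses either alternative
two_tokens = item.parse_string("abc 123").as_list()
one_token = item.parse_string("42").as_list()
print(two_tokens, one_token)

pp.disable_diag(pp.Diagnostics.warn_multiple_tokens_in_named_alternation)
```

If the two-token alternative is meant to be reported as a unit, wrapping pair in pp.Group is one way to make that shape explicit.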
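Diagnostics.warn_ungrouped_named_tokens_in_collection enables warnings when a results name is defined on a containing expression whose ungrouped subexpressions also have results names. Here is an example of ungrouped named tokens in a collection: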
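import pyparsing as pp
from pyparsing import pyparsing_common as ppc

pp.enable_diag(pp.Diagnostics.warn_ungrouped_named_tokens_in_collection)

term = ppc.identifier | ppc.number
# this expression has a results name, and the expressions it
# contains also have results names
eqn = (term("lhs") + '=' + term("rhs"))("eqn")
eqn.runTests("""\
a = 1000
""")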
The resulting output is:
diag_examples.py:11: UserWarning: warn_ungrouped_named_tokens_in_collection: setting results name 'eqn' on And expression collides with 'rhs' on contained expression
eqn = (term("lhs") + '=' + term("rhs"))("eqn")
a = 1000
['a', '=', 1000]
- eqn: ['a', '=', 1000]
- lhs: 'a'
- rhs: 1000
Note that all the results names are at the same level, with no hierarchy. If other expressions in this parser also defined 'lhs' or 'rhs' names in a similarly ungrouped way, the names would clash, and by default only the last value for each name would be reported.
The resolution for this warning is to wrap eqn in a Group:
eqn = pp.Group(term("lhs") + '=' + term("rhs"))("eqn")
Which gives this output:
a = 1000
[['a', '=', 1000]]
- eqn: ['a', '=', 1000]
  - lhs: 'a'
  - rhs: 1000
Now 'lhs' and 'rhs' are grouped under 'eqn', and would not be overwritten by other 'lhs' or 'rhs' names in other expressions.
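The difference also shows up when reading the results in code. The following is a minimal sketch, assuming pyparsing 3.x names (parse_string); the variable names ungrouped, grouped, flat and nested are made up for this comparison.

```python
import pyparsing as pp
from pyparsing import pyparsing_common as ppc

term = ppc.identifier | ppc.number

# same expression, without and with Group
ungrouped = (term("lhs") + "=" + term("rhs"))("eqn")
grouped = pp.Group(term("lhs") + "=" + term("rhs"))("eqn")

flat = ungrouped.parse_string("a = 1000")
nested = grouped.parse_string("a = 1000")

# ungrouped: 'lhs' and 'rhs' sit at the top level, next to 'eqn'
print(flat["lhs"], flat["rhs"])

# grouped: 'lhs' and 'rhs' are reached through 'eqn'
print(nested["eqn"]["lhs"], nested["eqn"]["rhs"])

# and they are no longer defined at the top level
print("lhs" in nested)
```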
Diagnostics.warn_name_set_on_empty_Forward enables warnings when a Forward is defined with a results name but has no contents defined. This helps report a Forward that has been defined and given a results name, but never assigned any contents:
expr = Forward()("recursive_expression")
# never used afterward
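A small sketch of how this could be observed, assuming pyparsing 3.x: the warning is captured with the standard warnings module, and a second Forward (the hypothetical nested_parens) is given its contents before the results name is applied, which is one way to avoid the warning.

```python
import warnings

import pyparsing as pp

pp.enable_diag(pp.Diagnostics.warn_name_set_on_empty_Forward)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    # results name applied while the Forward is still empty
    empty = pp.Forward()("recursive_expression")

    # contents assigned first, results name applied afterward
    nested_parens = pp.Forward()
    nested_parens <<= pp.Word(pp.nums) | ("(" + nested_parens + ")")
    named = nested_parens("recursive_expression")

# keep only the messages produced by this switch
forward_warnings = [
    str(w.message)
    for w in caught
    if "warn_name_set_on_empty_Forward" in str(w.message)
]
for message in forward_warnings:
    print(message)

# the Forward that was given contents parses nested input
parsed = named.parse_string("((42))").as_list()
print(parsed)

pp.disable_diag(pp.Diagnostics.warn_name_set_on_empty_Forward)
```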
Diagnostics.warn_on_multiple_string_args_to_oneof enables warnings when one_of is incorrectly called with multiple str arguments. This is a common mistake:
direction = one_of("left", "right")
one_of takes additional optional arguments after the first, so Python will accept this call (the second string is passed as the caseless argument), but it generates the wrong expression. The correct form is:
direction = one_of("left right")
or
direction = one_of(["left", "right"])
After enabling Diagnostics.enable_debug_on_named_expressions, all expressions that are subsequently defined with names using setName() are automatically enabled for parse-time debugging.
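A minimal sketch of this behavior, assuming pyparsing 3.x, where set_name() is the snake_case spelling of setName(). The expression name "integer" is arbitrary. The debug trace is printed to standard output, so the sketch captures it in a buffer to show that naming the expression was enough to turn debugging on.

```python
import contextlib
import io

import pyparsing as pp

pp.enable_diag(pp.Diagnostics.enable_debug_on_named_expressions)

# named after the switch is enabled; no explicit set_debug() call is made
integer = pp.Word(pp.nums).set_name("integer")

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    tokens = integer.parse_string("123").as_list()

# the captured trace mentions the named expression
debug_output = buffer.getvalue()
print(debug_output)

pp.disable_diag(pp.Diagnostics.enable_debug_on_named_expressions)
```

Because the switch is checked when the name is set, it needs to be enabled before the expressions of interest are defined.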
TBD
TBD
TBD
TBD