Skip to content

Latest commit

 

History

History
109 lines (91 loc) · 2.96 KB

arithmetic.md

File metadata and controls

109 lines (91 loc) · 2.96 KB

Simple Arithmetic Parsing Task

This task evaluates the value of simple arithmetic expressions, such as the following:

1
-1
+1
--1
-(1)
-(-1)
-1 + 2
1 + - 2 * 3
-(-1 + --2) * -3
(((1 + 2) * 3) / (4 * (5 - 6)))

It is adapted from pegen's x.gram, which is useful for testing pathological backtracking. I have removed the need to parse names (e.g., a + b) to aid in computing the actual numerical results of the expressions for validation.

(Note: these modifications seem to have made it less pathological regarding backtracking; the task may therefore change to try and recreate the difficulty)

Here is a rough description of the task:

  • there is one expression per line; except for the newline, whitespace is not significant outside of tokens
  • an expression is a sequence of operations involving addition (+), subtraction (-), multiplication (*), and division (/) operators and integer atoms
  • parentheses (( and )) may group sub-expressions
  • integers and parethesized expressions may be modified with a negative or positive sign (- or +)
  • the order of operations puts parenthesized expressions and integers at the highest priority, followed by signs, then by multiplication and division, then by addition and subtraction
  • for sequential operations at the same priority (e.g., + and -), the application is left-associative (e.g., 1 - 2 + 3 equals 2, not -4)

And here is a PEG (excluding ignored whitespace):

Start   <- Expr EOL? EOF
Expr    <- Term '+' Expr
         / Term '-' Expr
         / Term
Term    <- Factor '*' Term
         / Factor '/' Term
         / Factor
Factor  <- Sign Factor
         / '(' Expr ')'
		 / INTEGER
Sign    <- [-+]
INTEGER <- '0' / [1-9] [0-9]*
EOL     <- '\r\n' / [\n\r]
EOF     <- !.

Whitespace is allowed around any grammar terms except for within INTEGER, EOL, and EOF. Allowed whitespace characters are the set: \t\n\f\v\r.

Note that the grammar defines the syntax, not the semantics, as otherwise it would be right-associative. Parsers with left-recursion support could alter the Expr and Term rules like this to be more semantically accurate:

Expr    <- Expr '+' Term
         / Expr '-' Term
         / Term
Term    <- Term '*' Factor
         / Term '/' Factor
         / Factor

Otherwise you could convert them into repetitions:

Expr    <- Term ([-+] Term)*
Term    <- Factor ([*/] Factor)*

To help with these two use cases, the bench.helpers.apply_infix() and bench.helpers.reduce_infix() functions are provided:

>>> from operator import add, sub
>>> from bench.helpers import apply_infix, reduce_infix
>>> apply_infix(apply_infix(1, sub, 2), add, 3)
2
>>> reduce_infix(1, sub, 2, add, 3)
2

Also note that these functions just return the first argument if there are no operations:

>>> apply_infix(3)
3
>>> reduce_infix(3)
3