Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating and moving the grammar to a separate file #2

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

mtdowling
Copy link
Contributor

This commit simplifies the JMESPath grammar and moved the grammar to a
separate downloadable file.

This is WIP PR. I'm finding other places in the grammar that I think could be improved.

This commit simplifies the JMESPath grammar and moved the grammar to a
separate downloadable file.
@mtdowling
Copy link
Contributor Author

Actually, "||" and "|" are still ambiguous. What's the desired precedence here?

I found more ambiguities with pipes and ors: a | b | c was ambiguous, so I've updated the grammar. I also noticed an issue with my grammar where projections were consuming pipes and ors. I've made a change to address this.

I've also adding slicing and grouping to the grammar.

@jamesls
Copy link
Member

jamesls commented Dec 22, 2014

I had an updated grammar somewhere that had part of the precedence baked into the grammar. It wasn't complete but it had the or/pipe's worked out. I'll see if I can find that branch. We should double check that it's consistent with this.

The desired precedence is that a || b | b is parsed as ((a || b) | c). I believe we have a compliance test somewhere that verifies this. I confirmed that this is how the python lib parses this expression.

>>> pprint(jmespath.compile('a || b | c').parsed)
{'children': [{'children': [{'children': [], 'type': 'field', 'value': 'a'},
                            {'children': [], 'type': 'field', 'value': 'b'}],
               'type': 'or_expression'},
              {'children': [], 'type': 'field', 'value': 'c'}],
 'type': 'pipe'}

@mtdowling
Copy link
Contributor Author

Good catch. I've pushed a commit that ensures that or expressions bind more tightly than pipe expressions.

Added an "and-expression".
Added a "not" expression.
not and binary expressions can be at the root level.
Filter expressions now support unary conditions.
Simplified function argument grammar.
Fixed a bug in how literals were described.
@mtdowling
Copy link
Contributor Author

I've now shortened the majority of the rule names (e.g., expression is now expr). This shortening of the rule names seems to be in line with most other grammars I've seen in the wild (i.e., http://www.lua.org/manual/5.2/manual.html#9 and https://docs.python.org/3/reference/grammar.html).

I updated the or and and to have an explicit precedence so that and binds more tightly than or.

I updated subexpr to now split out into object-subexpr and array-subexpr to better distinguish between object type access using a "." and array style access using a "[". This change fixes a bug that is present in the current grammar that allows you to use dot style object access off of a multi-list (i.e., this is no longer allowed because you cannot access an array as an object: [a, b].c). I think splitting these rules out will help us to very granularly specify what and how subexpressions descend into data.

I think this grammar update is pretty close to being a fully functional and non-ambiguous replacement to the current grammar. I'm using this grammar verbatim in the Clojure implementation, so you can use that and play around with how it parses to see how everything works. I'll continue to flesh out the Clojure implementation and try to get it fully compliant with the test suite to ensure that the grammar is backwards compatible.

@mtdowling
Copy link
Contributor Author

I pushed a change that allows for insignificant whitespace built into the grammar (similar to the JSON ABNF). This makes the grammar much more robust and does not require implementation details to be part of parsing (i.e., we no longer need to say "whitespace is insignificant except inside of quotes and literals).

@jamesls
Copy link
Member

jamesls commented Apr 10, 2015

I'm starting to pick up work again on JEP-9 (the improved filters with ands, nots, parens, etc) and I want to incorporate this grammar as part of that work. I'm starting to take a look at this.

wildcard-values = "*"
current-node = "@"

number = *"-" 1*DIGIT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't have the *. It should just be optional instead of zero or more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, should be ["-"] 1*DIGIT

@mtdowling
Copy link
Contributor Author

I'll send an updated PR shortly

current-node = "@"

number = *"-" 1*DIGIT
literal = "`" 1*(unescaped-literal / escaped-literal) "`"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the motivation for this change? I would prefer to have the JSON grammar in here. Otherwise this will allow invalid strings such as:

`[`

which is invalid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought [ was valid (currently) because JMESPath allows you to pass in unquoted strings here. It will try to parse and see that it failed and then parse it as "[".

@mtdowling
Copy link
Contributor Author

Actually, I've got an updated copy of the grammar that includes JSON and fixes a bunch of stuff from the clojure implementation... Updating coming.

Incorporates JSON back into the grammar.
Much better use of insignificant whitespace.
Uses `[optional]` syntax as it is clearer than `1* optional`.
@mtdowling
Copy link
Contributor Author

I've updated the grammar to include the changes from jmespath.clj. This includes adding JSON back into the grammar and better whitespace control.

@mtdowling mtdowling changed the title Simplifying grammar and moving to a separate file Updating and moving the grammar to a separate file Apr 11, 2015

; "&&" binds more tightly than "||"
; "||" binds more tightly than "|".
terminal = pipe / or / and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, why is this called terminal? I'm thinking of terminals as the leaf nodes in the parsed tree, i.e it can't be replaced with anything. But everything on the RHS can be broken down into more nodes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it isn't a good name. I named it this because it owns the entire side. Do you have a suggestion?

@jamesls jamesls mentioned this pull request Apr 12, 2015
@mtdowling
Copy link
Contributor Author

I've updated the grammar to support JEP 12 and added support to the Clojure implementation for raw string literals.

; Strings, identifiers, and characters.
raw-string = "'" *raw-string-char "'"
raw-string-char = (%x20-26 / %x28-5B / %x5D-10FFFF) / raw-string-escape
raw-string-escape = escape ["'"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a heads up, I synced with this latest version that includes raw string literals and abnfstress caught this. The ["'"] means that the closing single quote is optional so strings such as '\' would be valid. This should be raw-string-escape = escape "'"

@springcomp springcomp mentioned this pull request Aug 31, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants