Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce randomized testing for queries, fix the revealed bugs #1496

Merged
merged 4 commits into from Nov 21, 2021

Conversation

maxbrunsfeld
Copy link
Contributor

@maxbrunsfeld maxbrunsfeld commented Nov 21, 2021

Randomized testing is already used in Tree-sitter's test suite for verify that incremental parsing behaves correctly. But previously, Tree-sitter's query engine was only tested via hand-written examples. This PR adds some limited randomized testing of the query engine.

Strategy

The new tests use the following procedure:

  1. Parse one example file.
  2. Pick a random node in the resulting syntax tree.
  3. Generate a random query that matches that syntax node.
  4. Parse a second example file.
  5. Compute the "expected matches" for the random query and this second syntax tree, using a brute-force, backtracking approach.
  6. Generate a list of "actual matches" by running the query itself on this second tree.
  7. Check that the actual matches are the same as the expected matches.

Findings

This test found a number of patterns which gave false "impossible pattern" errors when constructing queries, due to bugs in the query analysis. They also surfaced a number of places where we were unnecessarily splitting match states, which ultimately resulted in incorrectly-ordered results.

Limitations

The test that's checked-in has some restrictions:

  1. It only uses the Rust grammar.
  2. It only draws from one particular example file
  3. It does not generate patterns with optional nodes, repeated nodes, anchors, or alternatives
  4. A new random seed is not selected every time you run the test suite: it always run the same 100 random examples. To explore more seeds, you have to change the test code.

Next Steps

After this, it'll be easy to gradually enhance the randomized tests to exercise more languages, more example files, and more of the query syntax. We should also test iterating through captures, not just matches.

* Fix bugs related to named wildcard patterns vs regular wildcard patterns.
* Fix handling of extra nodes during query analysis. Previously, the
expected child_index was updated incorrectly after an extra node,
leading to false "impossible pattern" errors.
* Refine logic for avoiding unnecessary state-splitting due to fallible steps.
Compute *two* different analysis results related to step fallibility:
  * `root_pattern_guaranteed` which, like before, summarizes whether the
    entire pattern is guaranteed to match once this step is reached.
  * `parent_pattern_guaranteed` - which just indicates whether the
    immediate parent pattern is guaranteed. This is now used when
    deciding whether it's necessary to split a match state.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant