Introduce randomized testing for queries, fix the revealed bugs #1496

maxbrunsfeld · 2021-11-21T19:58:23Z

Randomized testing is already used in Tree-sitter's test suite to verify that incremental parsing behaves correctly. But previously, Tree-sitter's query engine was only tested via hand-written examples. This PR adds some limited randomized testing of the query engine.

Strategy

The new tests use the following procedure:

1. Parse one example file.
2. Pick a random node in the resulting syntax tree.
3. Generate a random query that matches that syntax node.
4. Parse a second example file.
5. Compute the "expected matches" for the random query and this second syntax tree, using a brute-force, backtracking approach.
6. Generate a list of "actual matches" by running the query itself on this second tree.
7. Check that the actual matches are the same as the expected matches.
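
The procedure above can be sketched on a toy model. This is not the test code added by this PR: `Node`, `Pattern`, `Rng`, `random_pattern`, `expected_matches`, and `check_seed` are hypothetical names, the "query" is reduced to a node kind plus an ordered subset of child patterns, and the real query engine is represented by an `engine` closure supplied by the caller.

```rust
// Toy model of the differential test. None of these types are Tree-sitter's.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    kind: &'static str,
    children: Vec<Node>,
}

#[derive(Debug, Clone, PartialEq)]
struct Pattern {
    kind: &'static str,
    children: Vec<Pattern>,
}

fn node(kind: &'static str, children: Vec<Node>) -> Node {
    Node { kind, children }
}

/// Tiny deterministic PRNG (an LCG), so a seed always reproduces the same case.
struct Rng(u64);

impl Rng {
    /// Returns a value in `0..n`; `n` must be greater than zero.
    fn below(&mut self, n: usize) -> usize {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((self.0 >> 33) as usize) % n
    }
}

/// Collects every node of the tree in pre-order.
fn preorder<'a>(node: &'a Node, out: &mut Vec<&'a Node>) {
    out.push(node);
    for child in &node.children {
        preorder(child, out);
    }
}

/// Step 3: builds a pattern that matches `node` by construction, keeping the
/// node's kind and a random, order-preserving subset of its children.
fn random_pattern(node: &Node, rng: &mut Rng, depth: usize) -> Pattern {
    let mut children = Vec::new();
    if depth > 0 {
        for child in &node.children {
            if rng.below(2) == 0 {
                children.push(random_pattern(child, rng, depth - 1));
            }
        }
    }
    Pattern { kind: node.kind, children }
}

/// Brute-force oracle: kinds must agree, and the child patterns must match
/// some in-order subsequence of the node's children.
fn matches(pattern: &Pattern, node: &Node) -> bool {
    pattern.kind == node.kind && matches_children(&pattern.children, &node.children)
}

/// Backtracking: try every position for the first child pattern, then recurse
/// on the remaining patterns and the remaining siblings.
fn matches_children(patterns: &[Pattern], nodes: &[Node]) -> bool {
    match patterns.split_first() {
        None => true,
        Some((first, rest)) => (0..nodes.len())
            .any(|i| matches(first, &nodes[i]) && matches_children(rest, &nodes[i + 1..])),
    }
}

/// Step 5: pre-order indices of every node in `tree` that the pattern matches.
fn expected_matches(pattern: &Pattern, tree: &Node) -> Vec<usize> {
    let mut nodes = Vec::new();
    preorder(tree, &mut nodes);
    let mut result = Vec::new();
    for (i, node) in nodes.iter().enumerate() {
        if matches(pattern, node) {
            result.push(i);
        }
    }
    result
}

/// Steps 2 to 7 for one seed. `engine` stands in for the implementation under test.
fn check_seed(
    seed: u64,
    first: &Node,
    second: &Node,
    engine: impl Fn(&Pattern, &Node) -> Vec<usize>,
) -> Pattern {
    let mut rng = Rng(seed);
    let mut nodes = Vec::new();
    preorder(first, &mut nodes);
    let target = nodes[rng.below(nodes.len())];
    let pattern = random_pattern(target, &mut rng, 2);
    assert!(matches(&pattern, target), "generated pattern must match its source node");
    let expected = expected_matches(&pattern, second);
    let actual = engine(&pattern, second);
    assert_eq!(actual, expected, "seed {}: pattern {:?}", seed, pattern);
    pattern
}
```

The oracle is deliberately naive: it only has to be obviously correct, not fast, so that any disagreement points at the engine under test. Printing the seed and the pattern on failure makes each counterexample reproducible.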

Findings

This test found a number of patterns that produced false "impossible pattern" errors when constructing queries, due to bugs in the query analysis. It also surfaced a number of places where we were unnecessarily splitting match states, which ultimately resulted in incorrectly ordered results.

Limitations

The checked-in test has some restrictions:

* It only uses the Rust grammar.
* It only draws from one particular example file.
* It does not generate patterns with optional nodes, repeated nodes, anchors, or alternatives.
* A new random seed is not selected every time you run the test suite: it always runs the same 100 random examples. To explore more seeds, you have to change the test code.
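
One way the fixed-seed restriction could be relaxed without losing reproducibility is sketched below. This is an assumption about a possible follow-up, not something this PR implements: the `QUERY_TEST_SEED` environment variable and both function names are hypothetical.

```rust
use std::env;

/// Chooses the seeds to run: a single seed if a valid override is given,
/// otherwise the fixed range `0..count` (the deterministic default).
fn seeds_to_run(override_seed: Option<&str>, count: u64) -> Vec<u64> {
    match override_seed.and_then(|s| s.trim().parse::<u64>().ok()) {
        Some(seed) => vec![seed],
        None => (0..count).collect(),
    }
}

/// Reads the override from the hypothetical QUERY_TEST_SEED variable.
fn seeds_from_env() -> Vec<u64> {
    let value = env::var("QUERY_TEST_SEED").ok();
    seeds_to_run(value.as_deref(), 100)
}
```

With this shape, a default run stays deterministic, and a failing seed reported by another run can be replayed by setting the variable to that seed.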

Next Steps

After this, it'll be easy to gradually enhance the randomized tests to exercise more languages, more example files, and more of the query syntax. We should also test iterating through captures, not just matches.

* Fix bugs related to named wildcard patterns vs regular wildcard patterns.
* Fix handling of extra nodes during query analysis. Previously, the expected child_index was updated incorrectly after an extra node, leading to false "impossible pattern" errors.
* Refine logic for avoiding unnecessary state-splitting due to fallible steps. Compute *two* different analysis results related to step fallibility:
  * `root_pattern_guaranteed`, which, like before, summarizes whether the entire pattern is guaranteed to match once this step is reached.
  * `parent_pattern_guaranteed`, which just indicates whether the immediate parent pattern is guaranteed. This is now used when deciding whether it's necessary to split a match state.

maxbrunsfeld added 3 commits November 21, 2021 11:29

Add a randomized test for query matching

f69c486

Rename Query::step_is_definite -> is_pattern_guaranteed_at_step

142f4b6

Improve query execution logging

fea3eca

maxbrunsfeld force-pushed the query-randomized-tests branch from 54f97b2 to 4a7fb41 Compare November 21, 2021 20:01

maxbrunsfeld force-pushed the query-randomized-tests branch from 4a7fb41 to 26dac9b Compare November 21, 2021 20:03

maxbrunsfeld merged commit 862fe9e into master Nov 21, 2021

maxbrunsfeld deleted the query-randomized-tests branch November 21, 2021 20:27

theHamsta mentioned this pull request Nov 22, 2021

build(deps): bump tree-sitter and treesitter-c to 0.20.1 neovim/neovim#16402

Merged
