New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prune parts of the data tree that have discards in them #2185
Conversation
70f902d
to
e208a18
Compare
It looks like this potentially improves our data generation quality a fair bit, as one of the reasons it's making the tests fail is that it's finding a bunch of examples that they previously didn't! |
155ee59
to
7861292
Compare
(The failing build is for unrelated reasons that I will fix tomorrow in a separate PR. It shouldn't block reviewing this) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
hypothesis-python/src/hypothesis/internal/conjecture/datatree.py
Outdated
Show resolved
Hide resolved
I vaguely recall that there is logic in the shrunken to handle cases where removing discards doesn’t actually work. Is that something we need to worry about here? Or is it fine to proactively prune any part of the death tree beyond a discard point? |
It's something I thought about and should have documented my thoughts on, but basically I think it's fine. The failure mode is that if you filter (or use In particular note that we do not delete the discarded subtree. During shrinking we can still generate new examples in that subtree (I tried making it so we declared all examples that passed a known discard invalid, and that didn't work out so well). All the exhaustedness checks are used for is generating novel prefixes and determining whether we've done enough generation. |
…r when we get exhaustion wrong
586cddb
to
12f16f0
Compare
Key Idea: If we've discarded an example then everything below that point in the tree necessarily contains discards and thus is redundant and so we might as well prune it out of the tree, encouraging us to pursue parts of the tree that just skip out on the discarded bits altogether in future.
I've been thinking about how to do this for a while and all of the attempts I've thought about have been awful, and then ~I realised this morning that it's fairly easy to do by using the first completed discarded example as the point after which we know that going deeper into the tree is redundant.
The big big win here is on
integer_range
which no longer creates huge redundant sequences on small ranges that aren't a multiple of two. e.g.@given(integers(0, 2))
used to produce a lot of examples (until it hit some heuristics I hacked in to block it off in #2030) because our deduplication logic would push it deeper and deepr into the tree. Now because we mark those parts of the tree as exhausted we will only try up to twice as many valuesas the size of the range (we get arbitrarily close to twice when the range size is2 ** n + 1
).I wanted to make use of this in
weighted_coin
but currently we don't use discards there and it's quite hard to change it so that it does, so I punted that for future work.