Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prune parts of the data tree that have discards in them #2185

Merged
merged 8 commits into from Nov 8, 2019

Conversation

DRMacIver
Copy link
Member

@DRMacIver DRMacIver commented Nov 7, 2019

Key Idea: If we've discarded an example then everything below that point in the tree necessarily contains discards and thus is redundant and so we might as well prune it out of the tree, encouraging us to pursue parts of the tree that just skip out on the discarded bits altogether in future.

I've been thinking about how to do this for a while and all of the attempts I've thought about have been awful, and then ~I realised this morning that it's fairly easy to do by using the first completed discarded example as the point after which we know that going deeper into the tree is redundant.

The big big win here is on integer_range which no longer creates huge redundant sequences on small ranges that aren't a multiple of two. e.g. @given(integers(0, 2)) used to produce a lot of examples (until it hit some heuristics I hacked in to block it off in #2030) because our deduplication logic would push it deeper and deepr into the tree. Now because we mark those parts of the tree as exhausted we will only try up to twice as many valuesas the size of the range (we get arbitrarily close to twice when the range size is 2 ** n + 1).

I wanted to make use of this in weighted_coin but currently we don't use discards there and it's quite hard to change it so that it does, so I punted that for future work.

@Zac-HD Zac-HD added the enhancement it's not broken, but we want it to be better label Nov 7, 2019
@DRMacIver DRMacIver changed the title Prune parts of the data tree that have discards in them WIP: Prune parts of the data tree that have discards in them Nov 7, 2019
@DRMacIver
Copy link
Member Author

It looks like this potentially improves our data generation quality a fair bit, as one of the reasons it's making the tests fail is that it's finding a bunch of examples that they previously didn't!

@DRMacIver DRMacIver changed the title WIP: Prune parts of the data tree that have discards in them Prune parts of the data tree that have discards in them Nov 7, 2019
@DRMacIver
Copy link
Member Author

(The failing build is for unrelated reasons that I will fix tomorrow in a separate PR. It shouldn't block reviewing this)

Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@Zalathar
Copy link
Contributor

Zalathar commented Nov 8, 2019

I vaguely recall that there is logic in the shrunken to handle cases where removing discards doesn’t actually work.

Is that something we need to worry about here? Or is it fine to proactively prune any part of the death tree beyond a discard point?

@DRMacIver
Copy link
Member Author

I vaguely recall that there is logic in the shrunken to handle cases where removing discards doesn’t actually work.
Is that something we need to worry about here? Or is it fine to proactively prune any part of the death tree beyond a discard point?

It's something I thought about and should have documented my thoughts on, but basically I think it's fine.

The failure mode is that if you filter (or use assume in a map or something) using a function with a side effect then this pruning is technically invalid. However this only causes a problem if there is no way to generate a good example without triggering a discard.

In particular note that we do not delete the discarded subtree. During shrinking we can still generate new examples in that subtree (I tried making it so we declared all examples that passed a known discard invalid, and that didn't work out so well). All the exhaustedness checks are used for is generating novel prefixes and determining whether we've done enough generation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement it's not broken, but we want it to be better
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants