Prune parts of the data tree that have discards in them #2185

DRMacIver · 2019-11-07T11:21:50Z

Key idea: if we've discarded an example, then everything below that point in the tree necessarily contains discards and is therefore redundant. We might as well prune it out of the tree, which encourages us to pursue parts of the tree that skip the discarded bits altogether in future.

I've been thinking about how to do this for a while, and all of the attempts I'd come up with were awful. Then I realised this morning that it's fairly easy to do: use the first completed discarded example as the point after which we know that going deeper into the tree is redundant.
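As a rough illustration of the idea (not the actual datatree.py code), here is a toy choice tree. The names `ToyTree`, `record`, and `is_novel` are hypothetical and exist only for this sketch. It marks the node where the first discard completed as exhausted, so any prefix that passes through that node is no longer treated as novel. Unlike a real implementation, it does not propagate exhaustion upwards to parent nodes.

```python
class Node:
    def __init__(self):
        self.children = {}      # choice value -> Node
        self.exhausted = False  # True once nothing new can be found below


class ToyTree:
    """Hypothetical toy model of a tree of recorded choice sequences."""

    def __init__(self):
        self.root = Node()

    def record(self, choices, discard_at=None):
        """Record a completed run.

        discard_at is the index of the choice at which the first
        discarded example finished, or None if nothing was discarded.
        """
        node = self.root
        for i, choice in enumerate(choices):
            node = node.children.setdefault(choice, Node())
            if discard_at is not None and i == discard_at:
                # Everything below here necessarily contains the discard,
                # so treat the whole subtree as already explored.
                node.exhausted = True
                return
        node.exhausted = True

    def is_novel(self, prefix):
        """Return True if this prefix could still lead somewhere new."""
        node = self.root
        for choice in prefix:
            if node.exhausted:
                return False
            if choice not in node.children:
                return True
            node = node.children[choice]
        return not node.exhausted


tree = ToyTree()
tree.record([3, 1], discard_at=0)   # the draw of 3 was discarded
tree.record([2, 5])                 # an ordinary completed run
print(tree.is_novel([3, 0]))        # passes through the pruned node
print(tree.is_novel([2, 4]))        # unexplored sibling of a seen run
```

In this sketch the pruned subtree is only marked, not deleted, so nothing stops a caller from walking into it deliberately.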

The big win here is on integer_range, which no longer creates huge redundant sequences on small ranges whose size isn't a power of two. E.g. @given(integers(0, 2)) used to produce a lot of examples (until it hit some heuristics I hacked in to block it off in #2030) because our deduplication logic would push it deeper and deeper into the tree. Now, because we mark those parts of the tree as exhausted, we will only try up to twice as many values as the size of the range (we get arbitrarily close to twice when the range size is 2 ** n + 1).
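The "up to twice" bound can be checked with a little arithmetic. The sketch below assumes a simplified model in which a bounded integer is drawn by rejection sampling on `(upper - lower).bit_length()` bits, with out-of-range draws discarded. That model and the function names are assumptions made for illustration; they are not taken from this thread. Once discarded draws are pruned, only the first draw matters, so the number of distinct attempts is bounded by the number of bit patterns.

```python
def first_draw_patterns(lower, upper):
    """Number of distinct bit patterns a single draw can produce.

    Assumes rejection sampling on bit_length(upper - lower) bits.
    """
    gap = upper - lower
    return 2 ** gap.bit_length()


def attempts_per_value(lower, upper):
    """Ratio of possible first draws to the size of the range."""
    size = upper - lower + 1
    return first_draw_patterns(lower, upper) / size


print(attempts_per_value(0, 2))        # range size 3, 4 patterns
print(attempts_per_value(0, 3))        # range size 4 is a power of two
print(attempts_per_value(0, 2 ** 10))  # range size 2 ** 10 + 1
```

For a range of size 2 ** n + 1 the ratio is 2 ** (n + 1) / (2 ** n + 1), which approaches 2 from below as n grows.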

I wanted to make use of this in weighted_coin but currently we don't use discards there and it's quite hard to change it so that it does, so I punted that for future work.

DRMacIver · 2019-11-07T16:27:21Z

It looks like this potentially improves our data generation quality a fair bit: one of the reasons it's making tests fail is that it's finding a bunch of examples that those tests previously didn't!

DRMacIver · 2019-11-07T21:04:45Z

(The failing build is for unrelated reasons that I will fix tomorrow in a separate PR. It shouldn't block reviewing this)

Zac-HD

LGTM 👍

hypothesis-python/src/hypothesis/internal/conjecture/datatree.py

hypothesis-python/tests/cover/test_conjecture_engine.py

Zalathar · 2019-11-08T02:45:22Z

I vaguely recall that there is logic in the shrinker to handle cases where removing discards doesn’t actually work.

Is that something we need to worry about here? Or is it fine to proactively prune any part of the data tree beyond a discard point?

DRMacIver · 2019-11-08T07:19:29Z

I vaguely recall that there is logic in the shrinker to handle cases where removing discards doesn’t actually work.
Is that something we need to worry about here? Or is it fine to proactively prune any part of the data tree beyond a discard point?

It's something I thought about and should have documented my thoughts on, but basically I think it's fine.

The failure mode is that if you filter (or use assume in a map or something) using a function with a side effect, then this pruning is technically invalid. However, this only causes a problem if there is no way to generate a good example without triggering a discard.
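To make the failure mode concrete, here is a small plain-Python sketch that does not use Hypothesis itself. The helpers `make_flaky_predicate` and `draw_filtered` are hypothetical stand-ins for a filter whose predicate has a side effect and for replaying a sequence of choices through that filter.

```python
def make_flaky_predicate():
    """Build a predicate with a side effect: it rejects only its first call."""
    calls = {"count": 0}

    def predicate(value):
        calls["count"] += 1
        return calls["count"] > 1

    return predicate


def draw_filtered(choices, predicate):
    """Return (value, index) for the first accepted choice, else None."""
    for index, value in enumerate(choices):
        if predicate(value):
            return value, index
    return None


predicate = make_flaky_predicate()

# First run: the leading 7 is rejected (a discard), the second 7 is accepted.
print(draw_filtered([7, 7], predicate))

# Later run: the very same leading choice is now accepted, so the assumption
# "this choice always leads to a discard" no longer holds.
print(draw_filtered([7], predicate))
```

Here the value was still reachable by another route, which is the situation in which the pruning does no harm.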

In particular note that we do not delete the discarded subtree. During shrinking we can still generate new examples in that subtree (I tried making it so we declared all examples that passed a known discard invalid, and that didn't work out so well). All the exhaustedness checks are used for is generating novel prefixes and determining whether we've done enough generation.


Zac-HD added the enhancement it's not broken, but we want it to be better label Nov 7, 2019

DRMacIver changed the title ~~Prune parts of the data tree that have discards in them~~ WIP: Prune parts of the data tree that have discards in them Nov 7, 2019

DRMacIver force-pushed the DRMacIver/prune-dead branch from 70f902d to e208a18 Compare November 7, 2019 15:16

DRMacIver changed the title ~~WIP: Prune parts of the data tree that have discards in them~~ Prune parts of the data tree that have discards in them Nov 7, 2019

DRMacIver force-pushed the DRMacIver/prune-dead branch from 155ee59 to 7861292 Compare November 7, 2019 19:15

Zac-HD approved these changes Nov 7, 2019


Zalathar reviewed Nov 8, 2019


DRMacIver added 8 commits November 8, 2019 20:23

Remove children from Branch repr to get sensible sized repr

8ed11c6

Add an assertion when that loop gets too long to make debugging easie…

b9bda4f

…r when we get exhaustion wrong

Skip broken regex tests on Python 2

83c399c

That test was wrong

4563eb3

Kill subtrees once we've seen a discarded example

731ce80

Remove debugging change

b77406b

Remove useless code

60cc77c

Add giant explanatory comment about tradeoffs

12f16f0

Zac-HD force-pushed the DRMacIver/prune-dead branch from 586cddb to 12f16f0 Compare November 8, 2019 09:23

Zac-HD merged commit a1f4ea4 into master Nov 8, 2019

Zac-HD deleted the DRMacIver/prune-dead branch November 8, 2019 10:01

DRMacIver mentioned this pull request Dec 20, 2019

Remove consecutive discards heuristic #2290

Merged

Zac-HD mentioned this pull request Jan 19, 2020

Strategies deduplication does not working for st.none as expected in OneOfStrategy.element_strategies #2327

Closed
