Improve performance of unique lists with elements=sampled_from(...)
#2031
Conversation
Widening the scope would be nice, but this is still clearly a nice upgrade and I'd be happy to merge as-is!
(if you don't extend support though, could you open an enhancement issue? It's good sprint-fodder if nothing else.)
@@ -775,6 +776,26 @@ def unique_by(x):
    for i, f in enumerate(unique_by):
        if not callable(f):
            raise InvalidArgument("unique_by[%i]=%r is not a callable" % (i, f))
    # Note that lazy strategies automatically unwrap when passed to a defines_strategy
    # function.
    if isinstance(elements, SampledFromStrategy):
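For context on why special-casing sampled_from pays off: without it, unique-list generation draws with replacement and rejects duplicates, which degenerates as the requested length approaches the number of distinct elements. A minimal standalone sketch of the two approaches (these helpers are illustrative only, not Hypothesis's actual internals):

```python
import random

def unique_list_by_rejection(population, n, rng=random):
    """Naive approach: draw with replacement and reject duplicates.
    Wasted draws pile up as n approaches len(population)."""
    seen, out = set(), []
    while len(out) < n:
        x = rng.choice(population)
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

def unique_list_by_sampling(population, n, rng=random):
    """Special-cased approach: when elements come from a known finite
    collection, sample without replacement directly."""
    return rng.sample(population, n)
```

Both return n distinct elements; the second never wastes a draw, which is the point of detecting SampledFromStrategy up front.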
IIRC we went with the previous design because it also handled mapped and filtered strategies with the efficient path - and unfortunately the obvious way to handle that only works if we know that the function or predicate are pure functions, which they might not be...
It's kinda tedious, but if we're adding this special handling it would be nice if it could apply to sampled_from(...).map(str).filter(bool) too 😕
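The unwrapping suggested here could, in principle, look something like the following toy sketch. The classes and the flatten_to_sampled helper are hypothetical stand-ins for Hypothesis's real strategy types, and the trick is only sound when the mapped functions and filter predicates are pure, exactly as the comment above warns:

```python
class Strategy:
    def map(self, f):
        return Mapped(self, f)

    def filter(self, p):
        return Filtered(self, p)

class SampledFrom(Strategy):
    def __init__(self, elements):
        self.elements = list(elements)

class Mapped(Strategy):
    def __init__(self, base, fn):
        self.base, self.fn = base, fn

class Filtered(Strategy):
    def __init__(self, base, pred):
        self.base, self.pred = base, pred

def flatten_to_sampled(strategy):
    """Peel map/filter layers down to a base SampledFrom, then replay the
    transforms over the element list. Returns None if the base strategy
    isn't a SampledFrom. Only valid for pure functions/predicates."""
    transforms = []
    while True:
        if isinstance(strategy, Mapped):
            transforms.append(("map", strategy.fn))
            strategy = strategy.base
        elif isinstance(strategy, Filtered):
            transforms.append(("filter", strategy.pred))
            strategy = strategy.base
        elif isinstance(strategy, SampledFrom):
            break
        else:
            return None
    transforms.reverse()  # apply in the order the user wrote them
    values = strategy.elements
    for kind, f in transforms:
        if kind == "map":
            values = [f(v) for v in values]
        else:
            values = [v for v in values if f(v)]
    return values
```

With this, sampled_from([0, 1, 2]).filter(lambda x: x > 0).map(str) would collapse to the concrete element list ['1', '2'], which could then take the efficient unique-sampling path.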
I'm not going to widen the scope, as that requires a fair bit more work and I mostly wanted to unblock #2030, and this fixes what I think is probably the most common case. I'll open some further enhancement tickets though.
Failures in #2030 demonstrated that the current approach for this is still filtering out a lot of data. This replaces our special casing with the other approach I suggested, which uses a lazy Fisher-Yates shuffle to get better behaviour for this sampling.
WIP because I expect this to fail the test suite in a bunch of places and leave some code uncovered, and I'm being lazy and letting CI find that out for me.
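A lazy Fisher-Yates shuffle in this setting means performing only the swaps needed for the elements actually drawn, recording them sparsely rather than shuffling the whole population up front. A rough standalone sketch of the idea (not the PR's actual implementation), using a dict as the sparse swap table:

```python
import random

def lazy_shuffle_sample(values, n, rng=random):
    """Draw n distinct elements from values via a partial Fisher-Yates
    shuffle. Only positions that are touched get an entry in `swaps`,
    so the cost is O(n) regardless of len(values)."""
    swaps = {}  # sparse view of the partially-shuffled list: index -> value
    out = []
    upper = len(values)
    for i in range(n):
        j = rng.randrange(i, upper)
        # The value currently at position j (defaulting to the original).
        picked = swaps.get(j, values[j])
        # Move the value at position i into slot j so it remains drawable.
        swaps[j] = swaps.get(i, values[i])
        out.append(picked)
    return out
```

Because each draw is a genuine shuffle step, no candidate is ever rejected, which is what avoids the data-filtering problem described above.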