Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of unique lists with elements=sampled_from(...) #2031

Merged
merged 2 commits into from Jul 4, 2019

Conversation

DRMacIver
Copy link
Member

Failures in #2030 demonstrated that the current approach for this is still filtering out a lot of data. This replaces our special casing with the other approach I suggested, which uses a lazy fisher-yates shuffle to get better behaviour for this sampling.

WIP because I expect this to fail the test suite in a bunch of places and leave some code uncovered, and I'm being lazy and letting CI find that out for me.

@DRMacIver DRMacIver force-pushed the DRMacIver/unique-sampled-from branch from c9819bb to 259238b Compare July 2, 2019 17:58
Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Widening the scope would be nice" but still clearly a nice upgrade and I'd be happy to merge as-is!

(if you don't extend support though, could you open an enhancement issue? It's good sprint-fodder if nothing else.)

@@ -775,6 +776,26 @@ def unique_by(x):
for i, f in enumerate(unique_by):
if not callable(f):
raise InvalidArgument("unique_by[%i]=%r is not a callable" % (i, f))
# Note that lazy strategies automatically unwrap when passed to a defines_strategy
# function.
if isinstance(elements, SampledFromStrategy):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC we went with the previous design because it also handled mapped and filtered strategies with the efficient path - and unfortunately the obvious way to handle that only works if we know that the function or predicate are pure functions, which they might not be...

It's kinda tedious, but if we're adding this special handling it would be nice if it could apply to sampled_from(...).map(str).filter(bool) too 😕

@DRMacIver DRMacIver changed the title [WIP] Improve performance of unique lists with elements=sampled_from(...) Improve performance of unique lists with elements=sampled_from(...) Jul 4, 2019
@DRMacIver DRMacIver force-pushed the DRMacIver/unique-sampled-from branch from 8ef6ef8 to f52b324 Compare July 4, 2019 08:20
@DRMacIver
Copy link
Member Author

"Widening the scope would be nice" but still clearly a nice upgrade and I'd be happy to merge as-is!

I'm not going to widen the scope as that requires a fair bit more work and I mostly wanted to unblock #2030, and this fixes what I think is probably the most common case. I'll open some further enhancement tickets though

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants