Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track and use maximum number of distinct elements a strategy can produce #2035

Closed
DRMacIver opened this issue Jul 4, 2019 · 3 comments
Closed
Labels
enhancement it's not broken, but we want it to be better

Comments

@DRMacIver
Copy link
Member

As of #2031 when generating unique collections with elements from from a sampled_from, we use the number of elements being sampled from to adjust the size. This could be made more generic - for example it doesn't work if you hide the sampled_from behind a map or a filter, but it also doesn't work with e.g. one_of(just(1), just(2)) despite it being logically equivalent to a sampled_from. This would significantly improve the behaviour of unique collections (sets and lists with unique=True mainly.)

A generic approach would be to track an upper bound on the number of distinct elements producable by any strategy. This would default to infinity and strategies that are finite and small could set a smaller bound, which could then be used in a similar way to adjust generated unique collections by:

  • Capping the max_size to the number of distinct elements.
  • Raising an error if the min_size is greater than the number of distinct elements.

The main behaviour to be implemented should be:

  1. By default the upper bound is 0 if is_empty is true, else infinite.
  2. sampled_from has an upper bound equal to its number of elements.
  3. one_of has an upper bound equal to the sum of the upper bounds of its elements
  4. map and filter have the same upper bound as their underlying strategy.
  5. Lazy strategies defer to their underlying implementation

Other optional ones:

  • Tuples have an upper bound equal to the product of the upper bounds of their elements (probably worth capping it at some value, say about 1000, and setting it to infinity if it it exceeds that to avoid blowup)
  • Small permutations might be worth setting an upper bound on
  • Lists etc. with a max_size could have an upper bound, but we're probably at the point of diminishing returns.
@Zac-HD Zac-HD added the enhancement it's not broken, but we want it to be better label Jul 8, 2019
@leaprovenzano
Copy link
Contributor

this would be really great for sorting strategy unions .... i was thinking how nice it would be if i could just sort the strategies to ensure simple ones are first in args list. map and filter are a little sticky though... but assuming the filter filters at least one thing the filtered strategy could be estimated at parent -1 .
i did an ordering to one_of using a customised find based thing as a key recently but it was sorta crappy and I didn't know if we had anything like this available in internals...
maybe there would be room to define ordering operators on existing strategies with something like this?
for example:
though integers() and floats() are theoretically both have infinite max_size ( ignoring system constraints and precision and stuff ) integers() < floats() since integers() is discrete...
likewise integers(min_value=0) < integers() and so on...

@Zac-HD
Copy link
Member

Zac-HD commented Dec 23, 2019

The number of possible outputs is a pretty poor proxy for the complexity of outputs! This is also intended for internal shortcuts only, so it might consider anything about 10k to be infinite...

More generally, we don't actually know what's simpler for the user's application, so we just avoid reordering anything if we can. This isn't usually a problem!

@Zac-HD
Copy link
Member

Zac-HD commented Mar 21, 2024

Closing this because we'll get almost all the benefits by having an enormously-less-redundant IR; see #3921 and #3932 (comment) 😁

@Zac-HD Zac-HD closed this as completed Mar 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement it's not broken, but we want it to be better
Projects
None yet
Development

No branches or pull requests

3 participants