Track and use maximum number of distinct elements a strategy can produce #2035

DRMacIver · 2019-07-04T08:39:59Z

As of #2031 when generating unique collections with elements from from a sampled_from, we use the number of elements being sampled from to adjust the size. This could be made more generic - for example it doesn't work if you hide the sampled_from behind a map or a filter, but it also doesn't work with e.g. one_of(just(1), just(2)) despite it being logically equivalent to a sampled_from. This would significantly improve the behaviour of unique collections (sets and lists with unique=True mainly.)

A generic approach would be to track an upper bound on the number of distinct elements producable by any strategy. This would default to infinity and strategies that are finite and small could set a smaller bound, which could then be used in a similar way to adjust generated unique collections by:

Capping the max_size to the number of distinct elements.
Raising an error if the min_size is greater than the number of distinct elements.

The main behaviour to be implemented should be:

By default the upper bound is 0 if is_empty is true, else infinite.
sampled_from has an upper bound equal to its number of elements.
one_of has an upper bound equal to the sum of the upper bounds of its elements
map and filter have the same upper bound as their underlying strategy.
Lazy strategies defer to their underlying implementation

Other optional ones:

Tuples have an upper bound equal to the product of the upper bounds of their elements (probably worth capping it at some value, say about 1000, and setting it to infinity if it it exceeds that to avoid blowup)
Small permutations might be worth setting an upper bound on
Lists etc. with a max_size could have an upper bound, but we're probably at the point of diminishing returns.

The text was updated successfully, but these errors were encountered:

leaprovenzano · 2019-12-23T13:51:14Z

this would be really great for sorting strategy unions .... i was thinking how nice it would be if i could just sort the strategies to ensure simple ones are first in args list. map and filter are a little sticky though... but assuming the filter filters at least one thing the filtered strategy could be estimated at parent -1 .
i did an ordering to one_of using a customised find based thing as a key recently but it was sorta crappy and I didn't know if we had anything like this available in internals...
maybe there would be room to define ordering operators on existing strategies with something like this?
for example:
though integers() and floats() are theoretically both have infinite max_size ( ignoring system constraints and precision and stuff ) integers() < floats() since integers() is discrete...
likewise integers(min_value=0) < integers() and so on...

Zac-HD · 2019-12-23T15:54:03Z

The number of possible outputs is a pretty poor proxy for the complexity of outputs! This is also intended for internal shortcuts only, so it might consider anything about 10k to be infinite...

More generally, we don't actually know what's simpler for the user's application, so we just avoid reordering anything if we can. This isn't usually a problem!

Zac-HD · 2024-03-21T05:08:30Z

Closing this because we'll get almost all the benefits by having an enormously-less-redundant IR; see #3921 and #3932 (comment) 😁

Zac-HD added the enhancement it's not broken, but we want it to be better label Jul 8, 2019

Zac-HD mentioned this issue Jul 25, 2020

EuroPython 2020 sprints meta-issue #2510

Closed

Zac-HD mentioned this issue Dec 19, 2023

Migrate DataTree to the new IR #3818

Merged

5 tasks

Zac-HD closed this as completed Mar 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Track and use maximum number of distinct elements a strategy can produce #2035

Track and use maximum number of distinct elements a strategy can produce #2035

DRMacIver commented Jul 4, 2019

leaprovenzano commented Dec 23, 2019

Zac-HD commented Dec 23, 2019

Zac-HD commented Mar 21, 2024

Track and use maximum number of distinct elements a strategy can produce #2035

Track and use maximum number of distinct elements a strategy can produce #2035

Comments

DRMacIver commented Jul 4, 2019

leaprovenzano commented Dec 23, 2019

Zac-HD commented Dec 23, 2019

Zac-HD commented Mar 21, 2024