Add a strategy for new Numpy PRNGs #3131

rsokl · 2021-11-02T03:40:45Z

This is going to be a somewhat sprawling issue. All of the topics here involve Hypothesis' approaches to making random code deterministic. I will happily close this and turn it into a collection of modular issues/PRs, but first I want to lay everything out and get @Zac-HD's input.

Weakrefs

(Addressed in #3135 )

We should only make weak references to the generators that we manage (and do the same in the other "register" functions that Hypothesis provides), so that registering an object does not keep it alive.
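To illustrate the idea, here is a minimal sketch of a registry that tracks PRNGs through weak references. The `RandomRegistry` class and its methods are hypothetical names used only for illustration; this is not Hypothesis' actual implementation.

```python
import gc
import random
import weakref


class RandomRegistry:
    """Hypothetical registry that tracks PRNGs without keeping them alive."""

    def __init__(self):
        # A WeakSet drops an entry automatically once its referent is collected.
        self._managed = weakref.WeakSet()

    def register(self, prng):
        self._managed.add(prng)

    def seed_all(self, seed):
        # Copy to a list so the set cannot change size while we iterate.
        for prng in list(self._managed):
            prng.seed(seed)

    def __len__(self):
        return len(self._managed)


registry = RandomRegistry()
user_prng = random.Random()
registry.register(user_prng)
registry.seed_all(0)
count_before = len(registry)

# Once the user drops their only strong reference, the registry forgets the PRNG.
del user_prng
gc.collect()
count_after = len(registry)
```

The flip side of this design is that the caller has to hold a strong reference to anything they register; an object that is created and registered in one expression would be collected right away.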

NumPy

NumPy has moved away from its old global random state (e.g. np.random.seed, np.random.uniform, etc.) in favor of a new RNG system that uses a combination of bit generators and generators. This API is very different from those of global-state RNG systems. Presently, it is not clear how a user should have Hypothesis make their numpy.random code deterministic.

To me, the bare minimum would involve identifying the appropriate substitutes for seed, get_state, and set_state in terms of the new bit-generator/generator system, and providing a shim to make it trivial for users to register this new source of RNG.
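As a rough sketch of what such a shim might look like: a Generator's bit generator exposes a readable and writable `state` property, which can stand in for get_state/set_state, and re-seeding can be emulated by copying the state of a freshly seeded bit generator of the same type. The `GeneratorAdapter` class is a hypothetical name, and this is only one possible approach under those assumptions. Since register_random accepts objects with seed, getstate, and setstate methods, an adapter along these lines could plausibly be handed to it.

```python
import numpy as np


class GeneratorAdapter:
    """Hypothetical shim exposing seed/getstate/setstate for a numpy Generator."""

    def __init__(self, generator):
        self.generator = generator

    def seed(self, seed):
        # Build a fresh bit generator of the same type and copy its state over,
        # which re-seeds the wrapped generator in place.
        fresh = type(self.generator.bit_generator)(seed)
        self.generator.bit_generator.state = fresh.state

    def getstate(self):
        return self.generator.bit_generator.state

    def setstate(self, state):
        self.generator.bit_generator.state = state


rng = np.random.default_rng()
adapter = GeneratorAdapter(rng)

adapter.seed(0)
saved = adapter.getstate()
first = rng.random(3)

# Restoring the saved state replays the same draws.
adapter.setstate(saved)
second = rng.random(3)
```

Because the adapter mutates the generator in place, any code that already holds a reference to `rng` sees the re-seeded stream.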

A much more ambitious goal is to still, magically, handle all of this for the user. The only thing that comes to mind is to have NumPy register the creation of new generators, and we then tap into that registry to manage those generators. I would not be surprised if NumPy (understandably) does not want to do this.

Some near-term To-Dos:

Become familiar with the new system, and assess if there are obvious substitutes for seed, get_state, and set_state for users to leverage
Post to NumPy's mailing list about our desire to make random tests behave deterministically -- under this new system -- and see if anyone has any ideas

Useful reference material

PyTorch

See if PyTorch is willing to add a plugin so that Hypothesis will manage their global generator like this (but with register_random instead of register_type_strategy).

Additionally, torch also supplies a Generator. I recall reading that PyTorch was planning to redesign things like DataLoaders to accept generators, which is similar to the new best practices for NumPy's RNG. Thus, any solution we cook up for the NumPy case should be designed to be future-compatible here as well.

Edit: I just realized that PyTorch actually uses Hypothesis for some of its tests. As far as I can tell, they do not use register_random in their test suite.


Zac-HD · 2021-11-29T11:18:16Z

Isn't the whole point of these new interfaces that users explicitly pass the generator object around?

If so, we only need to register the global PRNG that the generators are seeded off, and everything will work from there.

rsokl · 2021-11-29T14:35:00Z

Isn't the whole point of these new interfaces that users explicitly pass the generator object around?

Yep, that is correct!

we only need to register the global PRNG that the generators are seeded off

My understanding is that the generator objects are not seeded off of a global generator, and that they can only be seeded independently; I think being able to use a global PRNG would defeat the purpose of numpy's redesign. The reason why the new system expects folks to pass around generator objects is that those generator objects can be used/seeded without concern that, in some other portion of the code, the generator object is silently getting re-seeded.
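A small check of this behaviour, as I understand it: two generators built from the same seed produce the same stream, and re-seeding the legacy global state in between does not affect either of them.

```python
import numpy as np

# Two generators built from the same seed produce identical streams...
a = np.random.default_rng(1234)
b = np.random.default_rng(1234)
draw_a = a.integers(0, 100, size=5)

# ...and re-seeding the legacy global state does not touch either of them.
np.random.seed(0)
draw_b = b.integers(0, 100, size=5)

same_stream = np.array_equal(draw_a, draw_b)
```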

Zac-HD · 2021-11-29T21:33:14Z

So what do we need to do then? I was thinking of monkeypatching np.random.default_rng() to use a known seed when passed None, instead of (or by) controlling the PRNG that seed would otherwise be drawn from.

If the user passes an explicitly-seeded PRNG, it should be pretty obvious what's happening when or if we raise Flaky.

rsokl · 2021-11-30T04:26:26Z

When making this post, my thought was that we would identify the appropriate substitutes for seed, get_state, and set_state in terms of the new bit-generator/generator system, and provide a shim to make it trivial for users to register their new sources of RNG. I still think that this is a good path forward, although I'll be interested if folks from the NumPy mailing list have other ideas.

I am hoping to eventually find some time to loop back and hit some of the To-Dos that I laid out in my original post. It is just a matter of me scrounging up time to do so.

rsokl · 2021-11-30T04:32:50Z

Oh! We could also make a strategy in hypothesis.extra.numpy that hands a user a generator that they can pass to their test code/other strategies, and that we manage for them (this would still involve our figuring out the seed/get_state/set_state substitutes)! This probably is an even more convenient and obvious (and easy to document) solution for users.

Zac-HD · 2021-11-30T05:19:25Z

Based on a quick conversation, we plan to:

add a new strategy npst.rngs() (todo better name), which will basically be st.builds(np.random.default_rng, st.integers()) with a nicer repr - much like st.randoms(use_true_random=True).
have our seed-and-restore logic monkeypatch default_rng() in order to use a constant seed instead of a random seed, much like we set the state for global Random instances (or use a drawn seed with st.random_module(), etc.). People should use the former, but it's important that we give a nice user experience even when they don't follow best practices.
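The first item can be approximated downstream today. The sketch below is a hypothetical stand-in for the proposed strategy (the name `rngs` is just the placeholder from the plan, and the nicer repr is omitted). Note that default_rng rejects negative integers, so the seeds are restricted to non-negative values rather than using a bare st.integers().

```python
import numpy as np
from hypothesis import given, strategies as st


def rngs():
    """Hypothetical stand-in for the proposed strategy."""
    # default_rng raises on negative seeds, so only draw seeds >= 0.
    return st.builds(np.random.default_rng, st.integers(min_value=0))


@given(rng=rngs())
def test_permutation_preserves_elements(rng):
    items = np.arange(10)
    shuffled = rng.permutation(items)
    # A permutation must contain exactly the original elements.
    assert np.array_equal(np.sort(shuffled), items)
```

Because the seed is drawn by Hypothesis, a failing example can be replayed and shrunk like any other input, which is the main advantage over seeding inside the test body.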

matteoacrossi · 2024-02-09T10:03:39Z

Are there any plans to address this issue?

Zac-HD · 2024-02-09T18:25:00Z

Hypothesis is an all-volunteer project, and so far people have been volunteering on other issues instead.

If you're interested in helping out, I'm very happy to support that through advice, code review, and so on 😊

matteoacrossi · 2024-02-12T17:17:30Z

I would love to contribute but I don't know the internal workings of hypothesis. I was looking at #3510, is that a good starting point?

Zac-HD · 2024-02-12T19:26:17Z

Yep, that's a great place to start!

I think this should be a pretty self-contained change. It'd be perfectly feasible to implement this strategy downstream; we want to provide it in hypothesis.extra.numpy to make users' lives easier rather than because it needs internals 🙂

rsokl added the "enhancement" label (it's not broken, but we want it to be better) Nov 2, 2021

Zac-HD mentioned this issue Nov 3, 2021

Use weak references in register_* functions so that garbage collection still works #3135

Merged

Zac-HD assigned rsokl Nov 30, 2021

Zac-HD changed the title ~~Improvements to Hypothesis' ability to make random code deterministic~~ Add a strategy for new Numpy PRNGs Dec 28, 2021

Zac-HD mentioned this issue Jul 14, 2022

🏃 Sprints meta-issue #3402

Closed

20 tasks

renefritze mentioned this issue Sep 22, 2022

New approach to handling randomness in pyMOR pymor/pymor#1736

Merged

Zac-HD mentioned this issue Nov 2, 2022

pylint fixes #3499

Closed

rsokl mentioned this issue Nov 19, 2022

Add strategy for numpy.random.Generator #3510

Closed
