Add a strategy for new Numpy PRNGs #3131

Open
rsokl opened this issue Nov 2, 2021 · 10 comments
Labels
enhancement it's not broken, but we want it to be better

Comments

@rsokl
Contributor

rsokl commented Nov 2, 2021

This is going to be a somewhat sprawling issue. All of the topics here involve Hypothesis' approaches to making random code deterministic. I will happily close this and turn it into a collection of modular issues/PRs, but first I want to lay everything out and get @Zac-HD's input.

Weakrefs

(Addressed in #3135 )

We should hold only weak references to the generators that we manage (and likewise in the other "register" functions that Hypothesis provides).

NumPy

NumPy has moved away from its old global random state (e.g. np.random.seed, np.random.uniform, etc.) in favor of a new RNG system that uses a combination of bit generators and generators. This API is very different from those of global-state RNG systems. Presently, it is not clear how a user should have Hypothesis make their numpy.random code deterministic.

To me, the bare minimum would involve identifying the appropriate substitutes for seed, get_state, and set_state in terms of the new bit-generator/generator system, and providing a shim to make it trivial for users to register this new source of RNG.
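One possible shape for such a shim, as a sketch rather than a settled design. It assumes that the `state` property of a generator's `bit_generator` can stand in for get_state/set_state, and that re-seeding can be emulated by copying the state of a freshly seeded bit generator of the same type. The class name `GeneratorShim` is hypothetical; its method names follow the `seed`/`getstate`/`setstate` interface that `hypothesis.register_random` expects.

```python
import numpy as np


class GeneratorShim:
    """Hypothetical adapter exposing seed/getstate/setstate for a numpy Generator."""

    def __init__(self, rng):
        self._rng = rng

    def seed(self, seed):
        # Build a fresh bit generator of the same type and copy its state,
        # so the wrapped Generator is re-seeded in place.
        fresh = type(self._rng.bit_generator)(seed)
        self._rng.bit_generator.state = fresh.state

    def getstate(self):
        # The bit generator's state is exposed as a plain dict.
        return self._rng.bit_generator.state

    def setstate(self, state):
        self._rng.bit_generator.state = state


rng = np.random.default_rng()
shim = GeneratorShim(rng)

shim.seed(0)
saved = shim.getstate()
first = rng.random(3)

shim.setstate(saved)   # rewind to the saved state
replay = rng.random(3)
print(np.array_equal(first, replay))
```

A user could then pass `shim` to `register_random`, keeping their own reference to it alive for as long as the generator is in use.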

A much more ambitious goal is to still, magically, handle all of this for the user. The only thing that comes to mind is to have NumPy register the creation of new generators, and we then tap into that registry to manage those generators. I would not be surprised if NumPy (understandably) does not want to do this.

Some near-term To-Dos:

  • Become familiar with the new system, and assess if there are obvious substitutes for seed, get_state, and set_state for users to leverage
  • Post to NumPy's mailing list about our desire to make random tests behave deterministically -- under this new system -- and see if anyone has any ideas

Useful reference material

PyTorch

See if PyTorch is willing to add a plugin so that Hypothesis will manage their global generator like this (but with register_random instead of register_type_strategy).

Additionally, torch also supplies a Generator. I recall reading that PyTorch was planning to redesign things like DataLoaders to accept generators, which is similar to the new best practices for NumPy's RNG. Thus, any solution we cook up for the NumPy case should be designed to be future-compatible here as well.

Edit: I just realized that PyTorch actually uses Hypothesis for some of its tests. As far as I can tell, they do not use register_random in their test suite.

@Zac-HD
Member

Zac-HD commented Nov 29, 2021

Isn't the whole point of these new interfaces that users explicitly pass the generator object around?

If so, we only need to register the global PRNG that the generators are seeded off, and everything will work from there.

@rsokl
Contributor Author

rsokl commented Nov 29, 2021

Isn't the whole point of these new interfaces that users explicitly pass the generator object around?

Yep, that is correct!

we only need to register the global PRNG that the generators are seeded off

My understanding is that the generator objects are not seeded off of a global generator, and that they can only be seeded independently; I think being able to use a global PRNG would defeat the purpose of numpy's redesign. The reason why the new system expects folks to pass around generator objects is that those generator objects can be used/seeded without concern that, in some other portion of the code, the generator object is silently getting re-seeded.
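A small check of this independence, as a sketch: an explicitly seeded generator produces the same stream regardless of how the legacy global state is seeded, and drawing from it does not advance the global state. The helper name `first_draws` is hypothetical.

```python
import numpy as np


def first_draws(seed, n=3):
    """Return the first `n` floats from a freshly seeded Generator."""
    return np.random.default_rng(seed).random(n)


# Seeding the legacy global state does not affect a new-style Generator.
np.random.seed(0)
a = first_draws(123)
np.random.seed(999)
b = first_draws(123)
print(np.array_equal(a, b))

# Drawing from a Generator leaves the legacy global stream untouched.
np.random.seed(0)
expected = np.random.uniform()
np.random.seed(0)
first_draws(123)
print(np.random.uniform() == expected)
```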

@Zac-HD
Member

Zac-HD commented Nov 29, 2021

So what do we need to do then? I was thinking of monkeypatching np.random.default_rng() to use a known seed when passed None, instead of (or by) controlling the PRNG that seed would otherwise be drawn from.

If the user passes an explicitly-seeded PRNG, it should be pretty obvious what's happening when or if we raise Flaky.
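A rough sketch of what such a monkeypatch could look like, written as a context manager. The name `deterministic_default_rng` and the `fixed_seed` parameter are hypothetical, and this is not how Hypothesis implements anything today. Note one limitation: code that ran `from numpy.random import default_rng` before the patch was applied keeps a reference to the original function and is not affected.

```python
import contextlib

import numpy as np


@contextlib.contextmanager
def deterministic_default_rng(fixed_seed=0):
    """Temporarily make np.random.default_rng() use `fixed_seed` when given None."""
    original = np.random.default_rng

    def patched(seed=None):
        # Explicit seeds pass through unchanged; only the "fresh entropy"
        # case (seed=None) is replaced with a known seed.
        return original(fixed_seed if seed is None else seed)

    np.random.default_rng = patched
    try:
        yield
    finally:
        # Always restore the real function, even if the body raised.
        np.random.default_rng = original


with deterministic_default_rng(fixed_seed=42):
    x = np.random.default_rng().random(3)
    y = np.random.default_rng().random(3)

print(np.array_equal(x, y))
```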

@rsokl
Contributor Author

rsokl commented Nov 30, 2021

When making this post, my thoughts were that we would identify the appropriate substitutes for seed, get_state, and set_state in terms of the new bit-generator/generator system, and provide a shim to make it trivial for users to register their new sources of RNG. I still think that this is a good path forward, although I'll be interested if folks from the NumPy mailing list have other ideas.

I am hoping to eventually find some time to loop back and hit some of the To-Dos that I laid out in my original post. It is just a matter of me scrounging up time to do so.

@rsokl
Contributor Author

rsokl commented Nov 30, 2021

Oh! We could also make a strategy in hypothesis.extra.numpy that hands a user a generator that they can pass to their test code/other strategies, and that we manage for them (this would still involve our figuring out the seed/get_state/set_state substitutes)! This probably is an even more convenient and obvious (and easy to document) solution for users.

@Zac-HD
Member

Zac-HD commented Nov 30, 2021

Based on a quick conversation, we plan to:

  • add a new strategy npst.rngs() (todo better name), which will basically be st.builds(np.random.default_rng, st.integers()) with a nicer repr - much like st.randoms(use_true_random=True).
  • have our seed-and-restore logic monkeypatch default_rng() in order to use a constant seed instead of a random seed, much like we set the state for global Random instances (or use a drawn seed with st.random_module(), etc.). People should use the former, but it's important that we give a nice user experience even without best practices.
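The first bullet is simple enough to try downstream. The sketch below is an assumption about what such a strategy might look like, not the eventual hypothesis.extra.numpy API; `rngs` is the placeholder name from the plan above. One detail differs from the one-liner in the bullet: `np.random.default_rng` rejects negative seeds, so the integers are bounded below by zero here. The nicer repr is not attempted.

```python
import numpy as np
from hypothesis import given, settings, strategies as st


def rngs():
    """Hypothetical strategy: a numpy Generator built from a drawn seed."""
    # default_rng() raises ValueError for negative seeds, hence min_value=0.
    return st.builds(np.random.default_rng, st.integers(min_value=0))


@settings(max_examples=20, deadline=None, database=None)
@given(rng=rngs())
def test_shuffle_preserves_elements(rng):
    arr = np.arange(10)
    rng.shuffle(arr)
    # Shuffling reorders the array but never adds or drops elements.
    assert sorted(arr.tolist()) == list(range(10))


test_shuffle_preserves_elements()
```

Because the seed is an ordinary drawn integer, Hypothesis can replay and shrink failures without any special handling of NumPy's state.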

@Zac-HD changed the title from "Improvements to Hypothesis' ability to make random code deterministic" to "Add a strategy for new Numpy PRNGs" on Dec 28, 2021
@Zac-HD Zac-HD mentioned this issue Jul 14, 2022
20 tasks
@Zac-HD Zac-HD mentioned this issue Nov 2, 2022
@matteoacrossi

Are there any plans to address this issue?

@Zac-HD
Member

Zac-HD commented Feb 9, 2024

Hypothesis is an all-volunteer project, and so far people have been volunteering on other issues instead.

If you're interested in helping out, I'm very happy to support that through advice, code review, and so on 😊

@matteoacrossi

I would love to contribute but I don't know the internal workings of hypothesis. I was looking at #3510, is that a good starting point?

@Zac-HD
Member

Zac-HD commented Feb 12, 2024

Yep, that's a great place to start!

I think this should be a pretty self-contained change. It'd be perfectly feasible to implement this strategy downstream; we want to provide it in hypothesis.extra.numpy to make users' lives easier, not because it needs internals 🙂


3 participants