Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add random_state and seed arguments to rrf and adaptive_rrf #1564

Closed
wants to merge 5 commits into from

Conversation

artpelling
Copy link
Member

@artpelling artpelling commented Feb 9, 2022

This PR adds random_state=None and seed=None to the arguments of rrf and adaptive_rrf in order to enable reproducible computations and fixes a typo in the docstring of VectorArray.random.

@artpelling artpelling changed the title add seed argument to rrf function add seed=None argument to rrf Feb 9, 2022
@pmli pmli added pr:fix Fixes a bug pr:new-feature Introduces a new feature labels Feb 9, 2022
@pmli pmli added this to the 2022.1 milestone Feb 9, 2022
@codecov
Copy link

codecov bot commented Feb 10, 2022

Codecov Report

Merging #1564 (a0b0a8e) into main (d69e1be) will increase coverage by 0.16%.
The diff coverage is n/a.

❗ Current head a0b0a8e differs from pull request most recent head e4f889b. Consider uploading reports for the commit e4f889b to get more accurate results

Impacted Files Coverage Δ
src/pymor/reductors/h2.py 100.00% <ø> (+22.53%) ⬆️
src/pymor/models/iosys.py 79.12% <0.00%> (+0.29%) ⬆️
src/pymor/bindings/dunegdt.py 76.85% <0.00%> (+2.47%) ⬆️
src/pymortests/mpi_run_demo_tests.py 85.10% <0.00%> (+4.25%) ⬆️

@pmli
Copy link
Member

pmli commented Feb 10, 2022

Looks good to me so far. Would you also add seed to adaptive_rrf?

Hopefully, this doesn't cause strange git conflicts with #1552...

@sdrave
Copy link
Member

sdrave commented Feb 10, 2022

We actually have the policy that seed parameters go along with random_state parameters as in ParameterSpace.sample_randomy. Grepping through pyMOR, I seem to be the only person following this policy. I quite like it, however, so I would encourage following it here as well. In particular, the line

random_state = get_random_state(random_state, seed)

ensures that by default a global random state is used, which is seeded by a pyMOR default. Of course, that mens that calling rrf twice in a row will lead to (deterministically) different results. However, this is what I would expect from a randomized method.

@artpelling
Copy link
Member Author

artpelling commented Feb 10, 2022

We actually have the policy that seed parameters go along with random_state parameters as in ParameterSpace.sample_randomy. Grepping through pyMOR, I seem to be the only person following this policy. I quite like it, however, so I would encourage following it here as well. In particular, the line

random_state = get_random_state(random_state, seed)

ensures that by default a global random state is used, which is seeded by a pyMOR default.

I also came across this part of the code when getting to the bottom of the inner workings of VectorArray.random. I was a bit unsure how to go about it at the time. random_state = get_random_state(random_state, seed) gets called in VectorSpace.random so I assumed it would be correct.
I think I understand it a bit better now and I have pushed a new commit. Is this correct now?

What bugs me a little bit with the latest version is the encapsulated redundancy of the assertions and generation of random states in the rrf methods and VectorSpace.random. I guess a solution would be to only have random_state as an argument and do the seeding manually before calling rrf...

Of course, that mens that calling rrf twice in a row will lead to (deterministically) different results. However, this is what I would expect from a randomized method.

In the latest commit f05fcc6, calling rrf(A, seed=1) twice gives me identical results (that is what I want).

Another issue is that I am still struggling with the @defaults decorator. Should random_state and seed be included there as well?

@artpelling
Copy link
Member Author

artpelling commented Feb 10, 2022

Looks good to me so far. Would you also add seed to adaptive_rrf?

Yes.

Hopefully, this doesn't cause strange git conflicts with #1552...

I implemented this locally in order to test that PR and I think this is a handy feature. I don't think this should interfere functionally as the arguments are set to None per default which is what happened before anyway. Of course a rebase will be necessary. If #1552 gets to messy, I don't mind waiting with the merge and then rebasing this PR.

@artpelling artpelling changed the title add seed=None argument to rrf add random_state and seed arguments to rrf and adaptive_rrf Feb 10, 2022
@sdrave
Copy link
Member

sdrave commented Feb 11, 2022

What bugs me a little bit with the latest version is the encapsulated redundancy of the assertions and generation of random states in the rrf methods and VectorSpace.random. I guess a solution would be to only have random_state as an argument and do the seeding manually before calling rrf...

I'm not sure if I understand what you mean. If you pass random_state to VectorSpace.random, as you do, this random state will be used and no additional seeding takes place (see pymor.tools.random.get_random_state).

Of course, that mens that calling rrf twice in a row will lead to (deterministically) different results. However, this is what I would expect from a randomized method.

In the latest commit f05fcc6, calling rrf(A, seed=1) twice gives me identical results (that is what I want).

Yes, that should also be the case. However, if you call rrf twice without setting a seed, you should get different results, which, however, should be the same when you run the same script containing these calls twice.

Another issue is that I am still struggling with the @defaults decorator. Should random_state and seed be included there as well?

No, the idea is to have single seed for the entire program run which is a default for get_random_state. So, if you want to call a randomized method in pyMOR twice in the same script and want to get the same result, you have to manually specify a seed. However, using get_random_state ensures that subsequent runs of the same script always yield the same result.

@artpelling
Copy link
Member Author

artpelling commented Feb 11, 2022

Yes, that should also be the case. However, if you call rrf twice without setting a seed, you should get different results, which, however, should be the same when you run the same script containing these calls twice.

No, the idea is to have single seed for the entire program run which is a default for get_random_state. So, if you want to call a randomized method in pyMOR twice in the same script and want to get the same result, you have to manually specify a seed. However, using get_random_state ensures that subsequent runs of the same script always yield the same result.

Ok great, then I think I understood it correctly.

I'm not sure if I understand what you mean. If you pass random_state to VectorSpace.random, as you do, this random state will be used and no additional seeding takes place (see pymor.tools.random.get_random_state).

What I meant was that

assert random_state is None or seed is None
random_state = get_random_state(random_state, seed)

is now done in the rrf functions, as well as the VectorSpace.random method, which is a bit redundant. If for example, random_state and seed would then be also added as arguments to randomized SVD which calls rrf these two lines would also have to be in the randomized SVD function.

It shouldn't be a problem really, just lets me think about whether its cleaner to manually do the seeding when necessesary and then only pass random_state to the functions of rand_la.py

@sdrave
Copy link
Member

sdrave commented Feb 16, 2022

It shouldn't be a problem really, just lets me think about whether its cleaner to manually do the seeding when necessesary and then only pass random_state to the functions of rand_la.py

I quite like the idea of removing the seed parameter everywhere and replacing it by random_state (where it is missing). That would mean that the methods in lrradi, samdp, eigs would loose their seed defaults replacing them by a global default seed for the default random state. Thus, as already said above, this would mean that consecutive calls to these methods would lead to different results, deterministically determined by the global seed and the execution order. If one would want to have the same random state one would have to pass a manually seeded random state to the method. I would assume that this is rarely needed. What do you think @pmli, @lbalicki?

@pmli
Copy link
Member

pmli commented Feb 16, 2022

It shouldn't be a problem really, just lets me think about whether its cleaner to manually do the seeding when necessesary and then only pass random_state to the functions of rand_la.py

I quite like the idea of removing the seed parameter everywhere and replacing it by random_state (where it is missing). That would mean that the methods in lrradi, samdp, eigs would loose their seed defaults replacing them by a global default seed for the default random state. Thus, as already said above, this would mean that consecutive calls to these methods would lead to different results, deterministically determined by the global seed and the execution order. If one would want to have the same random state one would have to pass a manually seeded random state to the method. I would assume that this is rarely needed. What do you think @pmli, @lbalicki?

Up to randomization that is difficult or expensive to avoid (e.g., parallelization in BLAS), I think having as much determinism as possible is desirable (e.g., to have more-or-less reproducible code for publications). So, although it might seem unintuitive, I would be for methods, that use random numbers, to return the same result on consecutive calls. I was actually thinking that we should discuss JAX's approach (basically, making seed a required argument without a default value, and a way to generate new seeds) and whether it makes sense to do something similar in pyMOR.

@sdrave
Copy link
Member

sdrave commented Feb 17, 2022

Let's continue the big design discussion in #1577.

@pmli
Copy link
Member

pmli commented Oct 26, 2022

This can be closed since merging #1736, right?

@artpelling
Copy link
Member Author

Yes!

@artpelling artpelling closed this Oct 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr:fix Fixes a bug pr:new-feature Introduces a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants