MAINT: Refactor optimization methods to use scipy.stats.qmc #13469
Conversation
@tupui, what benefits are there to changing the LHS initialisation in differential evolution from what was there to the QMC implementation? As is, the code already distributes the initial guesses according to a Latin hypercube. The QMC version won't make differential evolution work any faster, because initialisation is a minuscule part of the total work. Moreover, we would also lose an amount of reproducibility against previous versions of scipy, because the initialisation will differ for the same seed.
It is roughly the same implementation underneath, but doing so would make it easy to switch to any QMC method for initialisation in the future. Or maybe I should already do this here? So add other options for the parameter
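To make the discussion concrete, here is a hedged sketch of how an LHS initial population can be drawn with `scipy.stats.qmc.LatinHypercube` and scaled to the optimisation bounds. This is illustrative only, not scipy's exact internal code; the variable names and population size are assumptions.

```python
import numpy as np
from scipy.stats import qmc

# Draw a Latin hypercube sample in [0, 1)^d and scale it to the
# optimisation bounds; this mirrors what an LHS-based initial
# population could look like (illustrative, not scipy's exact code).
rng = np.random.default_rng(12345)
sampler = qmc.LatinHypercube(d=2, seed=rng)
unit_sample = sampler.random(n=15)  # 15 candidates in [0, 1)^2

l_bounds, u_bounds = [-5.0, -5.0], [5.0, 5.0]
population = qmc.scale(unit_sample, l_bounds, u_bounds)
```

By construction, each of the 15 equal-width bins along every axis contains exactly one point, which is the stratification property LHS provides over plain random initialisation.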
@rgommers, @ev-br, do we have a scipy policy on strict backward compatibility for functions that use random numbers? In this PR, changing the implementation of the
@tupui, if
Over at scipy.stats, we do change random streams once in a while without any backcompat notices.
I think we should document what the policy is. I'd be in favor of adopting the same policy as the new
With the last commit I propose something toward adding other methods for initialization. If all this is OK, I will add tests and docs. Otherwise I can revert it, of course. Just let me know what you think is best 😃
@tupui, after reading through the link that Ralf sent through, I think the best thing to do here is to revert the change to the
@andyfaff ok, I reverted the original LHS code. I added some doc about Halton and Sobol'. To deal with the

As for leaving LHS as the default for compatibility, that makes sense. But what about deprecating it, or using a warning to advise the user to move to another method?
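For reference, the QMC initialisation options discussed here did land in released scipy: assuming scipy >= 1.7, `differential_evolution` accepts `'sobol'` and `'halton'` for `init` alongside the default `'latinhypercube'`. A minimal sketch (the test function and bounds are my own choices):

```python
import numpy as np
from scipy.optimize import differential_evolution

# Sphere function: global minimum of 0 at the origin.
def sphere(x):
    return np.sum(x**2)

bounds = [(-5.0, 5.0)] * 2

# `init` accepts 'latinhypercube' (default), 'sobol' and 'halton'
# in scipy >= 1.7; the QMC methods only change the initial population,
# not the differential evolution algorithm itself.
result = differential_evolution(sphere, bounds, init='halton', seed=1)
```

Because only the starting population changes, convergence behaviour is otherwise identical to the default initialisation.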
@Stefan-Endres what do you think about these changes? I tried not to touch anything other than the number of samples for Sobol', which should be a power of 2. Actually, I have to look more into this, because when iterating we no longer respect the 2**m condition.
Please advise me on what you think I should do here @andyfaff @rgommers
```diff
@@ -12,10 +12,11 @@
 from scipy.optimize._constraints import (Bounds, new_bounds_to_old,
                                          NonlinearConstraint, LinearConstraint)
 from scipy.sparse import issparse
+from scipy.stats import qmc
```
This makes `optimize` depend on `stats`, which wasn't the case before. `stats` did already depend on `optimize` though. Apparently it doesn't create an import cycle, but it is slightly unfortunate. In other such situations, we've had to move the code that both submodules depend on to `scipy/_lib/`.

Something to look at a little more closely before merging.
With these kinds of changes, there's always the question of whether there is a noticeable difference in performance (accuracy is harder to assess than speed). We have accuracy benchmarks in

It's probably fine, since the change is not to the default method and is relatively straightforward. But I'm not quite sure - @andyfaff, what do you think?
I can run some benchmarks on my side to check the speed. What I am really concerned about is the way

For Sobol', currently I am adding an exponential number of samples. This is not wanted, I guess, so I will go ahead and use the scrambled version. This way we would be able to use a constant increase in the number of points, with just the 2**m constraint. Let me know @Stefan-Endres if this goes against the theory. Using the scrambled version, and also enforcing 2**m, will induce some changes on the user side. But right now the way it's used is not correct.
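The 2**m constraint mentioned above can be enforced directly with `scipy.stats.qmc.Sobol`, whose `random_base2(m)` method only draws power-of-two sample sizes. A small sketch (dimension and `m` are arbitrary choices for illustration):

```python
from scipy.stats import qmc

# Sobol' points keep their balance properties only for sample sizes
# that are powers of two; random_base2(m) enforces n = 2**m.
sampler = qmc.Sobol(d=3, scramble=False, seed=7)
sample = sampler.random_base2(m=4)  # 2**4 = 16 points in [0, 1)^3
```

Note that the unscrambled sequence always starts at the origin `(0, 0, 0)`, which is part of the discussion below about scrambling and fast-forwarding.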
I am getting terrible results with the test function from the docstrings when using the scrambled version. This is very interesting... What I can do instead is to use the generator and fast-forward it before any sampling. This way I can "emulate" randomization, so that we can ask for a constant number of samples and keep the nice properties. But this is puzzling. If someone knows what is happening here, that would help.
With the last commits, I fixed the theoretical and practical issues with Sobol'. To have independent samples from the unscrambled version, I am skipping

Also, as a reminder, I changed the position of the update of the number of points: when specifying 64 points and 1 iteration, it was actually using 64*2 points. This explains why I sometimes had to increase (mostly double) the number of points in some tests.
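The fast-forward idea described above can be sketched with `qmc.Sobol.fast_forward`; this is a sketch under my own parameter choices, not the PR's exact code. Skipping an initial stretch of the sequence discards, among others, the all-zeros first point of the unscrambled generator:

```python
from scipy.stats import qmc

# Fast-forward the generator before any sampling: the first 16 points
# of the unscrambled Sobol' sequence (including the origin) are
# discarded, "emulating" randomisation while staying deterministic.
sampler = qmc.Sobol(d=2, scramble=False)
sampler.fast_forward(16)            # skip the first 16 points
sample = sampler.random_base2(m=4)  # next 16 points; total 32 = 2**5
```

`fast_forward` counts toward the points generated, so the cumulative total still has to be a power of two for `random_base2` to accept the draw.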
@tupui thank you for this initiative. I think it would be great to refactor the random sampling for the optimization package.
Unfortunately the typical use of
This is a known bug that resulted in an extra iteration. We have fixed this in the upstream repository and I will open a PR to
I think you are absolutely correct from a statistical perspective; from an optimisation perspective, however, this is not practical in higher dimensions because of intermittent convergence criteria. I think that adding
This would be preferable. My question is how different this would be from the current Sobol sequence generated by SHGO. I often get very different performance when using different seeds for the sequence.
I have this in my offline library and just need to clean the code. I will try to open a separate PR as soon as possible.
Thank you for the clarifications @Stefan-Endres. The main difference with the new implementation of Sobol' is that it is radically faster (Cython), can be scrambled, and we also checked its convergence. Still, I also noticed these differences in results depending on the seed. I guess it's really a matter of the function and the "luck" we had with the bare unscrambled sequence. That's why I was asking if you maybe needed some specific properties from the base sequence, like having the point (0, ..., 0).

I am not sure I understand when you say that you can have a varying number of points. When setting a

So from what you say, I could adapt the code to also accept Halton, and I will then use the scrambled version in both cases. I will just need to adapt the number of points for the test cases due to this seeding variation.

As you wanted to work on this, shall I add you to my fork so that you could commit changes there? It might be easier if we both work on the same branch. What do you think?
If possible, sampling the centre of the hypercube (0.5, 0.5, ..., 0.5) usually produces better performance (especially when the optimisation bounds are infinite or unknown), but I can connect this point manually in
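Connecting the centre point manually, as suggested above, is just a matter of prepending it to the QMC sample. A small sketch (illustrative only, not shgo's actual code; Halton and the sample size are arbitrary choices):

```python
import numpy as np
from scipy.stats import qmc

# Prepend the centre of the unit hypercube to a QMC sample, so the
# optimiser always evaluates (0.5, ..., 0.5) first.
dim = 3
sampler = qmc.Halton(d=dim, seed=0)
points = sampler.random(n=15)
points = np.vstack([np.full(dim, 0.5), points])  # centre point first
```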
There is also an option (
If you do not specify any constraints and the objective function is well behaved then it will work like you expect with
Yes, I suspect that Halton would deliver better performance in general anyway, and ideally we can keep
Yeah, I think the best approach would be to implement Halton in this fork; then I will work on making
Ok, perfect. Then I will add Halton and we keep Sobol' for backward compatibility. I will update the doc as well. I will do this early next week and hopefully we can move forward with this.
@Stefan-Endres I have a first version if you would like to review. Thank you! I left Sobol' unscrambled for now so that the tests pass. I will let you assess this as you know how this should behave.
@tupui thank you for this; it looks perfect and everything is running. I will pull this into the upstream library as well. I look forward to posting the benchmarks using
Sorry, I can't comment on the rest of this PR. I can only confirm that it addresses the concern in gh-13441.
I've gone through all the files except `_shgo`. I'm not an expert on that file.
@tupui, in general it's good practice that PRs are merged by someone other than the person who created them. I don't believe it's a hard-and-fast rule; the exception is when the PR is fairly simple and straightforward. This PR is a bit more complicated.
Thanks @andyfaff for the review. I addressed all the points. For
@andyfaff Thanks again. I made sure the docstrings match between the class and the function for what I changed.
@andyfaff if you have time, could you have another look at it?
Friendly reminder @andyfaff. Thanks in advance.
@rgommers if you have some time, could you have a look? It should be ready, as it has been reviewed.
@larsoner if you have some time, could you have a look?
@mdhaber this one of mine is still pending, if you have some time to help ;)
> @larsoner if you have some time, could you have a look?
Not really in my domain (more signals/linalg) but given that @andyfaff looked, his comments seem to have been addressed, and from what I can tell it makes sense, I think we can merge this. Let's give folks who have been pinged a few more days -- @tupui can you ping me on Monday to merge this if I haven't already?
Thank you 😃. Sure, I will ping you if this is not merged by then.
Friendly reminder @larsoner 😄 Thanks.
Thanks @tupui!
Thanks Eric! @Stefan-Endres it's merged, you can now propose your enhancements! Feel free to ping me.
This removes legacy QMC code in favour of `scipy.stats.qmc` functions introduced in #10844. LHS is present in `scipy.optimize._differentialevolution.py` and Sobol' is implemented in pure Python and used in `scipy.optimize._shgo.py`.

Also, currently the doc for `shgo` uses an incorrect number of samples for Sobol', as it is not a power of 2.
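For completeness, here is a sketch of calling `shgo` with Sobol' sampling and a power-of-two number of points, which is the usage this PR's doc fix recommends. Assuming scipy >= 1.7; the test function and bounds are my own choices:

```python
import numpy as np
from scipy.optimize import shgo

# Sphere function: global minimum of 0 at the origin.
def sphere(x):
    return np.sum(x**2)

bounds = [(-5.0, 5.0)] * 2

# With Sobol' sampling, the number of points should be a power of
# two (here 2**6 = 64) to preserve the sequence's balance properties.
res = shgo(sphere, bounds, n=64, sampling_method='sobol')
```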