
MAINT: Refactor optimization methods to use scipy.stats.qmc #13469

Merged
26 commits, merged Apr 19, 2021

Conversation

@tupui (Member) commented Jan 31, 2021

This removes legacy QMC code in favour of scipy.stats.qmc functions introduced in #10844.

A LHS implementation is present in scipy/optimize/_differentialevolution.py, and Sobol' is implemented in pure Python and used in scipy/optimize/_shgo.py.

Also, the shgo documentation currently uses an incorrect number of samples for Sobol', as it is not a power of 2.
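As a minimal sketch of what the refactor moves to, the scipy.stats.qmc engines can generate the samples the legacy code produced by hand; shown here for Sobol', whose balance properties require a power-of-2 sample size:

```python
from scipy.stats import qmc

# Unscrambled Sobol' points are only balanced when n is a power of 2,
# hence the dedicated random_base2(m) method (n = 2**m).
sobol = qmc.Sobol(d=2, scramble=False)
sample = sobol.random_base2(m=6)  # 64 points in [0, 1)^2
print(sample.shape)  # (64, 2)
```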

@andyfaff (Contributor) commented:

@tupui, what benefits are there to changing the LHS initialisation in differential evolution from what was there to the QMC implementation? As is, the code should be distributing the initial guess according to a Latin hypercube. The QMC version won't make diffev work any faster, because initialisation is a minuscule part of the total work. Moreover, we will also lose a degree of reproducibility against previous versions of SciPy, because the initialisation will be different for the same seed.

@tupui (Member Author) commented Feb 1, 2021

@tupui, what benefits are there to changing the LHS initialisation in differential evolution from what was there to the QMC implementation? As is, the code should be distributing the initial guess according to a Latin hypercube. The QMC version won't make diffev work any faster, because initialisation is a minuscule part of the total work. Moreover, we will also lose a degree of reproducibility against previous versions of SciPy, because the initialisation will be different for the same seed.

The underlying implementation is roughly the same, but this change would allow us to easily switch to any QMC method for initialisation in the future. Or maybe I should already do that here, i.e. add other options for the init parameter? Going in that direction, I would also suggest defaulting to Sobol', which has better properties than LHS (which can be really bad).

@andyfaff (Contributor) commented Feb 1, 2021

@rgommers, @ev-br, do we have a SciPy policy on strict backward compatibility for functions that use random numbers? Changing the implementation of the latinhypercube initialisation in this PR means that a differential_evolution minimisation run with different SciPy versions will have different outputs for the same seed. There is no bug in the existing implementation that would have justified the changes in this PR.

@andyfaff (Contributor) commented Feb 1, 2021

@tupui, if Sobol' provides a well-distributed set of initial population guesses, then it might be nice to have a sobol option for init. But I wouldn't make it the default option to start with, for the same reasons I mentioned in my previous comment.
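For illustration, here is how an opt-in init option could look from the user side, assuming a 'sobol' value is added to differential_evolution as discussed (the exact keyword spelling is an assumption in this sketch):

```python
from scipy.optimize import differential_evolution

def rosenbrock(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

bounds = [(-5, 5), (-5, 5)]
# 'sobol' as an opt-in initialisation; 'latinhypercube' stays the default.
result = differential_evolution(rosenbrock, bounds, init='sobol', seed=1)
print(result.x)  # should be close to [1, 1]
```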

@ev-br (Member) commented Feb 2, 2021

Over at scipy.stats, we do change random streams once in a while without any backward-compatibility notices.
I have no informed opinion on how severe this change would be in that regard.

@rgommers (Member) commented Feb 2, 2021

I think we should document what the policy is. I'd be in favor of adopting the same policy as the new numpy.random one (for details, see NEP 19): we try hard not to change results for a given seed, but there is no guarantee of exact reproducibility between different SciPy versions; we reserve the right to make changes to fix bugs, and for significant improvements in performance or in the quality of random variates.

@tupui (Member Author) commented Feb 2, 2021

With the last commit I propose something toward adding other methods for initialization. If all this is OK, I will add tests and doc. Otherwise I can of course revert it. Just let me know what you think is best 😃

@andyfaff (Contributor) commented Feb 3, 2021

@tupui, after reading through the link that Ralf sent, I think the best thing to do here is to revert the change to the differential_evolution init='latinhypercube' keyword. The existing behaviour does not have bugs, the changes in this PR probably don't bring a significant benefit, and they degrade reproducibility by giving different output for the same seed. That is, the code path for init='latinhypercube' should give the same output before and after this PR.
However, 'sobol' and 'halton' initialisation options would be welcome. These should both be documented in the docstrings for differential_evolution and DifferentialEvolutionSolver. Note that the total population size is normally calculated as popsize * len(x); this conflicts with the 2**m behaviour for 'sobol'. Some thought should go into choosing the correct behaviour here. Should it be the closest power of 2? Note that the population should have at least 5 members; m = 2 is too small.

@tupui (Member Author) commented Feb 3, 2021

@andyfaff ok, I reverted the original LHS code and added some doc about Halton and Sobol'. To deal with the n = 2**m requirement, I round the population size up to the closest power of 2 (using ceil, and taking the minimum of 5 members into account).
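The rounding described above can be sketched as follows; round_up_pow2 is a hypothetical helper mirroring the idea, not the actual scipy code:

```python
import math

def round_up_pow2(n, n_min=5):
    # Hypothetical helper: round the requested population size up to the
    # next power of 2, enforcing a minimum of n_min members first.
    n = max(n, n_min)
    return 2 ** math.ceil(math.log2(n))

print(round_up_pow2(15))  # 16
print(round_up_pow2(3))   # minimum of 5 applies, then rounds up to 8
print(round_up_pow2(64))  # 64, already a power of 2
```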

As for leaving LHS as the default for compatibility, that makes sense. But what about deprecating it, or using a warning to advise the user to move to another method?

@tupui (Member Author) commented Feb 4, 2021

@Stefan-Endres what do you think about these changes? I tried not to touch anything other than the number of samples for Sobol', which should be a power of 2.

Actually, I have to look into this more, because when iterating we no longer respect the 2**m condition.

@tupui tupui added maintenance Items related to regular maintenance tasks scipy.optimize labels Feb 6, 2021
@tupui (Member Author) commented Feb 10, 2021

  • I am still fighting with the number of samples. I noticed something odd in iterate_delaunay: is it normal that self.nc += self.n came before the sampling? To me it should come after; otherwise, on the first iteration we sample with more points than requested. @Stefan-Endres, maybe you can help here?

  • With the last commits, I updated what I could in terms of the number of samples for Sobol'. At a new iteration, instead of using self.nc it checks the internal state of the Sobol' generator to ask for the next 2**m points (I still need to change self.n so that self.n and self.nc are correct, but that sounds hackish). It does not make much sense to use Sobol' and sample anything other than 2**m points. If we must do this, then I would suggest we use Halton instead.

  • Another possibility would be to use the scrambled version of Sobol' (which in any case we should: I am just keeping the base sequence here in case there is any theoretical reason I am not aware of) so that we don't add points exponentially.

Please advise me on what you think I should do here @andyfaff @rgommers
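To make the trade-off concrete, here is a sketch (assuming scrambling is enabled) of drawing successive fixed-size batches from a scrambled Sobol' engine, which is what allows a constant increase in the number of points:

```python
from scipy.stats import qmc

# A scrambled Sobol' engine keeps its internal state, so successive
# fixed-size batches continue the same randomized low-discrepancy stream.
engine = qmc.Sobol(d=2, scramble=True, seed=12345)
batch1 = engine.random_base2(m=4)  # first 2**4 = 16 points
batch2 = engine.random_base2(m=4)  # next 16 points (total 32, still 2**m)
print(batch1.shape, batch2.shape)
```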

Resolved review thread (outdated): scipy/optimize/_shgo.py
@@ -12,10 +12,11 @@
from scipy.optimize._constraints import (Bounds, new_bounds_to_old,
NonlinearConstraint, LinearConstraint)
from scipy.sparse import issparse

from scipy.stats import qmc
Review comment (Member):

This makes optimize depend on stats, which wasn't the case before. stats did already depend on optimize though. Apparently it doesn't create an import cycle, but it is slightly unfortunate. In other such situations, we've had to move the code that both submodules depend on to scipy/_lib/.

Something to look at a little more closely before merging.

@rgommers (Member) commented:

With these kinds of changes, there is always the question of whether there is a noticeable difference in performance (accuracy being harder to assess than speed). We have accuracy benchmarks in benchmarks/benchmarks/optimize.py for global optimizers, but unfortunately shgo isn't included there.

It's probably fine, since the change is not to the default method and is relatively straightforward. But I'm not quite sure; @andyfaff, what do you think?

@tupui (Member Author) commented Feb 12, 2021

With these kinds of changes, there is always the question of whether there is a noticeable difference in performance (accuracy being harder to assess than speed). We have accuracy benchmarks in benchmarks/benchmarks/optimize.py for global optimizers, but unfortunately shgo isn't included there.

It's probably fine, since the change is not to the default method and is relatively straightforward. But I'm not quite sure; @andyfaff, what do you think?

I can run some benchmarks on my side to check the speed.

What I am really concerned about is the way sobol is used. Currently it is used as a normal random generator. If we want minimal changes, using Halton instead would actually be a good option.

For Sobol', I am currently adding an exponentially growing number of samples. This is not wanted, I guess, so I will go ahead and use the scrambled version. This way we can add a constant number of points at each step, with just the 2**m constraint. Let me know @Stefan-Endres if this goes against the theory.

Using the scrambled version and enforcing 2**m will induce some changes on the user side, but right now the way it is used is not correct.

@tupui (Member Author) commented Feb 12, 2021

I am getting terrible results with the test function from the docstrings using the scrambled version. This is very interesting...

What I can do instead is to use the generator and fast-forward it before any sampling. This way I can "emulate" randomization, so that we can ask for a constant number of samples and keep the nice properties.

But this is puzzling. If someone knows about what is happening here that could help.
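The fast-forward idea can be sketched as follows: a continuing engine implicitly skips the points it has already produced, which is equivalent to resetting the engine and fast-forwarding it manually (a sketch, not shgo's actual code):

```python
import numpy as np
from scipy.stats import qmc

n = 16  # fixed batch size per iteration, a power of 2

# A continuing engine: the second batch starts where the first ended.
cont = qmc.Sobol(d=2, scramble=False)
first = cont.random(n)
second = cont.random(n)

# Resetting and fast-forwarding past the first n points is equivalent.
fresh = qmc.Sobol(d=2, scramble=False)
fresh.fast_forward(n)
assert np.allclose(second, fresh.random(n))
```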

@tupui (Member Author) commented Feb 13, 2021

With the last commits, I fixed the theoretical and practical issues with Sobol'.
So now we have a constant increase of points while having similar performance and respecting the theory behind the method.

To have independent samples from the unscrambled version, I skip self.iters_done * n points (with n = 2**m) between samplings. [EDIT: just realised this is a special case: since n is constant here, we don't need to reset the engine and fast-forward it manually; this is implicitly done at each sampling.]

Also, as a reminder, I moved the update of the number of points: when specifying 64 points and 1 iteration, it was actually sampling 64*2 points. This explains why I sometimes had to increase (mostly double) the number of points in some tests.

@Stefan-Endres (Contributor) commented Feb 13, 2021

@tupui thank you for this initiative; I think it would be great to refactor the random sampling for the optimization package.

Actually, I have to look into this more, because when iterating we no longer respect the 2**m condition.

Unfortunately the typical use of SHGO is to repeat iterations (with fixed or variable sampling points generated per iteration) until a satisfactory solution is found. Therefore the 2**m recommendation will rarely be satisfied (and is usually not practical in higher dimensions). The other problem is that SHGO should also be able to handle non-linear constraints, so even though the sampling points are generated in a (stretched) hypercube, often the majority of these points will be discarded so in this case many of the low-discrepancy guarantees are void anyway.

I am still fighting with the number of samples. I noticed something odd in iterate_delaunay: is it normal that self.nc += self.n came before the sampling? To me it should come after; otherwise, on the first iteration we sample with more points than requested. @Stefan-Endres, maybe you can help here?

This is a known bug that resulted in an extra iteration. We have fixed this in the upstream repository, and I will open a PR to scipy very soon (all that's left is to update the documentation). In the latest version we have streamlined the code a lot, so that the sampling points are counted in the same way as in the simplicial method (and therefore you do not need to make any changes, because any sampling method will work correctly in the new version).

It does not make much sense to use Sobol' and sample anything other than 2**m points. If we must do this, then I would suggest we use Halton instead.

I think you are absolutely correct from a statistical perspective; from an optimisation perspective, however, this is not practical in higher dimensions because of intermittent convergence criteria. I think that adding Halton in addition to Sobol is a good idea. Any low-discrepancy sampling method should work. My idea was to support any sequence that can be generated, since some use cases have particular needs.

Another possibility would be to use the scrambled version of Sobol' (which in any case we should: I am just keeping the base sequence here in case there is any theoretical reason I am not aware of) so that we don't add points exponentially.

This would be preferable. My question is: how different would this be from the current Sobol sequence generated by SHGO? I often get very different performance when using different seeds for the sequence.

@rgommers

With these kinds of changes, there is always the question of whether there is a noticeable difference in performance (accuracy being harder to assess than speed). We have accuracy benchmarks in benchmarks/benchmarks/optimize.py for global optimizers, but unfortunately shgo isn't included there.

I have this in my offline library and just need to clean the code, I will try to open a separate PR as soon as possible.

@tupui (Member Author) commented Feb 13, 2021

Thank you for the clarifications @Stefan-Endres

The main difference with the new implementation of Sobol' is that it is radically faster (Cython), can be scrambled, and we have checked its convergence. Still, I also noticed these differences in results with the seed. I guess it's really a matter of the function and the "luck" we had with the bare unscrambled sequence. That's why I was asking whether you need some specific properties from the base sequence, like having the point (0, ..., 0).

I am not sure I understand what you mean when you say that you can have a varying number of points. When setting an n, I could see it being used all the time. The way I understand the algorithm, it uses batches of points, and at least within the generation of a batch we have coherent sampling. Good coverage here seems desirable, meaning we could benefit from QMC.

So from what you say, I could adapt the code to also accept Halton, and then use the scrambled version in both cases. I will just need to adapt the number of points in the test cases due to this seeding variation.

As you wanted to work on this, shall I add you to my fork so that you could commit changes there? It might be easier if we both work on the same branch. What do you think?

@Stefan-Endres (Contributor) commented:

The main difference with the new implementation of Sobol' is that it is radically faster (Cython), can be scrambled, and we have checked its convergence. Still, I also noticed these differences in results with the seed. I guess it's really a matter of the function and the "luck" we had with the bare unscrambled sequence. That's why I was asking whether you need some specific properties from the base sequence, like having the point (0, ..., 0).

If possible, sampling the centre of the hypercube (0.5, 0.5, ..., 0.5) usually produces better performance (especially when the optimisation bounds are infinite or unknown), but I can connect this point manually in shgo later, as long as the unit tests are passing. I think in general there won't be any major change in benchmark performance. As long as the unit tests pass, it would be ideal to merge this code into scipy before the next PR for shgo. Since we fixed the issue with an extra iteration run, the overall performance of the algorithm should increase anyway if we coordinate to release this in the same scipy version.

I am not sure I understand what you mean when you say that you can have a varying number of points.

There is also an option (options={'infty_constraints': False}) with Sobol sampling to keep generating sampling points until the requested number of feasible points is found. For example, if the algorithm generated n=64 points in a new iteration and 10 points are infeasible, then shgo will generate another 10 points from QMC, and this loop repeats until there are 64 feasible points. This is vital in cases where the user needs to keep memory usage low by not adding these points to the triangulation.
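The loop described here can be sketched roughly like this; the circle constraint and both helper names are hypothetical, and shgo's actual implementation differs:

```python
import numpy as np
from scipy.stats import qmc

def feasible(points):
    # Hypothetical constraint: keep only points inside a circle of
    # radius 0.5 centred at (0.5, 0.5).
    return points[np.linalg.norm(points - 0.5, axis=1) <= 0.5]

def sample_feasible(n, d=2, seed=0):
    # Keep drawing from the QMC engine until n feasible points are found.
    engine = qmc.Halton(d=d, scramble=True, seed=seed)
    kept = np.empty((0, d))
    while len(kept) < n:
        kept = np.vstack([kept, feasible(engine.random(n - len(kept)))])
    return kept[:n]

pts = sample_feasible(64)
print(pts.shape)  # (64, 2)
```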

When setting an n, I could see it being used all the time. The way I understand the algorithm, it uses batches of points, and at least within the generation of a batch we have coherent sampling. Good coverage here seems desirable, meaning we could benefit from QMC.

If you do not specify any constraints and the objective function is well behaved, then it will work as you expect, with n = 2**m generated with every call to QMC. The default behaviour of shgo is not to discard infeasible points, and then the calls to QMC should again always be n = 2**m.

So from what you say, I could adapt the code to also accept Halton, and then use the scrambled version in both cases. I will just need to adapt the number of points in the test cases due to this seeding variation.

Yes, I suspect that Halton would deliver better performance in general anyway, and ideally we can keep sobol for backward compatibility, together with the recommendation in the documentation to use 2**m sampling points. Since sobol is not the default sampling method anyway, we would not need to make any API changes.

As you wanted to work on this, shall I add you to my fork so that you could commit changes there? It might be easier if we both work on the same branch. What do you think?

Yeah, I think the best approach is to implement Halton in this fork; then I will work on making sobol backward compatible to ensure all the tests pass. There will be a bit of a merge conflict with the newer shgo code, and I think it will unfortunately take a while to review that PR, so it will be best to keep the new routine separate and deal with that merge after this one has been merged.

@tupui (Member Author) commented Feb 13, 2021

OK, perfect. Then I will add Halton and keep Sobol' for backward compatibility. I will update the doc as well. I will do this early next week, and hopefully we can move forward with this.

@tupui (Member Author) commented Feb 15, 2021

@Stefan-Endres I have a first version if you would like to review. Thank you! I left Sobol' unscrambled for now so that the tests pass. I will let you assess this as you know about how this should behave.

@Stefan-Endres (Contributor) commented:

@tupui thank you for this; it looks perfect and everything is running. I will pull this into the upstream library as well. I look forward to posting the benchmarks using Halton sampling.

@tupui tupui requested a review from andyfaff February 15, 2021 21:07
@tupui (Member Author) commented Mar 23, 2021

@rgommers @mdhaber @andyfaff If you are OK with it and there are no more comments, I suggest I merge this before the weekend.

@mdhaber (Contributor) commented Mar 23, 2021

Sorry, I can't comment on the rest of this PR. I can only confirm that it addresses the concern in gh-13441.

@andyfaff (Contributor) left a review:

I've gone through all the files except _shgo. I'm not an expert on that file.

Review threads (resolved): scipy/optimize/_differentialevolution.py (4), scipy/optimize/_shgo.py (3), scipy/optimize/tests/test__shgo.py (2), scipy/stats/_hypotests.py (1)
@andyfaff (Contributor) commented:

@tupui, in general it is good practice for PRs to be merged by someone other than the person who created them.

I don't believe it's a hard-and-fast rule; the exception is when a PR is fairly simple and straightforward. This PR is a bit more complicated.

@tupui (Member Author) commented Mar 24, 2021

I've gone through all the files except _shgo. I'm not an expert on that file.

Thanks @andyfaff for the review. I addressed all the points. For shgo itself, @Stefan-Endres approved earlier in the discussion. Apparently, he has some other improvements that he plans to make in a follow-up PR.

Review threads (outdated, resolved): scipy/optimize/_differentialevolution.py (3), scipy/optimize/_shgo.py (2)
@tupui (Member Author) commented Mar 24, 2021

@andyfaff Thanks again. I made sure the docstrings match between the class and the function for what I changed.

@tupui (Member Author) commented Mar 29, 2021

@andyfaff if you have time, could you have another look at it?

@tupui (Member Author) commented Apr 2, 2021

Friendly reminder @andyfaff. Thanks in advance.

@tupui tupui mentioned this pull request Apr 6, 2021
11 tasks
@tupui (Member Author) commented Apr 7, 2021

@rgommers if you have some time, could you have a look? It should be ready as it has been reviewed.

@tupui tupui requested a review from ev-br April 12, 2021 13:22
@tupui (Member Author) commented Apr 13, 2021

@larsoner if you have some time, could you have a look?

@tupui (Member Author) commented Apr 14, 2021

@mdhaber this one of mine is still pending, if you have some time to help ;)

@mdhaber (Contributor) commented Apr 15, 2021

@tupui these days (until the summer probably) I would prefer to stick to stats. How about I take a look at gh-13471, instead?

@tupui (Member Author) commented Apr 15, 2021

@tupui these days (until the summer probably) I would prefer to stick to stats. How about I take a look at gh-13471, instead?

Sure thanks @mdhaber!

@larsoner (Member) left a review:

@larsoner if you have some time, could you have a look?

Not really in my domain (more signals/linalg) but given that @andyfaff looked, his comments seem to have been addressed, and from what I can tell it makes sense, I think we can merge this. Let's give folks who have been pinged a few more days -- @tupui can you ping me on Monday to merge this if I haven't already?

@tupui (Member Author) commented Apr 15, 2021

@larsoner if you have some time, could you have a look?

Not really in my domain (more signals/linalg) but given that @andyfaff looked, his comments seem to have been addressed, and from what I can tell it makes sense, I think we can merge this. Let's give folks who have been pinged a few more days -- @tupui can you ping me on Monday to merge this if I haven't already?

Thank you 😃 . Sure I will ping you if this is not merged by then.

@tupui (Member Author) commented Apr 19, 2021

Friendly reminder @larsoner 😄 Thanks.

@larsoner larsoner merged commit 71b4bc0 into scipy:master Apr 19, 2021
@larsoner (Member) commented:

Thanks @tupui !

@tupui tupui deleted the qmc_optim branch April 19, 2021 12:36
@tupui (Member Author) commented Apr 19, 2021

Thanks Eric!

@Stefan-Endres it's merged, you can now propose your enhancements! Feel free to ping me.

@tylerjereddy tylerjereddy added this to the 1.7.0 milestone Apr 20, 2021