
MAINT: deal with cases in minimize for `(bounds.lb == bounds.ub).any()` #12889

Closed
wants to merge 11 commits

Conversation

andyfaff
Contributor

@andyfaff andyfaff commented Sep 26, 2020

Reference issue

#12502

What does this implement/fix?

Additional information

@@ -491,32 +491,43 @@ def _dense_difference(fun, x0, f0, h, use_one_sided, method):
h_vecs = np.diag(h)

for i in range(h.size):
df = np.zeros_like(f0)
Contributor

It looks like df will be all zeros if any of the dx are zero. Don't we want the elements of df corresponding with nonzero dx to be evaluated as usual?

Contributor Author

dx can be either a scalar or an array of shape (N,).

If all of dx are zero then there don't need to be any evaluations of fun(x); df should be 0 (or, more precisely, np.zeros_like(f0)), because the step is 0. This has the effect of reducing the overall number of function evaluations.

If there are any non-zero dx then fun(x) is evaluated to calculate df as usual.
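
A minimal sketch of the behaviour described above (illustrative only, not the exact diff in _numdiff.py; the helper name finite_diff_column is made up). The point is that np.any(dx) works whether dx is a scalar or an array, and that a zero step means the function does not need to be evaluated:

import numpy as np

def finite_diff_column(fun, x0, f0, h_vec, i):
    # forward-difference estimate of column i of the Jacobian
    x1 = x0 + h_vec
    dx = x1[i] - x0[i]          # 0.0 when bounds pin x[i] (lb[i] == ub[i])
    if np.any(dx):              # works for scalar or array dx
        df = fun(x1) - f0       # the usual evaluation
        return df / dx
    # zero step: skip the evaluation entirely and return a zero column
    return np.zeros_like(f0)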

Contributor

@mdhaber mdhaber Sep 27, 2020

<face-palm> I was tired.

Contributor

@mdhaber mdhaber Aug 2, 2021

I wonder if the tests are hitting the array dx case. I think you will get a:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

How about np.any(dx), which will work for both scalar and array?
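
For illustration (not part of the PR), the difference between a bare truth test and np.any on scalar versus array steps:

import numpy as np

dx_scalar = 0.0
dx_array = np.array([0.0, 1e-8])

print(np.any(dx_scalar))   # False -- np.any also accepts plain scalars
print(np.any(dx_array))    # True  -- any non-zero element counts

# bool(dx_array) would raise:
# ValueError: The truth value of an array with more than one element is
# ambiguous. Use a.any() or a.all()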

Contributor Author

For future reference, dx is an array of size > 1 when evaluating a Hessian.

@andyfaff
Contributor Author

The last commit checks that ScalarFunction rejects functions that don't return scalars.
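
For illustration, a hedged sketch of the behaviour that check is meant to provide (the exact exception type and message are assumptions, and this routes through minimize rather than constructing ScalarFunction directly):

import numpy as np
from scipy.optimize import minimize

def vector_valued(x):
    # returns an array rather than a scalar -- not a valid objective
    return np.array([x[0]**2, x[1]**2])

try:
    minimize(vector_valued, x0=np.array([1.0, 2.0]), method='BFGS')
except (TypeError, ValueError) as exc:
    print('non-scalar objective rejected:', exc)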

@andyfaff
Contributor Author

@mdhaber I think the failing tests are due to CI not liking the latest pytest.

@andyfaff
Contributor Author

Rebased to make the tests run again.
@mdhaber, ping for merge?

@mdhaber
Contributor

mdhaber commented Oct 11, 2020

I'm afraid I can't check this weekend : / I'm swamped. Is in time for 1.6.0 OK?
Tell you what, I'll trade you for taking a look at gh-12043 : P (kidding)

@andyfaff
Contributor Author

I'm afraid I can't check this weekend : / I'm swamped. Is in time for 1.6.0 OK?
Tell you what, I'll trade you for taking a look at gh-12043 : P (kidding)

1.6.0 is fine, I was just trying to clear low hanging fruit. In all seriousness I would review your PR, but I'm not an expert in the subject matter.

@andyfaff andyfaff changed the title Equal bounds MAINT: deal with cases in minimize for `(bounds.lb == bounds.ub).any()` Oct 20, 2020
@andyfaff
Contributor Author

@mdhaber I was fiddling around with the idea of factorising out equal bounds (reducing the dimensionality of the problem), but it's not only the objective function one has to worry about. One also has to deal with Jacobians and Hessians, and I'm not sure if it's possible to do that in a clean way.

@mdhaber
Contributor

mdhaber commented Oct 21, 2020

User provided functions for those, right? Why can't we eliminate the rows and columns of those outputs that correspond with the fixed variables?

@andyfaff
Contributor Author

Ahhh, filter afterwards rather than before.

@mdhaber
Contributor

mdhaber commented Oct 21, 2020

Yeah, there's nothing we can do to stop them from being evaluated, but we can remove them.

I am imagining that minimize identifies equal bounds, removes those variables from the decision variable vector before calling the underlying solver. To the solver it passes not the user provided functions but wrappers that inject the fixed variables into the input, call the user provided function, and remove any derivatives corresponding with the fixed variables from the output.

I suppose it would be a pain to fix the separate fmin solvers. But... we could. Or we could leave them broken and recommend that new code use minimize.
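
A rough Python-side sketch of that wrapping idea (names such as wrap_for_fixed_variables and inject are illustrative assumptions, not the eventual implementation in gh-13096):

import numpy as np

def wrap_for_fixed_variables(fun, jac, lb, ub):
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    fixed = lb == ub          # mask of variables pinned by equal bounds
    free = ~fixed
    x_fixed = lb[fixed]       # their (constant) values

    def inject(z):
        # rebuild the full-length x from the reduced decision vector z
        x = np.empty(lb.shape)
        x[fixed] = x_fixed
        x[free] = z
        return x

    def fun_reduced(z):
        return fun(inject(z))

    def jac_reduced(z):
        # drop derivative entries corresponding to the fixed variables
        return np.asarray(jac(inject(z)))[free]

    return fun_reduced, jac_reduced, inject

The underlying solver would then be called with fun_reduced, jac_reduced and the reduced bounds lb[free], ub[free], and the returned solution expanded back to full length with inject.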

Member

@larsoner larsoner left a comment

Just a couple of minor testing ideas -- +1 for merge with or without these suggestions

scipy/optimize/tests/test_optimize.py
scipy/optimize/tests/test_differentiable_functions.py (outdated)
def update_grad():
    self._update_fun()
    self.ngev += 1
    self.g = approx_derivative(fun_wrapped, self.x, f0=self.f,
                               **finite_diff_options)
    # when the lower and upper bounds are equal then set the
    # gradient to 0.
    self.g[equal_bounds] = 0.0
Contributor

This is the core of the PR: set the derivatives of fixed parameters (which were NaNs) to 0. I wish it were back-ported to the 1.5 series.

Contributor

@nmayorov nmayorov Nov 8, 2020

It seems that with the change in approx_derivative, self.g[equal_bounds] will be filled with zeros by approx_derivative. Filling it explicitly is fine too.

Edit: most likely it will be filled with nans (in newer Python and numpy).

Contributor

A more important question: could someone explain qualitatively why this change is fundamentally correct?

I don't want to crash into this PR all of a sudden and destroy all the work, but I have some concerns.

  1. From a logical/mathematical point of view, the fact that lb[i] == ub[i] doesn't impose any constraints on the gradient at lb[i]. So it's very unclear what setting elements of g to zero achieves.
  2. What is going to happen if we run the same problem now with analytical and numerical differentiation and some bounds being equal? In principle we must get close and correct results (including x_opt[i] == lb[i] == ub[i]), but I don't see how it is guaranteed with the current code.
  3. It seems to me that the ability of a solver to handle lb[i] == ub[i] correctly with all possible settings requires more considerations and testing.

Contributor Author

I'm less concerned about losing the work in this PR, and more concerned that we get the behaviour right here. It'd be great to have more input from people familiar with optimisation, to make sure that happens. @nmayorov I'd appreciate any further input you're able to give.

The changes in this PR are by no means perfect.

  1. Logically, setting lb[i] == ub[i] means that you know, with 100% certainty, that x[i] has to be a constant value. Should a function have a gradient for x[i] if it's not allowed to take any other value? A graph of f vs x would show a single dot, not a line. In such a case the gradient should probably be undefined (nan), even for an 'analytic' derivative (or should it be zero?).

  2. An analytic Jacobian function (derived for lb[i] < ub[i]) may be able to calculate the gradient at x[i] == lb[i] with lb[i] == ub[i]. However, estimating the gradient with finite differences at that point while respecting bounds is impossible. If bounds are ignored during numeric differentiation, then it is possible (although you're then also saying that you're not 100% sure that lb[i] == ub[i]).
    There were two sets of reported issues that led to this PR:

  • (a) estimating grad via finite differences should respect bounds. This led to the use of approx_derivative, which does so.
  • (b) once the bounds-respecting change was made for the previous issue, it led to problems in numerical optimizers such as SLSQP or L-BFGS-B, which don't work when the Jacobian contains nan (df/dx = 0.0/0.0).

From testing, SLSQP and L-BFGS-B do work when those NaNs are replaced by zeros, so that was the approach I took here. A different approach is to recognise problems with equal bounds and factor out the fixed parameters, thereby reducing the dimensionality of the problem. I think this factorisation would increase code complexity and would not be straightforward to implement across all the solvers.
Yet another approach could be to get the actual iterative code in SLSQP and L-BFGS-B to behave properly for lb[i] == ub[i], but I don't think anyone has the appetite for altering the Fortran code.

I don't think that this PR, or an alternative that factored out fixed parameters, would lead to the analytic and numerical differentiation approaches giving the same answer unless we rolled back the changes from 2(a).

TLDR: even though it's not a great solution, it may be the best of a bad lot.

Contributor Author

If one of the stumbling blocks is the alteration of nan-->0 in _differentiable_functions, then that could be done in a wrapper function at a higher level.
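
A sketch of what such a higher-level wrapper might look like (illustrative; the name clip_fixed_gradient and where it would be called from are assumptions, not the approach taken in #13069):

import numpy as np

def clip_fixed_gradient(grad, lb, ub):
    # wrap a gradient callable so that entries belonging to fixed
    # parameters (lb == ub) are forced to 0 instead of the NaN that a
    # bounds-respecting zero step produces
    fixed = np.asarray(lb) == np.asarray(ub)

    def wrapped(x, *args):
        g = np.array(grad(x, *args), dtype=float)
        g[fixed] = 0.0
        return g

    return wrapped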

Contributor

Not 100% sure, but conceptually it looks feasible using Python-side wrappers. The problem must be completely reformulated to remove the fixed variables, then solved by the optimizer, and then the result restored to contain the fixed variables.

Contributor Author

@nmayorov, the elimination solution is likely to take some time, and I think it's vital to get at least a temporary fix in for the next release. At the moment this is either going to be this PR, or #13069. The alternate PR lifts the issue from the numdiff code up into the solver method code.

Contributor

@nmayorov nmayorov Nov 15, 2020

Are you sure it's vital to get such an ad hoc fix in? I mean, if it fails with an error, that's expected behavior and the error message is correct in this case. The case of equal bounds seems to be an "undefined area" in the current state, to be honest.

Well, I'm not actively involved in the development as of late, so I don't want to block anything. The second PR looks preferable to me.

As for _approx_derivative, maybe it should even raise an error with equal bounds, because it's an incorrect setting considering the logic involved. But that won't play well with your changes, so it is just a thought.

Contributor Author

The reported issues (+ their volume) and the documentation for minimize indicate that setting equal bounds has historically been acceptable/valid.
However, the last release made the numerical differentiation obey any bounds, so backwards compatibility with this historical behaviour was broken (because the core of SLSQP and L-BFGS-B can't cope with NaN in the gradient). Squash one bug, and another pops up.
Given that the alternate approach in #13069 might be preferable, would you have time to review that PR? It would be so, so helpful. I agree that changing NaN entries in the gradient to zero isn't 100% desirable. However, the modifications worked for the test example I constructed, and they also work for the various users posting these issues.

Contributor

@mdhaber mdhaber Nov 18, 2020

@andyfaff @nmayorov do you see problems with the approach in gh-13096 (different from gh-13069)? Even if it's not ready for the next release, is that what we'd like to head towards?

@kif
Contributor

kif commented Nov 2, 2020

For me, the main reason to use this "feature" (bounds.ub == bounds.lb) to fix a parameter is that re-writing the function (and its derivative) is probably more complicated & error-prone.

This PR basically enforces the derivative of the function along fixed parameters to be zero instead of 0/0 = NaN.

Besides this, it looks good to me.

@andyfaff
Contributor Author

andyfaff commented Nov 3, 2020

Test failures are unrelated.

@andyfaff
Contributor Author

andyfaff commented Nov 3, 2020

@larsoner rebased to fix merge conflict.

@andyfaff
Contributor Author

andyfaff commented Nov 8, 2020

@larsoner ping for a merge?

scipy/optimize/_numdiff.py (outdated)
@andyfaff
Contributor Author

@nmayorov @larsoner, I've made an alternate version, andyfaff#21. In that version the replacement of nan-->0 occurs within a wrapper in _minimize_*, and not in approx_derivative. @nmayorov, would this be preferred to this PR?

@tylerjereddy
Contributor

Milestone bumped; if this really is important for the 1.6.x series you may still have a tiny bit of time, but I'm trying to branch soon.

@tylerjereddy tylerjereddy modified the milestones: 1.7.0, 1.8.0 May 25, 2021
@andyfaff
Contributor Author

@mdhaber, @nmayorov, @larsoner on the whole I prefer this option over Matt's alternative in #13096, because it seems simpler and more maintainable.

Merging this PR would recover the ability to have lower bounds equal to upper bounds for several solvers (a long-standing issue for L-BFGS-B and SLSQP).
Remaining issues for discussion:

  • the behaviour for analytical and finite-difference gradients would be different. This PR sets the gradient entries for equal bounds to 0; the corresponding entries for analytic gradients would be different. For me, a gradient of zero for a fixed parameter seems logical. When using these kinds of minimisers I'd certainly expect an inverse Hessian (cf. a covariance matrix) to have zero entries for those parameters, because they're fixed.

@andyfaff andyfaff marked this pull request as ready for review July 30, 2021 04:37


@pytest.mark.parametrize('method', ['Powell', 'L-BFGS-B', 'SLSQP',
                                    'trust-constr'])
Contributor

I think we'll need more comprehensive tests, as this currently passes in master for Powell and trust-constr.
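
One way to make such a test more discriminating (a sketch only, with a made-up objective; not the test in this PR) is to record every x seen by the objective and assert that the pinned variable is never perturbed:

import numpy as np
import pytest
from scipy.optimize import Bounds, minimize

@pytest.mark.parametrize('method', ['L-BFGS-B', 'SLSQP'])
def test_equal_bounds_variable_not_perturbed(method):
    seen = []                       # x[1] values seen by the objective

    def fun(x):
        seen.append(x[1])
        return (x[0] - 3.0)**2 + (x[1] - 1.0)**2

    # x[1] is pinned at 2.0 by equal lower and upper bounds
    bounds = Bounds(lb=[-10.0, 2.0], ub=[10.0, 2.0])
    res = minimize(fun, x0=[0.0, 2.0], method=method, bounds=bounds)

    assert res.success
    assert np.allclose(res.x[1], 2.0)
    # the fixed variable should stay at its bound in every evaluation
    assert np.allclose(seen, 2.0)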

Contributor Author

I can't remember from the original issue whether Powell and trust-constr were sensitive to this. I can only remember that L-BFGS-B and SLSQP were.

@andyfaff andyfaff closed this Oct 15, 2021
@andyfaff
Contributor Author

This functionality was added in #13096

Labels: maintenance, scipy.optimize
8 participants