MAINT: deal with cases in `minimize` for `(bounds.lb == bounds.ub).any()` #12889
Conversation
```diff
@@ -491,32 +491,43 @@ def _dense_difference(fun, x0, f0, h, use_one_sided, method):
     h_vecs = np.diag(h)

     for i in range(h.size):
         df = np.zeros_like(f0)
```
It looks like `df` will be all zeros if any of the `dx` are zero. Don't we want the elements of `df` corresponding with nonzero `dx` to be evaluated as usual?
`dx` can either be a scalar or an array of length `(N,)`. If all of `dx` are zero then there is no need for any evaluations of `fun(x)`; `df` should be 0 (or actually `np.zeros_like(f0)`), because the step is 0. This has the effect of reducing the overall number of function evaluations. If there are any non-zero `dx` then `fun(x)` is evaluated to calculate `df` as usual.
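The behaviour described above can be sketched as follows. This is a simplified forward-difference loop, not scipy's actual implementation; the names `fun`, `x0`, `f0` and `h` follow the diff hunk quoted earlier, and the helper name is hypothetical:

```python
import numpy as np

def dense_forward_difference(fun, x0, f0, h):
    """Sketch of a dense forward-difference Jacobian estimate.

    Columns whose step h[i] is zero (e.g. a variable fixed by
    lb[i] == ub[i]) are skipped: df stays np.zeros_like(f0),
    saving one function evaluation per fixed variable.
    """
    h_vecs = np.diag(h)
    cols = []
    for i in range(h.size):
        df = np.zeros_like(f0)
        dx = h_vecs[i]          # step vector for variable i
        if np.any(dx):          # works for scalar and array steps alike
            # only evaluate fun when the step is non-zero
            df = fun(x0 + dx) - f0
        denom = h[i] if h[i] != 0 else 1.0
        cols.append(df / denom)
    return np.array(cols).T
```

A fixed variable therefore yields an exactly-zero Jacobian column without ever calling `fun`.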
&lt;face-palm&gt; I was tired.
I wonder if the tests are hitting the array `dx` case. I think you will get a:
`ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()`
How about `np.any(dx)`, which will work for both scalar and array?
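The suggestion can be checked directly; `np.any` handles scalars and arrays uniformly, so the same branch works in both code paths:

```python
import numpy as np

# np.any works the same way for scalars and arrays,
# so `if np.any(dx):` is safe whether dx is a float or an (N,) array.
assert np.any(1e-8)                    # non-zero scalar -> True
assert not np.any(0.0)                 # zero scalar -> False
assert np.any(np.array([0.0, 1e-8]))   # any non-zero element -> True
assert not np.any(np.zeros(3))         # all-zero array -> False
```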
For future reference, `dx` is an array of size > 1 when evaluating a Hessian. The last commit checks that.
@mdhaber I think the failing tests are due to CI not liking the latest pytest.
(force-pushed from 06dadf2 to b5ac138)
Rebased to make tests run again.
I'm afraid I can't check this weekend :/ I'm swamped. Is in time for 1.6.0 OK?
1.6.0 is fine, I was just trying to clear low-hanging fruit. In all seriousness I would review your PR, but I'm not an expert in the subject matter.
@mdhaber I was fiddling around with the idea of factorising out equal bounds (reducing the dimensionality of the problem), but it's not only the objective function one has to worry about. One also has to deal with Jacobians and Hessians, and I'm not sure if it's possible to do that in a clean way.
User provided functions for those, right? Why can't we eliminate the rows and columns of those outputs that correspond with the fixed variables?
Ahhh, filter afterwards rather than before.
Yeah, there's nothing we can do to stop them from being evaluated, but we can remove them. I suppose it would be a pain to fix the separate fmin solvers. But... we could. Or we could leave them broken and recommend that new code use
(force-pushed from d35559e to 01aaf8b)
Just a couple of minor testing ideas -- +1 for merge with or without these suggestions
```python
def update_grad():
    self._update_fun()
    self.ngev += 1
    self.g = approx_derivative(fun_wrapped, self.x, f0=self.f,
                               **finite_diff_options)
    # when the lower and upper bounds are equal then set the
    # gradient to 0.
    self.g[equal_bounds] = 0.0
```
This is the core of the PR: set the derivatives of fixed parameters to 0 (they were NaNs). I wish it could be back-ported to the 1.5 series.
It seems that with the change in `approx_derivative`, `self.g[equal_bounds]` will be filled with zeros by `approx_derivative`. Filling it explicitly is fine too.

Edit: most likely it will be filled with NaNs (in newer Python and numpy).
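As a small illustration of the approach in `update_grad`, with hypothetical values: `equal_bounds` is the boolean mask of variables fixed by `lb[i] == ub[i]`, whose gradient entries come back as `0/0 = nan` from bounds-respecting finite differences:

```python
import numpy as np

# gradient as it might come back from bounded finite differences:
# entries for fixed variables (lb[i] == ub[i]) are 0/0 = nan
g = np.array([2.0, np.nan, -0.5, np.nan])
equal_bounds = np.array([False, True, False, True])

# the fix in this PR: overwrite those entries with 0 so that
# solvers like SLSQP and L-BFGS-B don't choke on nan
g[equal_bounds] = 0.0
```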
A more important question: could someone explain qualitatively why this change is fundamentally correct?
I don't want to crash into this PR all of a sudden and destroy all the work, but I have some concerns.

- From a logical/mathematical point of view, the fact that `lb[i] == ub[i]` doesn't impose any constraints on the gradient at `lb[i]`. So it's very unclear what setting elements of `g` to zero achieves.
- What is going to happen if we run the same problem now with analytical and numerical differentiation and some bounds being equal? In principle we must get close and correct results (including `x_opt[i] == lb[i] == ub[i]`), but I don't see how that is guaranteed with the current code.
- It seems to me that the ability of a solver to handle `lb[i] == ub[i]` correctly with all possible settings requires more consideration and testing.
I'm less concerned about losing work for the PR, and more concerned that we get behaviour right here. It'd be great to have more input with people familiar with optimisation, to make sure that happens. @nmayorov I'd appreciate any further input you're able to give.
The changes in this PR are by no means perfect.

- Logically, setting `lb[i] == ub[i]` means that you know, with 100% certainty, that `x[i]` has to be a constant value. Should a function have a gradient for `x[i]` if it's not allowed to take any other value? A graph of f vs x would show a single dot, not a line. In such a case the gradient should probably be undefined (NaN), even for an 'analytic' derivative (or should it be zero?).
- An analytic Jacobian function (derived for `lb[i] < ub[i]`) may be able to calculate the gradient at `x[i] == lb[i]` with `lb[i] == ub[i]`. However, estimating the gradient with finite differences at that point while respecting bounds is impossible. If bounds are ignored during numerical differentiation, then it is possible to do (although you're then also saying that you're not 100% sure that `lb[i] == ub[i]`).
There were two sets of issues reported in the lead-up to this PR:

- (a) estimating the gradient via finite differences should respect bounds. This led to the use of `approx_derivative`, which does so.
- (b) when the bounds-respecting behaviour was introduced for (a), it led to issues in numerical optimizers such as `slsqp` or `lbfgsb`, which don't work when the Jacobian contains `nan` (`df/dx = 0.0/0.0`).
From testing, SLSQP and L-BFGS-B do work when those NaNs are replaced by zeros, so that was the approach I took here. A different approach is to recognise problems with equal bounds and factor out those fixed parameters, thereby reducing the dimensionality of the problem. I think this factorisation would increase code complexity and would not be straightforward to implement across all the solvers.
Yet another approach could be to get the actual iterative code in SLSQP and L-BFGS-B to behave properly for `lb[i] == ub[i]`, but I don't think anyone has the appetite for altering Fortran code.
I don't think that this PR, or an alternative that factored out fixed parameters, would lead to the analytic and numerical differentiation approaches giving the same answer unless we rolled back the changes from (a).
TLDR: even though it's not a great solution, it may be the best of a bad lot.
If one of the stumbling blocks is the alteration of nan-->0 in `_differentiable_functions`, then that could be done in a wrapper function at a higher level.
Not 100% sure, but conceptually it looks feasible using Python-side wrappers. The problem must be completely reformulated to remove the fixed variables, then solved by the optimizer, and then the result restored to contain the fixed variables.
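A minimal sketch of that reformulation, assuming pure-Python wrappers (the helper name `reduce_problem` is hypothetical, not from the PR): fixed variables are pinned before the solver sees the problem and re-inserted on every evaluation.

```python
import numpy as np

def reduce_problem(fun, x0, lb, ub):
    """Wrap `fun` so an optimizer only sees the free variables.

    Variables with lb == ub are pinned at their bound value.
    Returns the wrapped function, the reduced starting point,
    and a helper that restores a full-length x.
    """
    lb, ub, x0 = map(np.asarray, (lb, ub, x0))
    fixed = lb == ub
    free = ~fixed

    def expand(x_free):
        # rebuild the full parameter vector from the free subset
        x = np.empty(lb.shape, dtype=float)
        x[fixed] = lb[fixed]     # pinned values
        x[free] = x_free
        return x

    def fun_reduced(x_free):
        return fun(expand(x_free))

    return fun_reduced, x0[free], expand
```

A solver would then minimize `fun_reduced` over the reduced vector and call `expand` on the result; Jacobians and Hessians supplied by the user would be filtered the same way, by dropping the rows/columns corresponding to fixed variables.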
Are you sure it's vital to have such an ad-hoc fix? I mean, if it fails with an error, that's expected behavior, and the error message is correct in this case. The case of equal bounds seems to be an "undefined area" in the current state, to be honest.
Well, I'm not actively involved in the development as of late, so I don't want to block anything. The second PR looks preferable to me.
As for `_approx_derivative`: maybe it should even raise an error with equal bounds, because it's an incorrect setting considering the logic involved. But that won't play well with your changes, so it is just a thought.
The reported issues (and their volume), together with the documentation for `minimize`, indicate that setting equal bounds has historically been acceptable/valid.
However, the last release made the numerical differentiation obey any bounds, so backwards compatibility with this historical behaviour was broken (because the core of slsqp + lbfgsb can't cope with NaN in the gradient). Squash one bug, and another pops up.
Given that the alternative approach in #13069 might be preferable, would you have time to review that PR? It would be so, so helpful. I agree that changing NaN entries to zero in the gradient isn't 100% desirable. However, the modifications worked for the test example I constructed, and they also work for the various users posting these issues.
For me, the main reason to use this "feature" (`bounds.ub == bounds.lb`) to fix a parameter is that rewriting the function (and its derivative) is probably more complicated and error-prone. This PR basically forces the derivative of the function along fixed parameters to be zero instead of 0/0 = NaN. Besides this, looks good to me.
Test failures are unrelated.
(force-pushed from 0f467cf to 2f6f79b)
@larsoner rebased to fix merge conflict.
@larsoner ping for a merge?
@nmayorov @larsoner, I've made an alternate version, andyfaff#21. In that version the replacement of nan-->0 occurs within a wrapper in
Milestone bumped; if this really is important for the 1.6.x series you may still have a tiny bit of time, but I'm trying to branch soon.
@mdhaber, @nmayorov, @larsoner, on the whole I prefer this option rather than Matt's alternative in #13096, because it seems simpler and more maintainable. Merging this PR would recover the ability to have lower bounds equal to upper bounds for several solvers (a long-standing issue for L-BFGS-B and SLSQP).
```python
@pytest.mark.parametrize('method', ['Powell', 'L-BFGS-B', 'SLSQP',
                                    'trust-constr'])
```
I think we'll need more comprehensive tests, as this currently passes in master for Powell and trust-constr.
I can't remember from the original issue if Powell and trust-constr were sensitive to this. I can only remember that `lbfgsb` and `slsqp` were.
This functionality was added in #13096
Reference issue
#12502