
[GSoC] ENH: New least-squares algorithms #5044

Merged
merged 69 commits into scipy:master from nmayorov:bounded_lsq_sparse on Sep 21, 2015

Conversation

@nmayorov (Contributor):

Hi! This is an extension of my previous PR #5019.

I moved some commits to the previous PR; now the first new commit is 2710986.

It adds special adjustments to the algorithms which allow them to solve large problems when the Jacobian matrix is sparse (only matrix-vector products are computed). Sparse Jacobians can also be estimated quickly by finite differencing when each row contains only a few non-zero elements.
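For illustration, here is a minimal sketch of those sparse features using the least_squares interface this PR introduces; the tridiagonal residual function and problem size below are invented for the example, not taken from the PR itself.

import numpy as np
from scipy.sparse import lil_matrix
from scipy.optimize import least_squares

def fun(x):
    # Tridiagonal residuals: each f[i] depends on at most three adjacent
    # variables, so the Jacobian has only a few non-zeros per row.
    f = (3.0 - x) * x + 1.0
    f[1:] -= x[:-1]
    f[:-1] -= 2.0 * x[1:]
    return f

n = 100000
# Non-zero pattern of the Jacobian; with it, the default '2-point'
# finite differencing can estimate many columns per extra evaluation.
sparsity = lil_matrix((n, n), dtype=int)
i = np.arange(n)
sparsity[i, i] = 1
sparsity[i[1:], i[:-1]] = 1
sparsity[i[:-1], i[1:]] = 1

res = least_squares(fun, -np.ones(n), jac_sparsity=sparsity, method='trf')
print(res.cost, res.optimality)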

UPDATE:

I think that's about everything I wanted to implement in the least_squares function. I would appreciate more active feedback and code review.

Here are two examples of usage:

  1. http://nbviewer.ipython.org/gist/nmayorov/6098a514cc3277d72dd7 - sparse features.
  2. http://nbviewer.ipython.org/gist/nmayorov/dac97f3ed9d638043191 - robust loss functions.

I see that an example demonstrating interesting bounds usage is conceptually missing. Your suggestions in this direction are most welcome.
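A similarly minimal sketch of the robust loss feature from notebook 2 above (the noisy exponential-decay data and parameters here are made up for illustration):

import numpy as np
from scipy.optimize import least_squares

rng = np.random.RandomState(0)
t = np.linspace(0, 10, 50)
y = 2.0 * np.exp(-0.5 * t) + 0.05 * rng.randn(t.size)
y[::10] += 2.0  # inject outliers

def residuals(p):
    return p[0] * np.exp(-p[1] * t) - y

# soft_l1 loss with f_scale damps the influence of the outliers on the fit.
res = least_squares(residuals, x0=[1.0, 1.0], loss='soft_l1', f_scale=0.1)
print(res.x)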

@ev-br added the "enhancement" and "scipy.optimize" labels on Jul 13, 2015
@ev-br mentioned this pull request on Jul 13, 2015
@nmayorov force-pushed the bounded_lsq_sparse branch 2 times, most recently from 4ca0edb to 7f4e942, on July 19, 2015
        J = J.toarray()
    elif not issparse(J):
        J = np.atleast_2d(J)

Member:

So, this checks sparsity and warns on each iteration. Better to move the checks, warnings, and construction of an appropriate Jacobian out of jac_wrapped. Same for the else branch below.

Contributor Author:

The warning will appear only once. I thought about it, and this approach seemed the clearest to me. That's basically the reason for introducing jac_wrapped: to handle all the trivia in it.

Member:

If the Jacobian is known to be dense, why check issparse(J) repeatedly?

Contributor Author:

Because it does no harm and takes the least amount of code (most readable) :). But, as I suggested above, let's just remove these conversions and checks altogether (if a user wants to convert to dense, it's trivial to do so in their code).
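A hypothetical sketch of the pattern being discussed: decide the Jacobian's format once, from its value at x0, and build a wrapper whose body contains no per-iteration checks or warnings (the names and simplified jac(x) signature are illustrative, not the PR's actual code):

import numpy as np
from scipy.sparse import issparse, csr_matrix

def wrap_jac(jac, x0):
    # Evaluate once to learn what the user's function returns.
    J0 = jac(x0)
    if issparse(J0):
        def jac_wrapped(x):
            return csr_matrix(jac(x))
    else:
        def jac_wrapped(x):
            return np.atleast_2d(np.asarray(jac(x)))
    return jac_wrapped, J0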

n = x0.size

if diff_step is None:
    epsfcn = np.finfo(float).eps
Member:

Nitpick: the call to finfo(float) is repeated at least four times, twice at module level. Better to do it once, store the result, and import it.

Contributor Author:

Sure, better to import it from ._lsq_common.
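In sketch form, the suggestion amounts to something like this (the constant name is illustrative):

import numpy as np

# Computed once at module level and imported where needed,
# instead of calling np.finfo(float) repeatedly.
EPS = np.finfo(float).eps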

if jac in ['2-point', '3-point']:
    if jac_sparsity is not None:
        if method == 'lm':
            warn("`jac_sparsity` is ignored for method='lm', dense "
Member:

I'd make this an error.

Contributor Author:

Motivation? To avoid heavy computations? I think you might be right; better to stop here.

On the other hand, there are a lot of similar places: converting a sparse Jacobian to dense is also asking for trouble. Maybe leave it to the user to choose the correct options and only warn when something happens.

Member:

LM and sparsity do not play well together; this can never be useful.
Generally, with this many mysterious switches, I think it makes sense to weed out the obviously wrong combinations.

Contributor Author:

Yes, I think you are right! Let's also disallow sparse->dense conversion, i.e., be more strict and explicit. Agree?

Contributor Author:

I'd be interested to know your opinion.

Member:

Agreed.
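In sketch form, the agreed-upon behaviour: fail fast on option combinations that can never be useful (a rough illustration, not necessarily the exact final code):

if method == 'lm':
    if jac_sparsity is not None:
        raise ValueError("method='lm' does not support `jac_sparsity`.")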

    return groups


def group_sparse(int m, int n, int[:] indices, int[:] indptr):
Member:

Is using int for indices overflow-safe here? It's unlikely that someone is using matrices this large, but still.

Contributor Author:

Potentially it might not be, but np.int32 is the type used for sparse matrix indices and indptr, so it could only be changed at a much more fundamental level.

I think this link is again relevant: https://github.com/scikit-learn/scikit-learn/wiki/C-integer-types:-the-missing-manual

Member:

You should be careful about this. As of Scipy 0.14, sparse matrices use int32 unless the number of nonzero elements exceeds 2^31, at which point int64 is used. So this is probably fine given the application here, but it is possible to get 64-bit ints as indices from a sparse matrix.

Contributor Author:

Are both indices and indptr affected by that? Should I then cast the arrays to int64?

On the other hand, arrays with more than 2^31 elements are already too big for this problem.

Contributor:

I think the clean solution would be to use Cython's fused types here, but it looks like slicing a huge matrix to obtain a small one brings indices.dtype back to int32, so small matrices can be expected not to have int64 indices.
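An illustrative way to guard the int-typed Cython code against int64 indices (the helper name is made up, and the downcast is safe only while the values actually fit in 32 bits):

import numpy as np
from scipy.sparse import csr_matrix

def with_int32_indices(A):
    A = csr_matrix(A)
    if A.indices.dtype != np.int32:
        # Valid only when nnz and the dimensions fit in int32.
        A.indices = A.indices.astype(np.int32)
        A.indptr = A.indptr.astype(np.int32)
    return A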

@ev-br (Member) commented Jul 22, 2015:

Two random comments:

  • a link to https://nmayorov.wordpress.com could be added to some comments or a module docstring :-)
  • least_squares.py should be renamed to _least_squares.py

@nmayorov (Contributor Author):

I've done some refactoring of the Jacobian validation code. I think it's better now.

@ev-br (Member) commented Jul 23, 2015:

Running python runtests.py -s optimize --coverage shows several untested branches: the quartic equation in solve_trust_region_2d, the constrained Cauchy step in dogbox, and a couple of corner cases in trf's handling of the reflected step. While 100% coverage is not a goal, it's better to avoid leaving large holes in at least the smoke testing.

@ev-br (Member) commented Jul 23, 2015:

Re validation/wrapping of the Jacobian: I still prefer to do this validation only once: nmayorov#1


def solve_lsq_trust_region(n, m, uf, s, V, Delta, initial_alpha=None,
                           rtol=0.01, max_iter=10):
    """Solve a trust-region problem arising in least-squares minimization by
Member:

The summary lines should be kept < 80 characters total. So for this, maybe something like

"""Use MINPACK method to solve a trust-region least squares problem

blah, blah
"""

Or some such, depending on what the function actually does, which is not clear; "problem" could refer to almost anything.

Contributor Author:

Where does "< 80 characters total" come from? I can find several methods in scipy which have a multiline "Short summary", and it looks really good on the site. And these functions are not actually public.

It says "trust-region problem", which is quite well known and specific in optimization.

Member:

It is inherited from the Python docstring convention, https://www.python.org/dev/peps/pep-0257/, in combination with the PEP 8 line-length limitation.

It is hard to write short summaries, but the multiline ones are definitely a style violation. Travis doesn't check that, though.

Contributor Author:

I agree that we should respect the PEP style recommendations (maybe not always completely). I will try to come up with one-liners.
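For instance, a PEP 257-style rewrite of the summary under discussion could look like this (one possibility, not necessarily what was committed):

def solve_lsq_trust_region(n, m, uf, s, V, Delta, initial_alpha=None,
                           rtol=0.01, max_iter=10):
    """Solve a trust-region problem arising in least-squares minimization.

    Details of the MINPACK-style method follow here.
    """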

@nmayorov (Contributor Author):

I want to rename scaling='jac' to scaling='auto'. @ev-br like/dislike?

About the last example: maybe keep it? Yes, the docstring is long, but you don't need to read it all (if you don't want to). On the other hand, we now have examples covering all the features. Matlab docpages are very long too.

My general plan:

  1. Merge this PR.
  2. Rebase/finish the linear LSQ PR.
  3. Work on the tutorial / examples.
  4. Finish the curve_fit PR.

@ev-br (Member) commented Sep 14, 2015:

Your plan sounds good :-).

On scaling='jac' or 'auto' I am ambivalent. 'jac' sounds a bit more specific, but that's not something I'm going to have a strong opinion about.

Re last example: it does feel a little out of place. It reads a little more like a narrative --- which is good! --- but is more appropriate for a tutorial. So maybe move it over there or to #5233. (We need to advertise tutorials more!)

One little knot to tie is #5044 (comment). It's likely not a serious issue, but it'd be good not to leave holes where reasonably avoidable. I have to admit I have no idea. Maybe @ewmoore or @larsmans can weigh in on this?

@nmayorov (Contributor Author):

About huge sparse matrices: I just couldn't test it on my machine; as I said, it is unlikely to cause trouble (time / out-of-RAM issues will come first).

@@ -0,0 +1,7 @@
"""Module contains last-squares algorithms."""
Contributor:

Typo: last-squares. Also "This module".

@ev-br (Member) commented Sep 16, 2015:

> About huge sparse matrices: I just couldn't test it on my machine; as I said, it is unlikely to cause trouble (time / out-of-RAM issues will come first).

Yeah, it's quite hard to believe it'll cause much trouble. FWIW, I'd punt on this.

So, it seems the only thing holding up this PR is a docstring example :-)

@ev-br added this to the 0.17.0 milestone on Sep 16, 2015
@ev-br (Member) commented Sep 21, 2015:

OK, I keep maintaining that the last docstring example should be moved to the tutorial, but this can be done in a follow-up PR for item 3 of #5044 (comment).

@ev-br (Member) commented Sep 21, 2015:

Time to merge this, I'd think. (Further review is always welcome, of course.)

Thank you, Nikolay, it's a great one.

ev-br added a commit that referenced this pull request on Sep 21, 2015:
[GSoC] ENH: New least-squares algorithms
@ev-br merged commit 1982080 into scipy:master on Sep 21, 2015
@nmayorov (Contributor Author):

Sorry, Evgeni, that I was so slow in making the final changes. But it's great to see it merged!

@charris (Member) commented Sep 21, 2015:

Yay!

@nmayorov deleted the bounded_lsq_sparse branch on October 4, 2015
@ev-br (Member) commented Oct 17, 2015:

Now that the curve_fit PR is in, the plan in #5044 (comment) is done except for item 3, "docs/tutorial" :-).

@nmayorov (Contributor Author):

Hi, Evgeni! Can you explain to me the current status of using IPython notebooks as examples in the scipy documentation?

@ev-br (Member) commented Oct 26, 2015:

Nice to see you back, Nikolay :-). The discussion is going on in #5233: AFAIU, the last remaining thing to agree upon is whether to host them in a separate repository or add them to the main scipy repo.

I think the best way to get the ball rolling is to send a WIP PR with your tweaks to the tutorial and links to your notebooks with examples.

fun : callable
    Function which computes the vector of residuals with the signature
    ``fun(x, *args, **kwargs)``, i.e., the minimization proceeds with
    respect to it's first argument. The argument ``x`` passed to this
Contributor:

typo: "it's" -> "its"

Member:

Thanks Antony --- would you care to send a PR with the fix, please?

> On 09.01.2016 at 11:57, "Antony Lee" (notifications@github.com) wrote:
>
> In scipy/optimize/_lsq/least_squares.py, #5044 (comment):
> typo: "it's" -> "its"

Contributor Author:

Any corrections to the text would be helpful. There are likely more mistakes and generally poor English usage.
