ENH: interpolate: add RBFInterpolator #13595
Conversation
for the Rbf class and a class for RBF interpolation with the k nearest neighbors.
ENH: The monomials are now also scaled by epsilon, which should make the interpolant more numerically stable when the domain is very large/small.
STY: warnings and errors now have quotes around kernel names
ENH: Assert that interpolation points have the same number of dimensions as the observation points
ENH: inf can be given for `k` in `KNearestRBFInterpolator` to use all observations
… using RBFInterpolator with the k nearest observations
ENH: Do not warn about polynomial degrees less than -1, since all negative degrees behave the same
DOC: Added more to the description of `epsilon`
DOC: added that kernel can be callable in the documentation
…caled differently to improve numerical stability
ENH: an error is raised in __init__ for KNearestRBFInterpolator if there are too few observations
writing the greek letter lambda
MAINT: created a function to sanitize the arguments for RBFInterpolator and KNearestRBFInterpolator
Certainly looks quite thorough with the testing side of things. I'm not a domain expert, though, so I just added some superficial comments about the tests. Maybe best to wait for another reviewer before making changes in any case.
Thank you @treverhines, this is great! I am excited to try the nearest neighbor implementation. Could you explain the chunking concept? Does this apply to higher dimensions as well, and how should these be chosen? Is that the type of guideline that should appear in the docstring?
DOC: Added description of default behavior
DOC: Replaced "interpolation points" with "evaluation points" in a comment
Is this ready from your point of view @treverhines?
I think this branch is in a good state, and there is nothing more that I want to add. It looks like the CI failed due to an issue checking the pythran version.
I already submitted a fix for that on the Pythran repo.
I was getting errors when running the benchmarks in this branch at first, before realizing it was due to using a Pythran version without the necessary fix in it. Since that fix was only just released, this is going to happen to others unless we add a guard - I'll push a fix for that.
Went through it one more time, this all looks great. Let's get it in for 1.7.0 :)
Thanks @treverhines, this is a very nice new feature! And thanks @stefanv and @tupui for reviewing.
This looks really exciting, if different from what I did (and abandoned) for #11212. Just skimming the source code, is there a reason some of the kernel evaluations are not vectorized?

Maybe I am missing it, but can you specify a total power for the polynomial? So a 2 would be (in 2D) all monomials up to total degree 2: 1, x, y, x^2, xy, y^2.

Final comment that I have raised before: there are analytical ways to get a leave-one-out error from the Gram matrix. They are super useful, and it is a shame that other tools do not compute it (including things like Gaussian Processes in scikit-learn). But that may be a different discussion.
The kernel functions are not vectorized because, AFAIK, they cannot be both vectorized and operate on the distance matrix in-place (pythran does not support having an …).
If you specify …
I have thought a bit about adding generalized cross validation (and also generalized maximum likelihood) as a static method to this class. So the usage would look something like

```python
optimal_smoothing = minimize_scalar(lambda s: RBFInterpolator.gcv(x, y, smoothing=s)).x
interp = RBFInterpolator(x, y, smoothing=optimal_smoothing)
```

I think this would be useful, but I figured it would be best to save it for another PR.
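For reference, a rough sketch of what such a `gcv` objective could compute. This is not an existing scipy method: the `kernel` callable and function name here are hypothetical, and polynomial terms are ignored.

```python
import numpy as np
from scipy.spatial.distance import cdist


def gcv_score(x, y, kernel, smoothing):
    """Hypothetical GCV objective for a smoothed RBF system (no polynomial terms)."""
    n = len(x)
    K = kernel(cdist(x, x))                            # kernel (Gram) matrix
    H = K @ np.linalg.inv(K + smoothing * np.eye(n))   # hat matrix: y_fit = H @ y
    resid = y - H @ y
    # Craven & Wahba style GCV: residual norm scaled by the effective degrees of freedom
    return n * (resid @ resid) / np.trace(np.eye(n) - H) ** 2
```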
@treverhines Thank you for this fantastic contribution. This is a major step forward for RBFs in SciPy! I enjoyed reviewing your work, and loved seeing how this PR transformed during the review process.
Thanks @treverhines for this great PR!
Would be a good follow-up PR. I am using LOOCV to resample, and having a function to compute it would be good.
Just to be clear, the LOOCV I was referring to does not need to recompute anything or loop. It is based directly on the inverse of the Gram matrix. Either way, it would be pretty cool to have.
Just to further elaborate (now that I am at a computer), the formula I am talking about is 17.1 in G. E. Fasshauer, Meshfree Approximation Methods with MATLAB, volume 6 of Interdisciplinary Mathematical Sciences, World Scientific, 2007, where they reference Rippa, "An algorithm for selecting a good value for the parameter c in radial basis function interpolation," Advances in Computational Mathematics, 11(2-3):193–210, 1999.
@Jwink3101 thanks for pointing that out! The algorithm from Rippa 1999 seems to be complementary to generalized cross validation. Both are O(N^3) methods for computing LOOCV error, but Rippa 1999 seems to be for the case when there is no smoothing (correct me if I am wrong here), and GCV can only be used with a non-zero smoothing parameter.
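For concreteness, a minimal sketch of the Rippa (1999) formula in the simplest setting: a strictly positive definite kernel and no polynomial term, with `kernel` a hypothetical callable applied to the distance matrix. The leave-one-out residual at point i is the i-th coefficient divided by the i-th diagonal entry of the inverse Gram matrix.

```python
import numpy as np
from scipy.spatial.distance import cdist


def loocv_errors(x, y, kernel, smoothing=0.0):
    """Rippa (1999) leave-one-out residuals without looping over left-out points."""
    A = kernel(cdist(x, x)) + smoothing * np.eye(len(x))  # (regularized) Gram matrix
    A_inv = np.linalg.inv(A)
    coeffs = A_inv @ y                 # RBF coefficients
    return coeffs / np.diag(A_inv)     # e_i = c_i / (A^-1)_{ii}
```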
I would almost certainly need help with the mathematics, but I think it can be shown that this works with smoothing parameters too. To be fair, I actually originally thought of all of this with respect to Gaussian process regression (aka kriging), which, at least for the mean function, uses the same mathematics for some positive definite kernels (including the polynomial, though that is usually justified through the Best Linear Unbiased Predictor argument). Also, kriging usually has anisotropic kernels, but RBFs could too. As such, another reference that covers this nicely (specifically Section 2, eq. 19-22) is Martin 2005.
Anyway, the operation is on the Gram matrix (well, its inverse), so it shouldn't be affected (other than conditioning) by the inclusion of the smoothing term.

Now, whether this can be proven for conditionally positive definite kernels (aka splines), I don't know, but they don't mention that restriction in Fasshauer 2007. A quick look (so I may have missed it) shows that Rippa and Fasshauer do not talk about the polynomial augmenting the RBF, but Martin 2005 (above) does include it in the non-numbered equation block below (19).

So that is all to say, I think there are references to say it is sound. And I can tell you through brute-force checking that it gets it right to ~0.1% (as an aside, polynomial least-squares fitting has the same type of calculation, often called the PRESS, and brute-force checking that agrees to machine precision). Of course, brute-force confirmation does not a proof make, but those papers serve as references.
This is true, but depending on your implementation you can reuse some work from solving the original equations. In my personal implementation of this (I don't recall if it is in my PR), I attempt to solve the systems with a Cholesky decomposition. If you store that factorization, you can use it to compute the inverse. That only works for positive definite kernels, so for things like the TPS you will need to rebuild and invert the matrix (though I suspect someone better at linear algebra than me could figure out an efficient way to do it; I just don't want to directly compute the inverse for solving the original equations).

But my implementation was not designed with the memory footprint in mind as well as yours, and instead prioritized vector operations (since I didn't do it with any kind of optimization like Pythran). So you may have to redo the inversions. Personally, the value in having the LOOCV is so great that I don't mind, though I don't compute it unless asked.

I hope this helps. Sorry I couldn't be more rigorous.
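A sketch of that reuse idea, assuming a positive definite kernel so the Gram matrix admits a Cholesky factorization (no polynomial terms; the function and argument names are illustrative):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.spatial.distance import cdist


def fit_with_loocv(x, y, kernel, smoothing=0.0):
    """Solve the RBF system once and reuse the Cholesky factor for the LOOCV formula."""
    n = len(x)
    A = kernel(cdist(x, x)) + smoothing * np.eye(n)
    factor = cho_factor(A)                  # factor the Gram matrix once
    coeffs = cho_solve(factor, y)           # coefficients for the interpolant
    A_inv = cho_solve(factor, np.eye(n))    # reuse the factor to get A^-1
    loocv = coeffs / np.diag(A_inv)         # Rippa (1999) leave-one-out residuals
    return coeffs, loocv
```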
@Jwink3101 I spent some time playing with the Rippa 1999 formula for LOOCV, and I can attest that it is accurately giving me the LOOCV error regardless of whether I use non-zero smoothing, CPD kernels, or augmenting polynomials. Thanks again for pointing this out. I also want to correct something I said above: GCV is motivated by LOOCV, but it is not equivalent to LOOCV. So the equation from Rippa 1999 and GCV are two distinct objective functions.
@treverhines really appreciate your work here! Do you recommend still using the code from your other project, or should I switch to the scipy implementation? How do I get derivatives here? Thanks!
In order to compute derivatives of the interpolant, you need a function to compute the derivatives of the kernel used for the interpolant. I use sympy in my RBF package to symbolically differentiate the kernels and then compile the symbolic expressions into numerical functions. It would be difficult to use sympy for this scipy implementation of RBF interpolation because 1) sympy is not currently a dependency of scipy, and 2) we have taken the route of optimizing the implementation with pythran, which cannot use the functions generated by sympy. So if we want the ability to differentiate the interpolant, we would need to explicitly code up some reasonable number of kernel derivatives in pythran (say, all first and second order derivatives for each kernel). It is doable, but more work than I am willing to commit to right now.
I would recommend continuing to use my RBF package if you want to compute analytical derivatives of the interpolant.
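For anyone curious, the sympy approach described above looks roughly like this. It is only a sketch with the Gaussian kernel as an example; the variable names are illustrative and this is not code from either package.

```python
import sympy as sp

# Symbolically differentiate a kernel and compile it to a numerical function.
r, eps = sp.symbols('r epsilon', positive=True)
gaussian = sp.exp(-(eps * r) ** 2)

# First derivative of the kernel with respect to r
dgauss_dr = sp.diff(gaussian, r)

# Compile the symbolic expression into a numpy-callable function
dgauss_dr_num = sp.lambdify((r, eps), dgauss_dr, modules='numpy')

print(dgauss_dr_num(1.0, 2.0))  # evaluate the kernel derivative at r=1 with epsilon=2
```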
Reference issue
Addresses issues/requests mentioned in #9904 and #5180
What does this implement/fix?
Two new classes have been added: `RBFInterpolator` and `KNearestRBFInterpolator`.

`RBFInterpolator` is a replacement for `Rbf`. The main differences are 1) `RBFInterpolator` has usage that more closely follows other scattered data interpolation classes, 2) `RBFInterpolator` includes polynomial terms in the interpolant, and 3) the sign of the smoothing parameter and some RBFs are corrected, which addresses erroneous smoothing behavior like this: #4790 (comment).

`KNearestRBFInterpolator` performs RBF interpolation using only the k nearest observations to each interpolation point. This class can be a faster and more memory efficient alternative to `RBFInterpolator` when there are many observations (>10,000).
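A rough usage sketch of the intended call pattern follows; the data here are made up for illustration, and the argument names and defaults may differ slightly from what this PR ultimately ships.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# 100 scattered 2-D observation points and the values observed at them
rng = np.random.default_rng(0)
obs_points = rng.uniform(-1.0, 1.0, size=(100, 2))
obs_values = np.sum(obs_points**2, axis=1)

# Evaluation points on a regular grid, flattened to shape (n, 2)
grid = np.mgrid[-1:1:50j, -1:1:50j]
eval_points = grid.reshape(2, -1).T

# Build the interpolant from the observations, then evaluate it
interp = RBFInterpolator(obs_points, obs_values)
eval_values = interp(eval_points)  # one value per evaluation point
```

`KNearestRBFInterpolator` is intended to follow the same pattern, presumably with an additional argument for the number of nearest observations `k`.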
Additional information

The new classes deviate from the `Rbf` class in a few other ways that are worth noting:

- The default RBF is the thin plate spline, whereas `Rbf` defaults to the multiquadric. I prefer using the thin plate spline over the multiquadric as a default because 1) it is relatively well known, 2) the multiquadric is infinitely smooth, which can cause the interpolant to have oscillations similar to Runge's phenomenon, and 3) the thin plate spline is invariant to the shape parameter `epsilon`, meaning it should work well without any tuning.
- `epsilon` must be specified if the chosen RBF is not scale invariant (i.e., when `kernel` is not "linear", "tps", "cubic", or "quintic"). My preference is to require the user to pick a shape parameter rather than default to a heuristic like using the average nearest neighbor distance.
- There is no `norm` argument for the new classes, and they only use Euclidean norms. I made this choice because most of the literature on RBFs assumes a Euclidean norm, and I would prefer to limit the functionality of these classes to what is well understood.
- `epsilon` scales the RBF input as `r*epsilon` rather than `r/epsilon`, which is consistent with the RBF literature (for example: https://www.sciencedirect.com/science/article/pii/S0898122107002210).
- The names of the RBFs differ from those used in `Rbf`. The new names are shorter and consistent with abbreviations used in the RBF literature (see the article above).

edit: corrected misspelling of multiquadric