ENH: interpolate: add RBFInterpolator #13595
Conversation
for the Rbf class and a class for RBF interpolation with the k nearest neighbors.
ENH: The monomials are now also scaled by epsilon, which should make the interpolant more numerically stable when the domain is very large/small.
STY: warnings and errors now have quotes around kernel names
ENH: Assert that interpolation points have the same number of dimensions as the observation points
ENH: inf can be given for `k` in `KNearestRBFInterpolator` to use all observations
… using RBFInterpolator with the k nearest observations
ENH: Do not warn about polynomial degrees less than -1, since all negative degrees behave the same
DOC: Added more to the description of `epsilon`
DOC: added that kernel can be callable in the documentation
…caled differently to improve numerical stability
ENH: an error is raised in __init__ for KNearestRBFInterpolator if there are too few observations
writing the greek letter lambda
MAINT: created a function to sanitize the arguments for RBFInterpolator and KNearestRBFInterpolator
Certainly looks quite thorough with the testing side of things. I'm not a domain expert, though, so I just added some superficial comments about the tests. Maybe best to wait for another reviewer before making changes in any case.
Thank you @treverhines, this is great! I am excited to try the nearest neighbor implementation. Could you explain the chunking concept? Does this apply to higher dimensions as well, and how should these be chosen? Is that the type of guideline that should appear in the docstring?
DOC: Added description of default behavior
DOC: Replaced "interpolation points" with "evaluation points" in a comment
Is this ready from your point of view @treverhines?
I think this branch is in a good state, and there is nothing more that I want to add. It looks like the CI failed due to an issue checking the pythran version.
I already submitted a fix for that on the Pythran repo.
I was getting errors when running the benchmarks in this branch at first, before realizing it was due to using a Pythran version without the necessary fix in it. Since that fix was only just released, this is going to happen to others unless we add a guard - I'll push a fix for that.
Went through it one more time, this all looks great. Let's get it in for 1.7.0 :)
Thanks @treverhines, this is a very nice new feature! And thanks @stefanv and @tupui for reviewing.
This looks really exciting, if different from what I did (and abandoned) for #11212. Just skimming the source code, is there a reason some of the kernel evaluations are not vectorized?

Maybe I am missing it, but can you specify a total power for the polynomial? So a 2 would be (in 2D) all monomials up to total degree 2: 1, x, y, x^2, xy, y^2.

Final comment that I have raised before: there are analytical ways to get a leave-one-out error from the Gram matrix. They are super useful, and it is a shame that other tools do not compute it (including things like Gaussian Processes in scikit-learn). But that may be a different discussion.
The kernel functions are not vectorized because, AFAIK, they cannot be both vectorized and operate on the distance matrix in-place (pythran does not support having an …).
If you specify …
I have thought a bit about adding generalized cross validation (and also generalized maximum likelihood) as a static method to this class. So the usage would look something like

```python
optimal_smoothing = minimize_scalar(lambda s: RBFInterpolator.gcv(x, y, smoothing=s)).x
interp = RBFInterpolator(x, y, smoothing=optimal_smoothing)
```

I think this would be useful, but I figured it would be best to save it for another PR.
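For reference, a rough sketch of what such a `gcv` objective could compute. This is not an existing scipy method: the `kernel` callable and function name here are hypothetical, and polynomial terms are ignored.

```python
import numpy as np
from scipy.spatial.distance import cdist


def gcv_score(x, y, kernel, smoothing):
    """Hypothetical GCV objective for a smoothed RBF system (no polynomial terms)."""
    n = len(x)
    K = kernel(cdist(x, x))                            # kernel (Gram) matrix
    H = K @ np.linalg.inv(K + smoothing * np.eye(n))   # hat matrix: y_fit = H @ y
    resid = y - H @ y
    # Craven & Wahba style GCV: residual norm scaled by the effective degrees of freedom
    return n * (resid @ resid) / np.trace(np.eye(n) - H) ** 2
```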
@treverhines Thank you for this fantastic contribution. This is a major step forward for RBFs in SciPy! I enjoyed reviewing your work, and loved seeing how this PR transformed during the review process.
Thanks @treverhines for this great PR!
Would be a good follow-up PR. I am using LOOCV to resample, and having a function to compute it would be good.
Just to be clear, the LOOCV I was referring to does not need to recompute anything or loop. It is based directly on the inverse of the Gram matrix. Either way, it would be pretty cool to have.
Just to further elaborate (now that I am at a computer), the formula I am talking about is 17.1 in G. E. Fasshauer, Meshfree Approximation Methods with MATLAB, volume 6 of Interdisciplinary Mathematical Sciences, World Scientific, 2007, where they reference Rippa, "An algorithm for selecting a good value for the parameter c in radial basis function interpolation," Advances in Computational Mathematics, 11(2-3):193–210, 1999.
@Jwink3101 thanks for pointing that out! The algorithm from Rippa 1999 seems to be complementary to generalized cross validation. Both are O(N^3) methods for computing LOOCV error, but Rippa 1999 seems to be for the case when there is no smoothing (correct me if I am wrong here), and GCV can only be used with a non-zero smoothing parameter.
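For concreteness, a minimal sketch of the Rippa (1999) formula in the simplest setting: a strictly positive definite kernel and no polynomial term, with `kernel` a hypothetical callable applied to the distance matrix. The leave-one-out residual at point i is the i-th coefficient divided by the i-th diagonal entry of the inverse Gram matrix.

```python
import numpy as np
from scipy.spatial.distance import cdist


def loocv_errors(x, y, kernel, smoothing=0.0):
    """Rippa (1999) leave-one-out residuals without looping over left-out points."""
    A = kernel(cdist(x, x)) + smoothing * np.eye(len(x))  # (regularized) Gram matrix
    A_inv = np.linalg.inv(A)
    coeffs = A_inv @ y                 # RBF coefficients
    return coeffs / np.diag(A_inv)     # e_i = c_i / (A^-1)_{ii}
```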
I would almost certainly need help with the mathematics, but I think it can be shown that this works with smoothing parameters too. To be fair, I actually originally thought of all of this with respect to Gaussian process regression (aka kriging), which, at least for the mean function, uses the same mathematics for some positive definite kernels (including the polynomial, though that is usually justified through the Best Linear Unbiased Predictor argument). Also, kriging usually has anisotropic kernels, but RBFs could too. As such, another reference that covers this nicely (specifically Section 2, eq. 19-22) is Martin 2005.
Anyway, the operation is on the Gram matrix (well, its inverse), so it shouldn't be affected (other than conditioning) by the inclusion of the smoothing term.

Now, whether this can be proven for conditionally positive definite kernels (aka splines), I don't know, but they don't mention that restriction in Fasshauer 2007. A quick look (so I may have missed it) shows that Rippa and Fasshauer do not talk about the polynomial augmenting the RBF, but Martin 2005 (above) does include it in the non-numbered equation block below (19).

So that is all to say, I think there are references to say it is sound. And I can tell you through brute-force checking that it gets it right to ~0.1% (as an aside, polynomial least-squares fitting has the same type of calculation, often called the PRESS, and brute-force checking that agrees to machine precision). Of course, brute-force confirmation does not a proof make, but those papers serve as references.
This is true, but depending on your implementation you can reuse some work from solving the original equations. In my personal implementation of this (I don't recall if it is in my PR), I attempt to solve the systems with a Cholesky decomposition. If you store that factorization, you can use it to compute the inverse. That only works for positive definite kernels, so for things like the TPS you will need to rebuild and invert the matrix (though I suspect someone better at linear algebra than me could figure out an efficient way to do it; I just don't want to directly compute the inverse for solving the original equations).

But my implementation was not designed with the memory footprint in mind as well as yours, and instead prioritized vector operations (since I didn't do it with any kind of optimization like Pythran). So you may have to redo the inversions. Personally, the value in having the LOOCV is so great that I don't mind, though I don't compute it unless asked.

I hope this helps. Sorry I couldn't be more rigorous.
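A sketch of that reuse idea, assuming a positive definite kernel so the Gram matrix admits a Cholesky factorization (no polynomial terms; the function and argument names are illustrative):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.spatial.distance import cdist


def fit_with_loocv(x, y, kernel, smoothing=0.0):
    """Solve the RBF system once and reuse the Cholesky factor for the LOOCV formula."""
    n = len(x)
    A = kernel(cdist(x, x)) + smoothing * np.eye(n)
    factor = cho_factor(A)                  # factor the Gram matrix once
    coeffs = cho_solve(factor, y)           # coefficients for the interpolant
    A_inv = cho_solve(factor, np.eye(n))    # reuse the factor to get A^-1
    loocv = coeffs / np.diag(A_inv)         # Rippa (1999) leave-one-out residuals
    return coeffs, loocv
```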
@Jwink3101 I spent some time playing with the Rippa 1999 formula for LOOCV, and I can attest that it is accurately giving me the LOOCV error regardless of whether I use non-zero smoothing, CPD kernels, or augmenting polynomials. Thanks again for pointing this out. I also want to correct something I said above: GCV is motivated by LOOCV, but it is not equivalent to LOOCV. So the equation from Rippa 1999 and GCV are two distinct objective functions.
@treverhines really appreciate your work here! Do you recommend still using the code from your other project, or should I switch to the scipy implementation? How do I get derivatives here? Thanks!
In order to compute derivatives of the interpolant, you need a function to compute the derivatives of the kernel used for the interpolant. I use sympy in my RBF package to symbolically differentiate the kernels and then compile the symbolic expressions into numerical functions. It would be difficult to use sympy for this scipy implementation of RBF interpolation because 1) sympy is not currently a dependency of scipy, and 2) we have taken the route of optimizing the implementation with pythran, which cannot use the functions generated by sympy. So if we want the ability to differentiate the interpolant, we would need to explicitly code up some reasonable number of kernel derivatives in pythran (say, all first and second order derivatives for each kernel). It is doable, but more work than I am willing to commit to right now.
I would recommend continuing to use my RBF package if you want to compute analytical derivatives of the interpolant.
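For anyone curious, the sympy approach described above looks roughly like this. It is only a sketch with the Gaussian kernel as an example; the variable names are illustrative and this is not code from either package.

```python
import sympy as sp

# Symbolically differentiate a kernel and compile it to a numerical function.
r, eps = sp.symbols('r epsilon', positive=True)
gaussian = sp.exp(-(eps * r) ** 2)

# First derivative of the kernel with respect to r
dgauss_dr = sp.diff(gaussian, r)

# Compile the symbolic expression into a numpy-callable function
dgauss_dr_num = sp.lambdify((r, eps), dgauss_dr, modules='numpy')

print(dgauss_dr_num(1.0, 2.0))  # evaluate the kernel derivative at r=1 with epsilon=2
```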
Reference issue
Addresses issues/requests mentioned in #9904 and #5180
What does this implement/fix?
Two new classes have been added: `RBFInterpolator` and `KNearestRBFInterpolator`.

`RBFInterpolator` is a replacement for `Rbf`. The main differences are 1) `RBFInterpolator` has usage that more closely follows other scattered data interpolation classes, 2) `RBFInterpolator` includes polynomial terms in the interpolant, and 3) the sign of the smoothing parameter and some RBFs are corrected, which addresses erroneous smoothing behavior like this: #4790 (comment).

`KNearestRBFInterpolator` performs RBF interpolation using only the k nearest observations to each interpolation point. This class can be a faster and more memory efficient alternative to `RBFInterpolator` when there are many observations (>10,000).
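A rough usage sketch of the intended call pattern follows; the data here are made up for illustration, and the argument names and defaults may differ slightly from what this PR ultimately ships.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# 100 scattered 2-D observation points and the values observed at them
rng = np.random.default_rng(0)
obs_points = rng.uniform(-1.0, 1.0, size=(100, 2))
obs_values = np.sum(obs_points**2, axis=1)

# Evaluation points on a regular grid, flattened to shape (n, 2)
grid = np.mgrid[-1:1:50j, -1:1:50j]
eval_points = grid.reshape(2, -1).T

# Build the interpolant from the observations, then evaluate it
interp = RBFInterpolator(obs_points, obs_values)
eval_values = interp(eval_points)  # one value per evaluation point
```

`KNearestRBFInterpolator` is intended to follow the same pattern, presumably with an additional argument for the number of nearest observations `k`.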
Additional information

The new classes deviate from the `Rbf` class in a few other ways that are worth noting:

- The default RBF is the thin plate spline, whereas `Rbf` defaults to the multiquadric. I prefer using the thin plate spline over the multiquadric as a default because 1) it is relatively well known, 2) the multiquadric is infinitely smooth, which can cause the interpolant to have oscillations similar to Runge's phenomenon, and 3) the thin plate spline is invariant to the shape parameter `epsilon`, meaning it should work well without any tuning.
- `epsilon` must be specified if the chosen RBF is not scale invariant (i.e., when `kernel` is not "linear", "tps", "cubic", or "quintic"). My preference is to require the user to pick a shape parameter rather than default to a heuristic like using the average nearest neighbor distance.
- There is no `norm` argument for the new classes, and they only use Euclidean norms. I made this choice because most of the literature on RBFs assumes a Euclidean norm, and I would prefer to limit the functionality of these classes to what is well understood.
- `epsilon` scales the RBF input as `r*epsilon` rather than `r/epsilon`, which is consistent with the RBF literature (for example: https://www.sciencedirect.com/science/article/pii/S0898122107002210).
- The names of the RBFs differ from those used in `Rbf`. The new names are shorter and consistent with abbreviations used in the RBF literature (see the article above).

edit: corrected misspelling of multiquadric