ENH: stats.gaussian_kde: replace use of inv_cov in pdf #16692
I think there's enough going on here that there should be some comments to explain things. It took me a bit to figure out what's happening.
Just to make sure I'm on the same page, here are the details as I understand them:
If we know $L^{-1}$ we can instead use a triangular solve on $ZL^{-1} = X$. It turns out we can calculate $L^{-1}$ from a Cholesky decomposition, not of $\Gamma$ itself, but of a permuted version of $\Gamma$.
If $RR^{\top} = \Gamma$ then $(R^{-1})^{\top}R^{-1} = \Gamma^{-1}$. This isn't quite a Cholesky decomposition though, since $(R^{-1})^{\top}$ is upper triangular instead of lower triangular. Let $J$ be the antidiagonal matrix with all ones on the antidiagonal and zeros elsewhere. Multiplying on the left by $J$ permutes rows and multiplying on the right by $J$ permutes columns. If we instead calculate the Cholesky decomposition $RR^{\top} = J\Gamma J$, then
$$(R^{-1})^{\top}R^{-1} = J^{-1}\Gamma^{-1}J^{-1} = J\Gamma^{-1}J$$
and thus
$$\Gamma^{-1} = J(R^{-1})^\top R^{-1}J = (JR^{-1}J)^\top(JR^{-1}J)$$
where we've used that $J = J^{-1}$ and $J = J^{\top}$.
This means that if we know the Cholesky factor $R$ of $J\Gamma J$, the Cholesky factor of $\Gamma^{-1}$ is $J(R^{-1})^{\top}J$. (If $A$ is upper triangular then $JAJ$ is lower triangular.) This means $L^{-1} = JR^{\top}J$, and we can write our equation as $ZJR^{\top}J = X$ and solve for $Z$. This is exactly what you've done, but it's still not intuitive for me yet, just algebraic.
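A quick numerical sketch of the identity (using a small random SPD matrix as a stand-in for $\Gamma$; all names here are illustrative, not taken from the PR):

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)

# A symmetric positive definite matrix standing in for Gamma.
A = rng.standard_normal((4, 4))
Gamma = A @ A.T + 4 * np.eye(4)

# Cholesky factor R of the row/column-reversed matrix J @ Gamma @ J.
# Reversing with [::-1, ::-1] is the same as multiplying by J on both sides.
R = linalg.cholesky(Gamma[::-1, ::-1], lower=True)

# The claim: L = J (R^{-1})^T J is the lower triangular Cholesky
# factor of Gamma^{-1}, i.e. Gamma^{-1} = L @ L.T.
L = np.linalg.inv(R).T[::-1, ::-1]
assert np.allclose(L, np.tril(L))                  # L is lower triangular
assert np.allclose(L @ L.T, np.linalg.inv(Gamma))  # L @ L.T = Gamma^{-1}

# Equivalently, L^{-1} = J R^T J, so Z in Z @ L_inv = X comes from a
# triangular solve instead of ever forming Gamma^{-1} explicitly.
X = rng.standard_normal((3, 4))
L_inv = R.T[::-1, ::-1]
Z = linalg.solve_triangular(L_inv.T, X.T, lower=False).T
assert np.allclose(Z @ L_inv, X)
assert np.allclose(Z, X @ L)
```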
I think we should have a comment explaining the types of equations we want to be able to solve: $Z = XL$, where $L = \operatorname{Chol}\left(\Gamma^{-1}\right)$, or equivalently $ZL^{-1} = X$. Also a brief sentence explaining that it's possible to calculate $L^{-1}$ directly from the Cholesky decomposition of the permuted matrix that reverses both the rows and columns of $\Gamma$. Also some kind of citation to a place to find more details. The best explanation I've found is in a MathOverflow answer by the eminent mathematician Robert Israel. Perhaps just a link to that answer would be good enough.
It definitely does deserve an explanation, and I meant to include one before you got to this. Sorry to make you find it on your own. Yes, I think that is the original post I followed. I'll write a bit about the motivation and link to that.
No worries. The explanation is very clear now. I think everything is in good shape now but still want to double check carefully.
I didn't follow the whole argument, but if it applies here, use the `lower=False` keyword to start with an upper triangular factor when calling `cholesky`. Might save a column swap or two.
It's a good thought but isn't quite what we want. According to the documentation for the underlying LAPACK function, if $LL^{\top}$ is the Cholesky factorization of $\Gamma$, then `cholesky(Gamma, lower=True)` will return $L$ and `cholesky(Gamma, lower=False)` will return $L^{\top}$. I've tried it to be sure. What we would actually need is to find an upper triangular matrix $U$ such that $\Gamma = UU^{\top}$. The `cholesky` function can't do this, so we're required to do the trick with reversing the rows and columns.
Oh, but I guess it will save us one transpose when computing the Cholesky decomposition of $J\Gamma J$, since we need the upper triangular factor there. It may be worthwhile for small matrices but will most likely have a negligible impact.
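As a sanity check on the `lower` keyword behaviour (a sketch with a hypothetical SPD matrix, assuming `scipy.linalg.cholesky`):

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
Gamma = A @ A.T + 4 * np.eye(4)              # hypothetical SPD stand-in

Lfac = linalg.cholesky(Gamma, lower=True)    # Gamma = Lfac @ Lfac.T
Ufac = linalg.cholesky(Gamma, lower=False)   # Gamma = Ufac.T @ Ufac
assert np.allclose(Ufac, Lfac.T)             # lower=False is just the transpose

# Neither call yields an upper triangular U with Gamma = U @ U.T; that
# comes from the row/column-reversal trick. Using lower=False on the
# reversed matrix hands back R^T directly, saving one transpose:
Rt = linalg.cholesky(Gamma[::-1, ::-1], lower=False)   # equals R.T
U = Rt.T[::-1, ::-1]                         # J @ R @ J, upper triangular
assert np.allclose(U, np.triu(U))
assert np.allclose(Gamma, U @ U.T)
```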
Why do we recompute `self._data_covariance` here? Does it help to maintain backwards compatibility for existing subclasses? If this is needed, it could probably use a comment to explain why, but it's fine if you think it isn't necessary.
I think I assumed that since `_compute_covariance` used to re-calculate the covariance every time, there must have been a reason. Otherwise, why not do it just once in the `__init__` method? As it was, it was re-calculated every time `set_bandwidth` was called, so maybe people use `set_bandwidth` to recalculate everything after modifying the public attribute `dataset`? I don't really know, but figured it would be safer this way.

And maybe subconsciously I want use of `inv_cov` to be as slow as possible :) See the discussion in the original incarnation of this issue - #5087 (comment).
Cool. That makes sense.