ENH: stats.gaussian_kde: replace use of inv_cov in pdf #16692

Merged: 13 commits, Aug 26, 2022.
Changes from 7 commits
27 changes: 19 additions & 8 deletions scipy/stats/_kde.py
@@ -266,8 +266,11 @@ def evaluate(self, points):
else:
raise TypeError('%s has unexpected item size %d' %
(output_dtype, itemsize))
result = gaussian_kernel_estimate[spec](self.dataset.T, self.weights[:, None],
points.T, self.inv_cov, output_dtype)

result = gaussian_kernel_estimate[spec](
self.dataset.T, self.weights[:, None],
points.T, self.cho_cov, output_dtype)

return result[:, 0]

__call__ = evaluate
@@ -576,17 +579,25 @@ def _compute_covariance(self):
covariance_factor().
"""
self.factor = self.covariance_factor()
# Cache covariance and inverse covariance of the data
if not hasattr(self, '_data_inv_cov'):
# Cache covariance and permuted cholesky decomp of permuted covariance
if not hasattr(self, '_data_cho_cov'):
self._data_covariance = atleast_2d(cov(self.dataset, rowvar=1,
bias=False,
aweights=self.weights))
self._data_inv_cov = linalg.inv(self._data_covariance)
self._data_cho_cov = linalg.cholesky(
self._data_covariance[::-1, ::-1]).T[::-1, ::-1]
Comment on lines +585 to +586
@steppi (Contributor) commented on Aug 21, 2022:
I think there's enough going on here that there should be some comments to explain things. It took me a bit to figure out what's happening.

Just to make sure I'm on the same page, here are the details as I understand them:

Let $\Gamma$ be the covariance matrix for the Gaussian kernel and $LL^{\top} = \Gamma^{-1}$ be the Cholesky decomposition of the precision matrix $\Gamma^{-1}$. Later on, we want to be able to transform $n \times d$ data matrices $X$ by multiplying on the right by the Cholesky factor $L$ for $\Gamma^{-1}$. Equivalently, we want to be able to find $Z$ such that $Z = XL$.

If we know $L^{-1}$ we can instead use a triangular solver to solve $ZL^{-1} = X$. It turns out we can calculate $L^{-1}$ from a Cholesky decomposition, not of $\Gamma$ itself, but of a permuted version of $\Gamma$.

If $RR^{\top} = \Gamma$ then $(R^{-1})^{\top}R^{-1} = \Gamma^{-1}$. This isn't quite a Cholesky decomposition though, since $(R^{-1})^{\top}$ is upper triangular instead of lower triangular. Let $J$ be the antidiagonal matrix with all ones on the antidiagonal and zeros elsewhere. Multiplying on the left by $J$ permutes rows and multiplying on the right by $J$ permutes columns. If we instead calculate the Cholesky decomposition $RR^{\top} = J\Gamma J$, then
$$(R^{-1})^{\top}R^{-1} = J^{-1}\Gamma^{-1}J^{-1} = J\Gamma^{-1}J$$
and thus
$$\Gamma^{-1} = J(R^{-1})^\top R^{-1}J = (JR^{-1}J)^\top(JR^{-1}J)$$

where we've used that $J = J^{-1}$ and $J = J^{\top}$.

This means that if we know the Cholesky factor $R$ of $J\Gamma J$, the Cholesky factor of $\Gamma^{-1}$ is
$J(R^{-1})^{\top}J$. (If $A$ is upper triangular then $JAJ$ is lower triangular). This means $L^{-1} = JR^{\top}J$, and we can write our equation as $ZJR^{\top}J = X$ and solve for $Z$. This is exactly what you've done, but it's still not intuitive for me yet, just algebraic.

I think we should have a comment explaining the type of equation we want to be able to solve:
$Z = XL$, where $L = \operatorname{Chol}\left(\Gamma^{-1}\right)$, or equivalently $ZL^{-1} = X$. Also a brief sentence explaining that it's possible to calculate $L^{-1}$ directly from the Cholesky decomposition of the permuted matrix that reverses both the rows and columns of $\Gamma$, plus some kind of citation to a place to find more details. The best explanation I've found is in a MathOverflow answer by the eminent mathematician Robert Israel. Perhaps just a link to that answer would be good enough.
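
(Illustrative aside, not part of the PR: a small NumPy/SciPy sketch that checks the identity above numerically. `Gamma`, `X`, `R`, `L`, and `Z` are made-up names, with `Gamma` standing in for the kernel covariance.)

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(1234)
d = 4
A = rng.standard_normal((d, d))
Gamma = A @ A.T + d * np.eye(d)    # generic SPD stand-in for the kernel covariance

# L: lower Cholesky factor of the precision matrix, Gamma^{-1} = L L^T
L = np.linalg.cholesky(np.linalg.inv(Gamma))

# R: lower Cholesky factor of the row/column-reversed covariance, R R^T = J Gamma J
R = linalg.cholesky(Gamma[::-1, ::-1], lower=True)

# Claim from the discussion: L^{-1} = J R^T J, i.e. R.T with rows and columns reversed
np.testing.assert_allclose(np.linalg.inv(L), R.T[::-1, ::-1], rtol=1e-10)

# Hence Z = X L can be computed by a triangular solve, with no explicit inverse:
# Z (J R^T J) = X  is equivalent to  (J R J) Z^T = X^T, and J R J is upper triangular.
X = rng.standard_normal((10, d))
Z = linalg.solve_triangular(R[::-1, ::-1], X.T, lower=False).T
np.testing.assert_allclose(Z, X @ L, rtol=1e-10, atol=1e-12)
```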

@mdhaber (Contributor, Author) commented on Aug 22, 2022:
It definitely does deserve an explanation, and I meant to include one before you got to this. Sorry to make you find it on your own. Yes, I think that is the original post I followed. I'll write a bit about the motivation and link to that.

Contributor:
No worries. The explanation is very clear now. I think everything is in good shape now but still want to double check carefully.

Member:
I didn't follow the whole argument, but if it applies here, use the lower=False keyword to start with an upper triangular factor when calling cholesky. Might save a column swap or two.

@steppi (Contributor) commented on Aug 23, 2022:
> I didn't follow the whole argument, but if it applies here, use the lower=False keyword to start with an upper triangular factor when calling cholesky. Might save a column swap or two.

It's a good thought but isn't quite what we want. According to the documentation for the underlying LAPACK function, if $LL^{\top}$ is the Cholesky factorization of $\Gamma$, then cholesky(Gamma, lower=True) will return $L$ and cholesky(Gamma, lower=False) will return $L^{\top}$. I've tried it to be sure. What we actually need is an upper triangular matrix $U$ such that $\Gamma = UU^{\top}$. The cholesky function can't do this, so we're required to do the trick with reversing the rows and columns.
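
(Illustrative aside, not part of the PR: a quick check of that behaviour; `Gamma` is a made-up SPD matrix.)

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
Gamma = A @ A.T + 3 * np.eye(3)

L = linalg.cholesky(Gamma, lower=True)      # lower triangular, Gamma = L L^T
U = linalg.cholesky(Gamma, lower=False)     # upper triangular, Gamma = U^T U

np.testing.assert_allclose(U, L.T)          # lower=False just returns L^T ...
np.testing.assert_allclose(U.T @ U, Gamma)  # ... so it factors Gamma as U^T U, not U U^T

# The upper triangular V with Gamma = V V^T comes from the row/column-reversal trick:
V = linalg.cholesky(Gamma[::-1, ::-1]).T[::-1, ::-1]
np.testing.assert_allclose(V @ V.T, Gamma)
```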

@steppi (Contributor) commented on Aug 23, 2022:
Oh, but I guess it will save us one transpose when computing the Cholesky decomposition of $J\Gamma J$ since we need the upper triangular factor there. It may be worthwhile for small matrices but will most likely have a negligible impact.


self.covariance = self._data_covariance * self.factor**2
self.inv_cov = (self._data_inv_cov / self.factor**2).astype(np.float64)
L = linalg.cholesky(self.covariance*2*pi)
self.log_det = 2*np.log(np.diag(L)).sum()
self.cho_cov = self._data_cho_cov * self.factor
self.log_det = 2*np.log(np.diag(self.cho_cov
* np.sqrt(2*pi))).sum()

@property
def inv_cov(self):
self.factor = self.covariance_factor()
self._data_covariance = atleast_2d(cov(self.dataset, rowvar=1,
Contributor:
Why do we recompute self._data_covariance here? Does it help to maintain backwards compatibility for existing subclasses? If this is needed, could probably use a comment to explain why but it’s fine if you think it isn’t necessary.

Contributor (Author):
I think I assumed that since _compute_covariance used to re-calculate the covariance every time, there must have been a reason. Otherwise, why not do it just once in the __init__ method? As it was, it was re-calculated every time set_bandwidth was called, so maybe people use set_bandwidth to recalculate everything after modifying the public attribute dataset? I don't really know, but figured it would be safer this way.

And maybe subconsciously I want use of inv_cov to be as slow as possible : ) See the discussion in the original incarnation of this issue - #5087 (comment).

Contributor:
Cool. That makes sense.

bias=False, aweights=self.weights))
return linalg.inv(self._data_covariance) / self.factor**2

def pdf(self, x):
"""
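
(Illustrative aside, not part of the diff: a check that the new log_det computed from cho_cov matches the old computation via the Cholesky factor of 2*pi*covariance. `dataset` and `factor` are made-up stand-ins for the KDE's data and bandwidth factor.)

```python
import numpy as np
from numpy import pi
from scipy import linalg

rng = np.random.default_rng(42)
dataset = rng.standard_normal((2, 200))      # d = 2, n = 200 (made up)
factor = 0.5                                 # stand-in bandwidth factor
data_covariance = np.atleast_2d(np.cov(dataset, rowvar=1, bias=False))
covariance = data_covariance * factor**2

# Old computation: Cholesky factor of 2*pi*covariance
L_old = linalg.cholesky(covariance * 2 * pi)
log_det_old = 2 * np.log(np.diag(L_old)).sum()

# New computation: permuted Cholesky factor of the data covariance, scaled by factor
data_cho_cov = linalg.cholesky(data_covariance[::-1, ::-1]).T[::-1, ::-1]
cho_cov = data_cho_cov * factor
log_det_new = 2 * np.log(np.diag(cho_cov * np.sqrt(2 * pi))).sum()

np.testing.assert_allclose(log_det_new, log_det_old)
np.testing.assert_allclose(log_det_new, np.log(np.linalg.det(2 * pi * covariance)))
```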
25 changes: 13 additions & 12 deletions scipy/stats/_stats.pyx
@@ -9,6 +9,7 @@ from numpy cimport ndarray, int64_t, float64_t, intp_t
import warnings
import numpy as np
import scipy.stats, scipy.special
import scipy.linalg as linalg
cimport scipy.special.cython_special as cs

np.import_array()
@@ -699,10 +700,9 @@ ctypedef fused real:

@cython.wraparound(False)
@cython.boundscheck(False)
def gaussian_kernel_estimate(points, values, xi, precision, dtype, real _=0):
def gaussian_kernel_estimate(points, values, xi, cho_cov, dtype,
real _=0):
"""
def gaussian_kernel_estimate(points, real[:, :] values, xi, precision)

Evaluate a multivariate Gaussian kernel estimate.

Parameters
@@ -713,16 +713,16 @@ def gaussian_kernel_estimate(points, values, xi, precision, dtype, real _=0):
Multivariate values associated with the data points.
xi : array_like with shape (m, d)
Coordinates to evaluate the estimate at in d dimensions.
precision : array_like with shape (d, d)
Precision matrix for the Gaussian kernel.
cho_cov : array_like with shape (d, d)
Permuted Cholesky factor of permuted covariance of the data.

Returns
-------
estimate : double[:, :] with shape (m, p)
Multivariate Gaussian kernel estimate evaluated at the input coordinates.
"""
cdef:
real[:, :] points_, xi_, values_, estimate, whitening
real[:, :] points_, xi_, values_, estimate
int i, j, k
int n, d, m, p
real arg, residual, norm
@@ -734,19 +734,20 @@ def gaussian_kernel_estimate(points, values, xi, precision, dtype, real _=0):

if xi.shape[1] != d:
raise ValueError("points and xi must have same trailing dim")
if precision.shape[0] != d or precision.shape[1] != d:
raise ValueError("precision matrix must match data dims")
if cho_cov.shape[0] != d or cho_cov.shape[1] != d:
raise ValueError("Covariance matrix must match data dims")

# Rescale the data
whitening = np.linalg.cholesky(precision).astype(dtype, copy=False)
points_ = np.dot(points, whitening).astype(dtype, copy=False)
xi_ = np.dot(xi, whitening).astype(dtype, copy=False)
points_ = linalg.solve_triangular(cho_cov, points.T, lower=False).T
points_ = np.asarray(points_).astype(dtype, copy=False)
xi_ = linalg.solve_triangular(cho_cov, xi.T, lower=False).T
xi_ = np.asarray(xi_).astype(dtype, copy=False)
values_ = values.astype(dtype, copy=False)

# Evaluate the normalisation
norm = math.pow((2 * PI) ,(- d / 2))
for i in range(d):
norm *= whitening[i, i]
norm /= cho_cov[i, i]

# Create the result array and evaluate the weighted sum
estimate = np.zeros((m, p), dtype)
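
(Illustrative aside, not part of the PR: a check that the new triangular-solve rescaling and normalisation agree with the old precision-matrix whitening. `points` and `cov` are made up for the example; the triangular-solve route never forms the precision matrix explicitly.)

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(7)
d, n = 3, 50
points = rng.standard_normal((n, d))
cov = np.cov(points, rowvar=False)           # stand-in for the kernel covariance

# Old approach: whiten by multiplying with the Cholesky factor of the precision
precision = np.linalg.inv(cov)
whitening = np.linalg.cholesky(precision)
old_points = points @ whitening
old_norm = (2 * np.pi) ** (-d / 2) * np.prod(np.diag(whitening))

# New approach: triangular solve against the permuted Cholesky factor of cov
cho_cov = linalg.cholesky(cov[::-1, ::-1]).T[::-1, ::-1]
new_points = linalg.solve_triangular(cho_cov, points.T, lower=False).T
new_norm = (2 * np.pi) ** (-d / 2) / np.prod(np.diag(cho_cov))

np.testing.assert_allclose(new_points, old_points, rtol=1e-10, atol=1e-12)
np.testing.assert_allclose(new_norm, old_norm)
```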
16 changes: 2 additions & 14 deletions scipy/stats/tests/test_kdeoth.py
@@ -218,16 +218,6 @@ def __init__(self, dataset):
super().__init__(dataset)


class _kde_subclass3(stats.gaussian_kde):
def __init__(self, dataset, covariance):
self.covariance = covariance
stats.gaussian_kde.__init__(self, dataset)

def _compute_covariance(self):
self.inv_cov = np.linalg.inv(self.covariance)
self._norm_factor = np.sqrt(np.linalg.det(2 * np.pi * self.covariance))


class _kde_subclass4(stats.gaussian_kde):
def covariance_factor(self):
return 0.5 * self.silverman_factor()
@@ -251,10 +241,8 @@ def test_gaussian_kde_subclassing():
y2 = kde2(xs)
assert_array_almost_equal_nulp(ys, y2, nulp=10)

# subclass 3
kde3 = _kde_subclass3(x1, kde.covariance)
y3 = kde3(xs)
assert_array_almost_equal_nulp(ys, y3, nulp=10)
# subclass 3 was removed because we have no obligation to maintain support
# for manual invocation of private methods

# subclass 4
kde4 = _kde_subclass4(x1)
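
(Illustrative aside, not part of the test file: the kind of subclassing the remaining tests still cover, i.e. overriding the public covariance_factor hook rather than the private _compute_covariance/inv_cov machinery. `HalfSilverman` and the data are made up.)

```python
import numpy as np
from scipy import stats

class HalfSilverman(stats.gaussian_kde):
    # analogous to _kde_subclass4: only the public bandwidth hook is overridden
    def covariance_factor(self):
        return 0.5 * self.silverman_factor()

x1 = np.array([-7.0, -5.0, 1.0, 4.0, 5.0])
kde = HalfSilverman(x1)
xs = np.linspace(-10, 10, num=50)
ys = kde(xs)                      # evaluation still works after the inv_cov change
print(ys.shape)                   # (50,)
```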