ENH: stats.gaussian_kde: replace use of inv_cov in pdf #16692
Conversation
I think all the failures are the same -
Rather than all the permutations required to replace
@steppi if you also like linear algebra, this may be interesting to you.
Putting this next in my queue.
Trying to fix a series of errors like:

```
___________________ test_kde_output_dtype[float128-float128] ___________________
[gw1] darwin -- Python 3.10.5 /Users/runner/miniconda3/envs/scipy-dev/bin/python
scipy/stats/tests/test_kdeoth.py:328: in test_kde_output_dtype
    result = k(points)
        bw      = 3.0
        bw_type = <class 'numpy.float128'>
        dataset = array([0., 1., 2., 3., 4.], dtype=float128)
        dtype   = <class 'numpy.float128'>
        k       = <scipy.stats._kde.gaussian_kde object at 0x140000160>
        points  = array([0., 1., 2., 3., 4.], dtype=float128)
        weights = array([0., 1., 2., 3., 4.], dtype=float128)
scipy/stats/_kde.py:270: in evaluate
    result = gaussian_kernel_estimate[spec](
        d            = 1
        itemsize     = 16
        m            = 5
        output_dtype = <class 'numpy.float128'>
        points       = array([[0., 1., 2., 3., 4.]], dtype=float128)
        self         = <scipy.stats._kde.gaussian_kde object at 0x140000160>
        spec         = 'long double'
_stats.pyx:748: in scipy.stats._stats.gaussian_kernel_estimate
    ???
E   ValueError: Buffer dtype mismatch, expected 'long double' but got 'double'
```
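An illustrative sketch of the failure mode (my own example, not the scipy internals): a dtype-specialized Cython kernel expects every buffer to have the dtype chosen for the specialization, so a `float64` factor passed alongside `longdouble` data triggers the buffer mismatch. Casting the inputs to a common dtype beforehand avoids it.

```python
# Hypothetical illustration of the dtype mismatch, not the scipy code.
import numpy as np

points = np.array([0., 1., 2., 3., 4.], dtype=np.longdouble)
covariance = np.atleast_2d(np.float64(3.0))   # computed in plain float64
cho_cov = np.linalg.cholesky(covariance)      # still float64

# A 'long double' kernel specialization would reject the float64 buffer,
# so cast every input to the common output dtype before the call.
output_dtype = np.result_type(points.dtype, cho_cov.dtype)
cho_cov = cho_cov.astype(output_dtype, copy=False)

assert cho_cov.dtype == output_dtype
assert output_dtype == np.longdouble
```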
I haven't had time to check everything, but things look OK mathematically so far. See the suggestion about adding some comments. I found the permutations a little inscrutable at first, so I think some comments to explain what's going on and link to more details would be helpful.
I should have time to complete my review next weekend.
```python
self._data_cho_cov = linalg.cholesky(
    self._data_covariance[::-1, ::-1]).T[::-1, ::-1]
```
I think there's enough going on here that there should be some comments to explain things. It took me a bit to figure out what's happening.

Just to make sure I'm on the same page, here are the details as I understand them:

If we know a Cholesky factorization $\Gamma = L L^T$ of the reversed matrix $\Gamma = J \Sigma J$, where $J$ is the exchange matrix (the identity with its rows reversed, so $J = J^T = J^{-1}$), then

$$\Sigma = J \Gamma J = J L L^T J = (J L J)(J L J)^T,$$

and thus $\Sigma = U U^T$ with $U = J L J$ upper triangular, where we've used that $J^T = J$ and $J J = I$.

This means that if we know the Cholesky factor $L$ of the reversed covariance, we obtain a factorization of $\Sigma$ into an upper triangular matrix times its transpose, which `cholesky` alone does not provide.

I think we should have a comment explaining the types of equations we want to be able to solve and why this is the factorization we need for them.
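The row/column-reversal trick under discussion can be sketched numerically (my own illustration, not the scipy implementation): factoring the reversed matrix and reversing the factor back yields $\Sigma = M M^T$ with $M$ upper triangular.

```python
# Sketch of the reversal trick: standard Cholesky gives Sigma = L @ L.T
# (L lower) or Sigma = U.T @ U (U upper); reversing rows and columns
# before factoring yields Sigma = M @ M.T with M *upper* triangular.
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
sigma = A @ A.T + 4 * np.eye(4)   # a symmetric positive definite matrix

# Factor the reversed matrix, then reverse the factor back.
M = linalg.cholesky(sigma[::-1, ::-1]).T[::-1, ::-1]

assert np.allclose(M, np.triu(M))   # M is upper triangular
assert np.allclose(M @ M.T, sigma)  # Sigma = M @ M.T
```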
It definitely does deserve an explanation, and I meant to include one before you got to this. Sorry you had to work it out on your own. Yes, I think that is the original post I followed. I'll write a bit about the motivation and link to it.
No worries. The explanation is very clear now. I think everything is in good shape, but I still want to double-check carefully.
I didn't follow the whole argument, but if it applies here, use the `lower=False` keyword to start with an upper triangular factor when calling `cholesky`. Might save a column swap or two.
> I didn't follow the whole argument, but if it applies here, use the `lower=False` keyword to start with an upper triangular factor when calling `cholesky`. Might save a column swap or two.
It's a good thought, but it isn't quite what we want. According to the documentation for the underlying LAPACK function, `cholesky(Gamma, lower=True)` will return a lower triangular $L$ with $\Gamma = L L^T$, and `cholesky(Gamma, lower=False)` will return an upper triangular $U$ with $\Gamma = U^T U$. What we need is $\Gamma = U U^T$ with $U$ upper triangular; the `cholesky` function can't do this, so we're required to do the trick with reversing the rows and columns.
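A minimal check of the two conventions just described (my own illustration): `scipy.linalg.cholesky` returns factors satisfying $\Gamma = L L^T$ or $\Gamma = U^T U$, but never $\Gamma = U U^T$ directly.

```python
# Verify the two scipy.linalg.cholesky conventions on a random SPD matrix.
import numpy as np
from scipy import linalg

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3))
gamma = B @ B.T + 3 * np.eye(3)   # symmetric positive definite

L = linalg.cholesky(gamma, lower=True)    # lower triangular factor
U = linalg.cholesky(gamma, lower=False)   # upper triangular factor

assert np.allclose(L @ L.T, gamma)        # Gamma = L @ L.T
assert np.allclose(U.T @ U, gamma)        # Gamma = U.T @ U
assert not np.allclose(U @ U.T, gamma)    # Gamma = U @ U.T needs the reversal trick
```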
Oh, but I guess it will save us one transpose when computing the Cholesky decomposition of the reversed matrix.
I think this looks good. Nice work. I suggested one more place that I think could use a comment but it’s fine if you think it isn’t needed.
```python
@property
def inv_cov(self):
    self.factor = self.covariance_factor()
    self._data_covariance = atleast_2d(cov(self.dataset, rowvar=1,
```
Why do we recompute `self._data_covariance` here? Does it help to maintain backwards compatibility for existing subclasses? If this is needed, it could probably use a comment to explain why, but it's fine if you think it isn't necessary.
I think I assumed that since `_compute_covariance` used to re-calculate the covariance every time, there must have been a reason. Otherwise, why not do it just once in the `__init__` method? As it was, it was re-calculated every time `set_bandwidth` was called, so maybe people use `set_bandwidth` to recalculate everything after modifying the public attribute `dataset`? I don't really know, but figured it would be safer this way.

And maybe subconsciously I want use of `inv_cov` to be as slow as possible :) See the discussion in the original incarnation of this issue - #5087 (comment).
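A hypothetical minimal sketch of the trade-off under discussion (the class, names, and sizes are my own, not scipy's): computing the covariance inside the property means mutations of the public `dataset` attribute are picked up on the next access, at the cost of repeating the work every time.

```python
# Toy illustration only: recomputing inside the property tracks changes
# to the public `dataset` attribute, which caching in __init__ would miss.
import numpy as np

class TinyKDE:
    def __init__(self, dataset):
        self.dataset = np.atleast_2d(dataset)

    @property
    def inv_cov(self):
        # Recomputed on every access, so it always reflects self.dataset.
        data_covariance = np.atleast_2d(np.cov(self.dataset, rowvar=True))
        return np.linalg.inv(data_covariance)

kde = TinyKDE(np.random.default_rng(2).standard_normal((2, 50)))
first = kde.inv_cov
kde.dataset = np.random.default_rng(3).standard_normal((2, 50))
assert not np.allclose(first, kde.inv_cov)  # the property saw the new data
```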
Cool. That makes sense.
Reference issue
Supersedes gh-5087
What does this implement/fix?
gh-5087 proposed replacing use of the inverse covariance matrix with the Cholesky decomposition of the covariance matrix throughout `gaussian_kde` to improve speed and avoid the numerical instabilities associated with matrix inversion. There didn't seem to be disagreement from a technical standpoint; it looks like development just stopped. This PR implements the suggestion for `gaussian_kde.pdf`. Since `logpdf` is being Cythonized in gh-15493, I'll leave that alone to avoid merge conflicts.

Additional information
Here is the timing of the KDE benchmarks in main (results from CI of gh-16684):

- `stats.GaussianKDE.time_gaussian_kde_evaluate_few_points` - 1.31±0ms
- `stats.GaussianKDE.time_gaussian_kde_evaluate_many_points` - 1.61±0s

In this PR:

- `stats.GaussianKDE.time_gaussian_kde_evaluate_few_points` - 645±0μs
- `stats.GaussianKDE.time_gaussian_kde_evaluate_many_points` - 905±0ms

We might be able to do better with a closer look at new Cython/Python interactions.
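A rough way to reproduce the comparison locally (the data sizes and repeat count here are my own assumptions; the actual asv benchmark parameters may differ):

```python
# Rough local timing sketch for gaussian_kde evaluation at few vs. many
# points; sizes are illustrative, not the asv benchmark's parameters.
import timeit
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
kde = stats.gaussian_kde(rng.standard_normal(1000))

few = rng.standard_normal(10)
many = rng.standard_normal(10_000)

t_few = timeit.timeit(lambda: kde(few), number=5)
t_many = timeit.timeit(lambda: kde(many), number=5)
print(f"few: {t_few:.4f}s  many: {t_many:.4f}s")

assert kde(few).shape == (10,)
assert kde(many).shape == (10_000,)
```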