ENH: Convert gaussian_kde logpdf to Cython #15493
Conversation
Thanks @europeanplaice for the PR. To measure the performance, you can have a look at our benchmark system https://scipy.github.io/devdocs/dev/contributor/benchmarking.html#benchmarking-with-asv
@tupui Thank you for your advice, but currently I can't build scipy in my environment, so I ran a benchmark test here.

Thank you @tupui. Please correct me if I am wrong. Lines 590 to 601 in a2b5304 and Lines 795 to 796 in a2b5304 are the same tests, so the test in Cython can be removed. The lines below could also be moved to Lines 797 to 798 in a2b5304. Below, I can't validate that these lines raise an error, because of Lines 603 to 613 in a2b5304.
Co-authored-by: Pamphile Roy <roy.pamphile@gmail.com>
Thanks @europeanplaice. LGTM, but it would be good if someone else could have a look. I added a milestone to hopefully get more attention.
Note: the benchmark should be refactored with a param instead of having 2 different sets of tests for low/high numbers of points. A bit out of scope, but it can be a follow-up if you are interested.
I can't review the algorithmic details or Cython, but the Python side looks OK. I'm not very familiar with the gaussian_kde code, so is it already well tested for logpdf accuracy? If not, I'd add accuracy tests.

Yes, it is tested against pdf. I will also check whether kernel.logpdf(values) for the example in the docstring changes much. Update: the maximum difference is 1.776e-15, so no problem there. Speedup is a factor of ~2x (35.3 ms vs 84.8 ms).
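For readers who want to reproduce this kind of check, here is a minimal sketch (not the PR's actual test; dataset and grid are arbitrary) comparing logpdf against the log of pdf:

```python
import numpy as np
from scipy import stats

# logpdf should agree with log(pdf) to near machine precision
rng = np.random.default_rng(0)
kde = stats.gaussian_kde(rng.standard_normal(200))
x = np.linspace(-3, 3, 50)
max_diff = np.max(np.abs(kde.logpdf(x) - np.log(kde.pdf(x))))
assert max_diff < 1e-12
```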
@tupui if these are tested to your satisfaction, feel free to merge unless you'd like to wait for a more thorough review from someone qualified to go through the algorithm line by line.
scipy/stats/_kde.py
Outdated
result[i] = logsumexp(0.5 * log_to_sum)

return result
raise ValueError(
Can this be tested? Or, if it is already tested, can we add a match to make sure the intended message is emitted?
It can be partially tested. I copied it from the evaluate method so that logpdf stays consistent with evaluate, and I don't have any other reason for this code.

np.common_type returns one of [_nx.half, _nx.single, _nx.double, _nx.longdouble, _nx.csingle, _nx.cdouble, _nx.clongdouble], and each of these has a corresponding itemsize:
import numpy as np
import numpy.core.numeric as _nx
print(np.dtype(_nx.half), np.dtype(_nx.half).itemsize)
print(np.dtype(_nx.single), np.dtype(_nx.single).itemsize)
print(np.dtype(_nx.double), np.dtype(_nx.double).itemsize)
print(np.dtype(_nx.longdouble), np.dtype(_nx.longdouble).itemsize)
print(np.dtype(_nx.csingle), np.dtype(_nx.csingle).itemsize)
print(np.dtype(_nx.cdouble), np.dtype(_nx.cdouble).itemsize)
print(np.dtype(_nx.clongdouble), np.dtype(_nx.clongdouble).itemsize)
float16 2
float32 4
float64 8
float128 16
complex64 8
complex128 16
complex256 32
This means that if the common dtype is complex256 or float16, an error is raised.
For complex256, the test would look like:
def test_pdf_logpdf_dtype_validation():
    xn = (np.random.random(10) + np.random.random(10) * 1j).astype("clongdouble")
    xs = (np.random.random(10) + np.random.random(10) * 1j).astype("clongdouble")
    gkde = stats.gaussian_kde(xn)
    msg = "<class 'numpy.complex256'> has unexpected item size 32"
    with pytest.raises(ValueError, match=msg):
        gkde.logpdf(xs)
But for float16 it can't be tested: gkde.covariance is float64, so np.common_type returns float64 even if the input array's dtype is float16.
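A small sketch (not the PR's code) of why the float16 branch is unreachable: np.common_type applied to a float16 array together with a float64 array (like gkde.covariance) promotes to float64.

```python
import numpy as np

# np.common_type promotes its inputs to a shared inexact scalar type;
# float16 mixed with float64 yields float64, so the float16 error
# branch can never trigger when the covariance is float64.
a = np.zeros(3, dtype=np.float16)
cov = np.zeros((1, 1), dtype=np.float64)
assert np.common_type(a, cov) is np.float64
```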
Reviews seem mostly positive here; the duplicated asv benchmark is maybe a bit awkward?
benchmarks/benchmarks/stats.py
Outdated
def time_gaussian_kde_logpdf_many_points(self):
    # test gaussian_kde logpdf on many points
    self.kernel.logpdf(self.positions)
Maybe a single function named time_gaussian_kde_logpdf with time_gaussian_kde_logpdf.params = [self.positions, self.positions[:, :10]]?
Like this: https://asv.readthedocs.io/en/stable/writing_benchmarks.html#parameterized-benchmarks
Thank you, and I agree with you. I added params to parameterize the benchmarks.
It's now ~100 ms, while it was close to a second (which is too much).
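For reference, a parameterized asv benchmark might look like the sketch below. The class, data shapes, and parameter values here are illustrative, not scipy's actual benchmark suite:

```python
import numpy as np
from scipy import stats

class GaussianKDE:
    # asv runs setup() and each time_* method once per entry in params,
    # replacing the two near-identical low/high point-count benchmarks.
    params = [10, 1000]
    param_names = ['n_points']

    def setup(self, n_points):
        rng = np.random.default_rng(12345)
        self.kernel = stats.gaussian_kde(rng.standard_normal((2, 100)))
        self.positions = rng.standard_normal((2, n_points))

    def time_gaussian_kde_logpdf(self, n_points):
        self.kernel.logpdf(self.positions)
```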
I pushed some changes that were needed. The bigger issue is in the Cython code, however: the new code is mostly (80%-90%) the same as the code from gaussian_kernel_estimate, with a few discrepancies that are not easy to understand. @europeanplaice it seems much better to me to either make this a single routine that deviates only on a log yes/no flag, or, if that's not easily possible, refactor the common parts out or template the routine (.pyx.in) so the two copies come from the same source.
scipy/stats/_stats.pyx
Outdated
estimate = np.zeros((m, p), dtype)
zeros_max = np.zeros((p,), dtype) - np.inf
empty_estimate_ = np.empty((p, n), dtype)
for j in range(m):
The loop variables i and j are exactly reversed compared to gaussian_kernel_estimate; can you please make them the same?
If I swapped the order of i and j, it would not work well, because
# Calculate logsumexp
for k in range(p):
    for i in range(n):
        estimate[j, k] += math.exp(estimate_[k, i] - max_[k])
    estimate[j, k] = math.log(estimate[j, k])
    estimate[j, k] += max_[k]
this block depends on j. Currently the loop logic is complicated, so it's hard to make it align with gaussian_kernel_estimate. The code would be simpler if we could separate logsumexp from this loop.
The code would be simple if we could separate logsumexp from this loop.

I was going to suggest the same. Please do make a logsumexp function separate from this loop, and add a simple unit test for it that compares it against special.logsumexp. I see you've already discussed np.log vs looping over math.log, so I won't ask for that to be changed; we are already going to get a substantial speed boost from converting this to Cython, so I think it's fair to balance speed and readability.
I made them the same.
scipy/stats/_stats.pyx
Outdated
for k in range(p):
    estimate_[k, i] = log_values_[i, k] + log_arg

# Collect max value for logsumexp
I don't understand what is going on here to be honest - can you explain?
It also has to do with estimate, estimate_ and empty_estimate_: the copying and multiple arrays used there seem unnecessary. Why can this not be the same as for gaussian_kernel_estimate?
I don't understand what is going on here to be honest - can you explain?

When it computes log(sum_i w_i * exp(a_i)) directly, I see an underflow error, because each exp(a_i) can be smaller than the smallest positive float and the whole sum rounds to zero. To avoid this problem, the formula should be changed to

log(sum_i exp(x_i)) = max(x) + log(sum_i exp(x_i - max(x)))

By this conversion, the arguments of exp are shifted so the largest term is exp(0) = 1, and the sum no longer underflows. This function exists as logsumexp in scipy. However, scipy's logsumexp is implemented in Python, so to keep the calculation fast I wrote it myself. In the code block you quoted, it computes the log of each term of the weighted sum: in log space, multiplication becomes addition.
gaussian_kernel_estimate's

for k in range(p):
    estimate[j, k] += values_[i, k] * arg

corresponds to gaussian_kernel_estimate_log's

for k in range(p):
    estimate_[k, i] = log_values_[i, k] + log_arg
It also has to do with estimate, estimate_ and empty_estimate_ - the copying and multiple arrays used there seem unnecessary. Why can this not be the same as for gaussian_kernel_estimate?

For each m, I have to reset estimate_. If we didn't reset it, it would be possible that the previous iteration's max value would be selected. I'm not sure my explanation has enough information; does it make sense?
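The shift-by-max trick being discussed can be sketched in plain Python (this is not the PR's Cython code, just an illustration checked against scipy.special.logsumexp):

```python
import math
import numpy as np
from scipy.special import logsumexp as sp_logsumexp

def logsumexp_stable(x):
    # log(sum_i exp(x_i)) = m + log(sum_i exp(x_i - m)), with m = max(x).
    # After shifting, the largest term is exp(0) = 1, so the sum can
    # never round down to zero and log() stays finite.
    m = max(x)
    return m + math.log(sum(math.exp(v - m) for v in x))

x = [-1000.0, -1001.0, -1002.0]
# The direct formula underflows here: exp(-1000.0) == 0.0 in float64,
# so log(sum(exp(x_i))) would be log(0).
assert math.exp(-1000.0) == 0.0
assert np.isclose(logsumexp_stable(x), sp_logsumexp(np.asarray(x)))
```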
Thanks @europeanplaice, yes, that makes sense. It looks like this would have been much easier if we had logsumexp implemented in C/Cython and exposed in special.cython_special, right?
Yes. It would be great not only for this PR, but also for future contributions!
Hi, I am just passing by to look for any PRs involving Cython in scipy. We do have a Cython implementation of logsumexp in scikit-learn; I guess you can reuse it.
On a side note, I am available for reviewing Cython PRs for scipy, so if needed, feel free to @ me. :)
Thank you @jjerphan that's good to know 😃 💪
There should be a place where people could look for reviewers. 😃
As for scikit-learn, we are looking for reviewers, especially for Cython. Inter-project support and communication are both nice and necessary for projects' health and dynamics IMO, so here is my attempt at encouraging it.
Do you mean a place where a person would offer reviewing expertise? I would widen this even further and say: someone offers a certain kind of expertise. Sometimes we have features on our wish list that we don't know much about, or deep technical questions that don't need software engineering expertise at all.
That would be awesome. We could do something similar to Publons and think about how to motivate people to engage and do this. This is definitely something for Scientific Python to think about.
I've created this thread on the Scientific Python Discourse to broaden the discussion about sharing skills across projects in the ecosystem.
I'm going to bump the milestone here if that is alright. It sounds like Ralf has carved out a path forward for this PR at least.
Thanks @europeanplaice. I think this is close. Just a few thoughts from me.
Please do make a logsumexp function separate from this loop and add a simple unit test for it that compares it against special.logsumexp.
Though I haven't written a unit test, I made a Cython version.
Note that this PR overlaps in some portions with #16758.
# Create the result array and evaluate the weighted sum
estimate = np.full((m, p), fill_value=-np.inf, dtype=dtype)
for i in range(n):
    for j in range(m):
        arg = 0
        for k in range(d):
            residual = (points_[i, k] - xi_[j, k])
            arg += residual * residual

        arg = -arg / 2 + log_norm
        for k in range(p):
            estimate[j, k] = logsumexp(estimate[j, k],
                                       arg + log_values_[i, k])
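The pairwise update above relies on a two-argument logsumexp accumulated from -inf (the log of an empty sum). A plain-Python sketch of such a helper (hypothetical; the PR's actual helper lives in scipy/stats/_stats.pyx) behaves like the batch version:

```python
import math
import numpy as np
from scipy.special import logsumexp as sp_logsumexp

def logsumexp2(a, b):
    # Two-argument log-sum-exp; -inf acts as the identity element,
    # so an accumulator can start from the log of an empty sum.
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

terms = [-1000.0, -999.5, -1003.0]
acc = -math.inf
for t in terms:
    acc = logsumexp2(acc, t)
# Running accumulation matches the batch reduction
assert np.isclose(acc, sp_logsumexp(np.asarray(terms)))
```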
I made this line-by-line equivalent with gaussian_kernel_estimate. If this results in a performance loss, please open a new PR to change it. I'd suggest getting a Cython-accessible logsumexp function added to scipy.special first, so it doesn't need to be part of this PR, and please demonstrate the difference in performance.
@europeanplaice @jjerphan I edited things so that I'd be comfortable merging this. Please check that I did not make any huge mistakes, since I am not very fluent in Cython. If CI comes back green, I'll plan to merge on Monday.
I didn't find any mistakes in the commit and it looks good to me. The code became simpler than before. Thank you.
I won't be able to do a review in the coming days, hence just move on without me if you need to.
Thanks @europeanplaice and reviewers! I'll add an entry to the release notes.
Reference issue
What does this implement/fix?
In this PR, I converted gaussian_kde.logpdf to Cython. The code is based on gaussian_kernel_estimate, so it may not be fully optimized. To take advantage of Cython, I avoided built-in functions; for example, I added logsumexp manually and introduced nested loops. Feedback is welcome.

Additional information

I didn't see a significant performance improvement on medium-sized datasets. However, as far as I know, this Cython code runs faster, especially on large-scale datasets. I am going to run more detailed benchmark tests.
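A quick micro-benchmark along these lines can be sketched as follows (dataset and point-set sizes are arbitrary here, not the PR's benchmark configuration):

```python
import timeit
import numpy as np
from scipy import stats

# Time logpdf on a larger evaluation set; large point counts are where
# the Cython version is expected to help most.
rng = np.random.default_rng(1)
kde = stats.gaussian_kde(rng.standard_normal((2, 500)))
points = rng.standard_normal((2, 5000))
per_call = timeit.timeit(lambda: kde.logpdf(points), number=3) / 3
print(f"logpdf on 5000 points: {per_call:.4f} s per call")
```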