Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: scipy.signal.find_peaks for 2D signals #20489

Open
josephernest opened this issue Apr 16, 2024 · 17 comments
Open

ENH: scipy.signal.find_peaks for 2D signals #20489

josephernest opened this issue Apr 16, 2024 · 17 comments
Labels
enhancement A new feature or improvement scipy.signal

Comments

@josephernest
Copy link

josephernest commented Apr 16, 2024

As stated in the documentation, scipy.signal.find_peaks works for 1D signals.

There are many attempts of similar function for 2D arrays, such as https://stackoverflow.com/questions/3684484/peak-detection-in-a-2d-array, https://stackoverflow.com/questions/16842823/peak-detection-in-a-noisy-2d-array, (or other methods in sklearn: https://scikit-image.org/docs/stable/auto_examples/segmentation/plot_peak_local_max.html), etc.

Is there an implementation of 2D-array peak detection that could become an "official" scipy.signal.find_peaks_2d function?

This would be interesting in many scientific purposes.

Have there been discussions about adding 2D peak detection in scipy, with prominence like in scipy.signal.find_peaks?

@josephernest josephernest added the enhancement A new feature or improvement label Apr 16, 2024
@tupui
Copy link
Member

tupui commented Apr 17, 2024

Thanks @josephernest for the report.

@lagru can you take this one? 😃

@lagru
Copy link
Contributor

lagru commented Apr 17, 2024

Is there an implementation of 2D-array peak detection that could become an "official" scipy.signal.find_peaks_2d function?

As far as I'm aware there isn't but there has been some work in scikit-image to get there for 2D and more dimensions.

For prominence filtering there hasn't been anything that I'm aware of. I've thought about tackling that after scikit-image/scikit-image#4165 though.

A quick search also brought up https://github.com/Xunius/python_peak_promience2d (via this ST answer) but I haven't investigated it closely yet.

@tupui
Copy link
Member

tupui commented Apr 17, 2024

Thanks a lot @lagru. Would you say this should be kept open as wanted or it's something to do in another library than SciPy.

@josephernest
Copy link
Author

josephernest commented Apr 17, 2024

Thanks @tupui and @lagru.

Would you say this should be kept open as wanted or it's something to do in another library than SciPy.

The rationale behind this was the fact that scipy.signal.find_peaks is really popular (see e.g. https://stackoverflow.com/questions/1713335/peak-finding-algorithm-for-python-scipy), and I think people are looking for an equivalent of this for 2D arrays, possibly with the same Scipy API (prominence, etc.).

@lagru
Copy link
Contributor

lagru commented Apr 17, 2024

Would you say this should be kept open as wanted or it's something to do in another library than SciPy.

That depends on whether you see this functionality as within scope of SciPy. I've never been really clear on the border between scipy.ndimage and skimage. I'd say keep this open. Might be a good indicator down the line for where people look for this functionality first.

Personally, it seems general enough to me to include it in SciPy with what's available already. And I wouldn't be opposed to migrating that code to SciPy. I will probably try to move this along within scikit-image though first and then look into whether it is worth moving later. It's would also be trivial to point users to scikit-image in find_peaks' docstring and within an exception message.

@tupui
Copy link
Member

tupui commented Apr 17, 2024

Sounds good 👍 thanks 🙏

@lagru
Copy link
Contributor

lagru commented Apr 20, 2024

Fyi, I polished off scikit-image/scikit-image#4165 and hope that we get the reviews to merge it soon. @josephernest, would that be a step in the right direction?

@tupui, that PR implements another algorithm in skimage that repeatedly calls on

def query_ball_point(cKDTree self, object x, object r,
np.float64_t p=2., np.float64_t eps=0, object workers=None,
return_sorted=None,
return_length=False, **kwargs):

from within Cython code. Unfortunately, it returns a list which together with the Python-facing API makes it somewhat hard to release the GIL and do fast iteration. I didn't take a thorough look at all discussions here, but do you know of any efforts in SciPy to expose a C,C++-API like NumPy does with get_include?

If we could wrap the functionality from cKDTree directly in Cython that would likely yield significant performance benefits. The only other solution I can think of would be to vendor it, which seems wasteful.

@tupui
Copy link
Member

tupui commented Apr 21, 2024

Thanks @lagru!

I am not sure there are discussions for that. But worth asking @tylerjereddy.

@tylerjereddy
Copy link
Contributor

Oh gosh, I'm not aware of any plans to make our spatial C/C++-level "API" consumable in a public sense. I guess there's some stuff happening like that in special, but a decent chunk of spatial is wrapped around Qhull, and the community also has the (more?) popular/competing license-incompatible CGAL, which is quite different to use I think.

I'm also not entirely clear on the route to performance improvement here, since the function can return ragged data structures, which is why the nested objects/lists are used. It already takes a workers argument, is there no way for you to use concurrency to get some speedup that way after accumulating enough points? I'll take a look at the skimage source in a minute to see if anything sticks out, and maybe ask for sample timings there.

@tylerjereddy
Copy link
Contributor

We can also ask @sturlamolden of course.

@tylerjereddy
Copy link
Contributor

And, not to derail even more, but sklearn also has KDTree. Maybe some opportunities to consolidate things somehow, but yikes on the amount of effort that would be to coordinate (I talked to Thomas Fan about that at some point at SciPy conference).

@tupui
Copy link
Member

tupui commented Apr 21, 2024

Thanks Tyler. I agree that it would be good if we could coordinate something with sklearn 👍

@lagru
Copy link
Contributor

lagru commented Apr 24, 2024

And, not to derail even more, but sklearn also has KDTree.

Oh, sklearn's KDTree returns arrays on query. I definitely need to check this out from a performance perspective. Maybe the biggest challenge with SciPy's current version is iterating the results quickly. The returned list or (array of lists) makes it hard to do fast Cython-indexing or using nogil.

Speaking of, I'm a bit surprised by this bit

vector[vector[np.intp_t]] vvres

m = <np.intp_t> (vvres[i].size())
tmp = m * [None]
cur = vvres[i].data()
for j in range(m):
tmp[j] = cur[j]
vout[start + i] = tmp

The internal C++ implementation returns a vector which is then unpacked into a list whose size is even pre-allocated with m * [None]. Maybe I'm missing context, but why not transform the C++ vector into a proper array instead?

I'd be happy to make a PR that adds an option to return a more efficient data structure. Though, I'd be happy to know what kind of API update you would prefer. A new function query_radius, query_ball_point_arr (like sklearn)? I'd prefer SciPy's kdtree long-term, as sklearn is only an optional dependency.

@sturlamolden
Copy link
Contributor

sturlamolden commented Apr 24, 2024

The internal C++ implementation returns a vector which is then unpacked into a list whose size is even pre-allocated with m * [None]. Maybe I'm missing context, but why not transform the C++ vector into a proper array instead?

I'd be happy to make a PR that adds an option to return a more efficient data structure. Though, I'd be happy to know what kind of API update you would prefer. A new function query_radius, query_ball_point_arr (like sklearn)? I'd prefer SciPy's kdtree long-term, as sklearn is only an optional dependency.

A list is returned because cKDTree was written to behave identical to KDTree, which was a pure Python implementation. Why @aarchiba chose to return this data structure from KDTree I do not know. A likely reason is that at the Python side the results would be a "jagged array" which ndarray do not support, except by having an ndarray with dtype=object and stacking ndarrays into an ndarray. In any case it would not allow for iteration without holding the GIL, as iterating something with dtype=object requires you to hold the GIL, and it would also incure a lot of reference counting.

Another oddity that has the same historical explanation is that the results from query_ball_pointand query_ball_tree are transposed instead of identical.

I do now why I used a std::vector internally in the C++ code, though (obviously, since I wrote the code). That was to release the GIL while performing the query in C++, to precent cKDTree from locking up the interpreter. A common use of kd-trees are games and computer graphics, and having a GUI or CG that freezes is bad, even if it just means missing a few frames. That was the very reason I moved cKDTree to C++.

The C++ code returns a std::vector of std::vector<npy_intp>, which is a simple way of making a jagged 2D array in C++.

Basically to make this efficient, you need to create a cdef class that stores the std::vector< std::vector<npy_intp> > internally and (optinally) return this to Python instead of a list. Then your Cython code that uses cKDTree.query_ball_tree could grab the jagged array from this class and iterate without holding the GIL. It is actually rather trivial to implement.

@sturlamolden
Copy link
Contributor

sturlamolden commented Apr 24, 2024

The internal C++ implementation returns a vector which is then unpacked into a list whose size is even pre-allocated with m * [None]. Maybe I'm missing context, but why not transform the C++ vector into a proper array instead?

The short ansver is numpy.ndarray does not support jagged 2D arrays, so you cannot just call PyArray_SimpleNewFromData.

https://en.wikipedia.org/wiki/Jagged_array

@lagru
Copy link
Contributor

lagru commented Apr 24, 2024

Thanks for the insights! :) You totally right, I glossed over the fact, that each point may return a different number of neighbors. If I understand this correctly, a list of arrays would work for the jagged case? That might already be a lot easier to iterate.

Basically to make this efficient, you need to create a cdef class that stores the std::vector< std::vector<npy_intp> > internally and (optinally) return this to Python instead of a list. Then your Cython code that uses cKDTree.query_ball_tree could grab the jagged array from this class and iterate without holding the GIL. It is actually rather trivial to implement.

To clarify, are you suggesting to do this on skimage's or SciPy's side? I thought about using the internal C++ implementation more directly from our side, but wasn't sure how to do that.

@sturlamolden
Copy link
Contributor

Thanks for the insights! :) You totally right, I glossed over the fact, that each point may return a different number of neighbors. If I understand this correctly, a list of arrays would work for the jagged case? That might already be a lot easier to iterate.

A list of arrays would work. However, you would need to hold the GIL while iterating over the list, and you would incure increfs and defcrefs for each array in the list. You would also need to use the Python buffer API (slow) or NumPy C API (faster) to grab the data from each array. Basically this will be slow too.

Basically to make this efficient, you need to create a cdef class that stores the std::vector< std::vector<npy_intp> > internally and (optinally) return this to Python instead of a list. Then your Cython code that uses cKDTree.query_ball_tree could grab the jagged array from this class and iterate without holding the GIL. It is actually rather trivial to implement.

To clarify, are you suggesting to do this on skimage's or SciPy's side? I thought about using the internal C++ implementation more directly from our side, but wasn't sure how to do that.

If you are going to do this in skimage you might consider just making a small C or C++ kd-tree that will just to what you need and nothing else.

The ABI of scipy.spatial.cKDTree is not frozen, so using it with custom C++ would be fragile.

Making a tiny kd-tree for the purpose of range query is very easy. So my advice is do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.signal
Projects
None yet
Development

No branches or pull requests

7 participants
@sturlamolden @josephernest @tylerjereddy @lagru @tupui @j-bowhay and others