
ENH: ndimage: log(n) implementation for 1D rank filter #20543

Open
wants to merge 2 commits into main

Conversation

ggkogan

@ggkogan ggkogan commented Apr 20, 2024

@ggkogan ggkogan requested a review from rgommers as a code owner April 20, 2024 22:40
@github-actions github-actions bot added scipy.ndimage C/C++ Items related to the internal C/C++ code base Cython Issues with the internal Cython code base Meson Items related to the introduction of Meson as the new build system for SciPy enhancement A new feature or improvement labels Apr 20, 2024
@ggkogan ggkogan marked this pull request as draft April 21, 2024 06:50
@ggkogan ggkogan marked this pull request as ready for review April 21, 2024 07:01
@ggkogan
Author

ggkogan commented Apr 21, 2024

I see many failed tests due to issues that seem unrelated to this pull request. Is there anything I can or should do about it?

@rgommers
Member

If test failures are really unrelated, you can simply ignore them. However, I had a quick look and all the ones I saw do look related (e.g. TestNdimageInterpolation)

@ggkogan
Author

ggkogan commented Apr 21, 2024

You are right. I will correct it and resubmit.

@ggkogan ggkogan marked this pull request as draft April 21, 2024 08:10
@rgommers
Member

I will correct it and resubmit.

Thanks. Just to make sure: you can push new commits to the same branch that you opened this PR from, and they will show up here. No need to open a second PR.

scipy/ndimage/meson.build (outdated review thread; resolved)
@rgommers
Member

I had a quick look at the code, and the heavy use of templating and fused types jumped out at me. It results in a too-large size for the new extension module; it is immediately almost the largest one in the ndimage module:

% ls -lh build/scipy/ndimage/*.so | awk '{print $5, $9}'
51K build/scipy/ndimage/_ctest.cpython-311-darwin.so
105K build/scipy/ndimage/_cytest.cpython-311-darwin.so
161K build/scipy/ndimage/_nd_image.cpython-311-darwin.so
372K build/scipy/ndimage/_ni_label.cpython-311-darwin.so
345K build/scipy/ndimage/_rank_filter_1d.cpython-311-darwin.so

Could this be limited to a subset of types? What does _nd_image.rank_filter support, and why would it be necessary to extend that already-supported set?

Started working on https://ideone.com/8VVEa, I optimized by [...]

The license of this code looks a little problematic. It has a broken link to something that (from the mit-license in the link address) indicates the author licensed it as MIT, but that is a bit inconclusive. Is there something more definite for this code? If not, it'd be good to contact the original author to try to clarify that this is really MIT-licensed code.

@lucascolley lucascolley changed the title ENH: rank filter for 1D cases log(n) complexity implementation ENH: ndimage.rank_filter: log(n) implementation for 1D cases Apr 21, 2024
@lucascolley lucascolley changed the title ENH: ndimage.rank_filter: log(n) implementation for 1D cases ENH: ndimage: log(n) implementation for 1D rank filter Apr 21, 2024
@rgommers
Member

The performance improvement looks promising! If the scaling changed from N to log(N), the usual way to demonstrate that is to plot runtime as a function of kernel size (or input array size), e.g. like the graphs in https://github.com/rgommers/explore-array-computing/blob/master/explore_xnd.ipynb. I think the vertical axis of your plot is more or less like that, but a bit hard to read. Is it log(N) in kernel size, and is there any change for array input size or filter order?
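A minimal sketch of the kind of scaling measurement suggested here. `rank_filter_1d_naive` is a hypothetical brute-force stand-in (not the PR's implementation); timing it, or the PR's kernel, across kernel sizes produces the data for a runtime-vs-kernel-size plot:

```python
# Sketch: time a 1-D rank filter across kernel sizes to study scaling.
# rank_filter_1d_naive is a brute-force O(n * size) stand-in for illustration.
import time
import numpy as np

def rank_filter_1d_naive(x, rank, size):
    # Partition every window; 'reflect' is one of the boundary modes
    # that ndimage supports.
    half = size // 2
    padded = np.pad(x, half, mode='reflect')
    out = np.empty_like(x)
    for i in range(len(x)):
        out[i] = np.partition(padded[i:i + size], rank)[rank]
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
timings = {}
for size in (11, 101, 501):
    t0 = time.perf_counter()
    rank_filter_1d_naive(x, size // 2, size)  # rank = size // 2 is the median
    timings[size] = time.perf_counter() - t0
```

Plotting `timings` on a log axis would show roughly linear growth in kernel size for the brute-force version, against near-flat growth for an O(log size) sliding implementation.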

@ggkogan
Author

ggkogan commented Apr 21, 2024

E.g., like the graphs in

The vertical axis is the kernel size, but the improvement for the median filter is larger than for very high or low ranks; I wanted to demonstrate both aspects. In general, I understand that you are going to perform independent testing, so this was for initial illustration only. Thanks for the comment anyway; I will consider it in the future.

@ggkogan
Author

ggkogan commented Apr 21, 2024

I will correct it and resubmit.

Thanks. Just to make sure: you can push new commits to the same branch that you opened this PR from, and they will show up here. No need to open a second PR.

I am sorry, it was 3am and my brain was totally crashed :)

@ggkogan
Author

ggkogan commented Apr 21, 2024

I had a quick look at the code, and the heavy use of templating and fused types jumped out to me. It results in a too-large size of the new extension module, it's immediately almost the largest one in the ndimage module:

% # on macOS arm64
% ls -lh build/scipy/ndimage/*.so | awk '{print $5, $9}'
51K build/scipy/ndimage/_ctest.cpython-311-darwin.so
105K build/scipy/ndimage/_cytest.cpython-311-darwin.so
161K build/scipy/ndimage/_nd_image.cpython-311-darwin.so
372K build/scipy/ndimage/_ni_label.cpython-311-darwin.so
345K build/scipy/ndimage/_rank_filter_1d.cpython-311-darwin.so

Could this be limited to a subset of types? What does _nd_image.rank_filter support, and why would it be necessary to extend that already-supported set?

Started working on https://ideone.com/8VVEa, I optimized by [...]

The license of this code looks a little problematic. It has a broken link to something that (from the mit-license in the link adddress) indicates the author licensed it as MIT, but that is a bit inconclusive. Is there something more definite for this code? If not, it'd be good to contact the original author to try to clarify that this is really MIT-licensed code.

Concerning the license, the code in the referenced link is also available on Stack Overflow. Does that solve the problem?
https://stackoverflow.com/a/5971248/8443371
https://stackoverflow.com/a/5970314/8443371

@ggkogan ggkogan marked this pull request as ready for review April 21, 2024 14:12
@rgommers
Member

Concerning the license, the code in the referenced link is also available on Stack Overflow. Does that solve the problem?

Unfortunately not. From http://scipy.github.io/devdocs/dev/core-dev/index.html#licensing: "For instance, code published on StackOverflow is covered by a CC-BY-SA license, which is not compatible due to the share-alike clause. These contributions cannot be accepted for inclusion in SciPy unless the original code author is willing to (re)license his/her code under the modified BSD (or compatible) license"

@ggkogan
Author

ggkogan commented Apr 21, 2024

contact the original author

I think that I found the original and it seems ok. Please approve @rgommers

@rgommers
Member

I think that I found the original and it seems ok. Please approve @rgommers

Nice digging. Agreed, that looks like the original and states clearly enough that it is MIT-licensed. So all good from that perspective.

@ggkogan
Author

ggkogan commented Apr 23, 2024

I had a quick look at the code, and the heavy use of templating and fused types jumped out to me. It results in a too-large size of the new extension module, it's immediately almost the largest one in the ndimage module:

% # on macOS arm64:
% ls -lh build/scipy/ndimage/*.so | awk '{print $5, $9}'
51K build/scipy/ndimage/_ctest.cpython-311-darwin.so
105K build/scipy/ndimage/_cytest.cpython-311-darwin.so
161K build/scipy/ndimage/_nd_image.cpython-311-darwin.so
372K build/scipy/ndimage/_ni_label.cpython-311-darwin.so
345K build/scipy/ndimage/_rank_filter_1d.cpython-311-darwin.so

Could this be limited to a subset of types? What does _nd_image.rank_filter support, and why would it be necessary to extend that already-supported set?

I have looked into this comment. If I keep the same Cython file but eliminate all possible overhead, typing every variable in every function and defining the template in the .cpp file as a function, the size drops to 190K. Currently, I have reduced the fused types to the main types only (int64/float32/double); the remaining types are cast to those (leading to some variability in performance but keeping a solid improvement over the current implementation).
On the other hand, if I define all the variations of the function input types within the .cpp file and use ctypes to import them, the .so file size is 40K.
Therefore, I suspect that the Cython glue itself accounts for most of the size, regardless of the typing. I looked for other .pyx files within the submodule that are used as glue for C/C++ files, intending to reuse their approach to reduce the overhead. Unfortunately, I did not find any.

Two questions:

  1. Do you think I am missing something?
  2. Will using a direct API such as Numpy's solve it?

Also, I know that you are pretty busy and these are basic questions. Is there anyone else who can support me here, or should I go solo / ask on Stack Overflow? Maybe I should use another platform or try to involve other people in this PR?

@ev-br
Member

ev-br commented Apr 24, 2024

those are basic questions.

They aren't. Related things are under rather active discussion (see #20334 (comment) for a sample), and the jury is still out on what the recommended way to wrap C/C++ kernels going forward will be. Up until very recently the standing recommendation was Cython, just like you did. Maybe you are the perfect person to move this forward?

So I think you've at least two options, and none of them pretty :-).

@rgommers
Member

Currently, I reduced the usage of the fused types to the main types only (int64/float32/double), the rest of the types are casted to those types (leading to some instability in performance but keeping solid improvement over the current implementation).

This is the thing we do pretty much everywhere, and it is pretty standard and necessary. Cython adds overhead indeed, but so does templating over a large set of types in C++.

I tested again, on Linux x86-64 this time:

$ # on Linux x86-64
$ python dev.py build -C-Dbuildtype=release
$ ls -lh build/scipy/ndimage/*.so | awk '{print $5, $9}'
17K build/scipy/ndimage/_ctest.cpython-311-x86_64-linux-gnu.so
84K build/scipy/ndimage/_cytest.cpython-311-x86_64-linux-gnu.so
135K build/scipy/ndimage/_nd_image.cpython-311-x86_64-linux-gnu.so
394K build/scipy/ndimage/_ni_label.cpython-311-x86_64-linux-gnu.so
263K build/scipy/ndimage/_rank_filter_1d.cpython-311-x86_64-linux-gnu.so

Turning off release mode (our default python dev.py build):

1,9M build/scipy/ndimage/_ni_label.cpython-311-x86_64-linux-gnu.so
1,2M build/scipy/ndimage/_rank_filter_1d.cpython-311-x86_64-linux-gnu.so

263 kb is still surprisingly large after removal of most of the fused types usage. I'm actually a little puzzled by what is happening there. I did a quick test doubling the number of types under ctypedef fused numeric_t:, and it only increases the size by 31 kb (~12%). Reducing it to only double reduced the size to 241 kb, so only a ~10% reduction.

Changing def statements to cdef already helps more: it's down to 209 kb with only cdef rank_filter_1d_cpp_api, and down to 161 kb with cdef rank_filter_1d as well. So it's probably useful to do all the input validation and selection of the correct type to call in a pure Python file, and to use only a 1-D memoryview in the Cython code rather than calling numpy functions through the Python API. There really isn't much of a speed gain in doing that kind of thing in Cython.

Making rank_filter_1d_cpp_api work on a memoryview also avoids having it return a Python object (out_arr), and it can then be annotated noexcept nogil. cython -a may help show where Python is used in the .pyx file (see docs).

pybind11 or using the C API directly could be a nice comparison, but it looks to me like optimizing the Cython code first should be the next step. The C++ code here is clean/short enough that this should be doable in Cython I'd hope.

@ggkogan
Author

ggkogan commented Apr 30, 2024

Hi,
I have implemented the linkage via the NumPy C API. From my tests, no other approach provides the same speed and file size:

  1. I have tested all the suggested directions for Cython, and it is a dead end with respect to file size.
  2. An experienced user of pybind11 indicated it creates smaller files but it is slower.

A direct compilation (GCC with flags) produces a smaller .so than the compilation via meson.build. I assume it is related to some interference by Cython (I have seen that Cython appears in the .so file name, therefore I assume we are using it as the compilation engine).
Currently, the file size for the three main datatypes (float32, double, int64) is 23K with the -O1 compilation flag and 27K with the package's default compilation. With direct compilation there is no file-size variation. I assume the addition in file size is another Cython artifact; as far as I can see, there is no corresponding boost in performance.
I prefer to use the default package compilation, avoiding any surprises or additional maintenance in the future.

PS: any comments about the code (or something else) are more than welcome 🙏 @ev-br @rgommers or anyone else who reads it

@ggkogan ggkogan force-pushed the enh-rank-filt-1d branch 2 times, most recently from 7729ee0 to 8651e0b Compare April 30, 2024 15:20
Member

@rgommers rgommers left a comment


Thanks for the detailed experiments and explanation @ggkogan! Overall your conclusions sound about right to me.

An experienced user of pybind11 indicated it creates smaller files but it is slower.

I think they meant function call overhead. That is true indeed, but only relevant for small arrays; for large arrays (which is what we should mostly care about) it doesn't really matter.


Looks like this is down to 21 kb for a release build on Linux (and 87 kb with the python dev.py build defaults):

$ ls -lh build/scipy/ndimage/_rank_filter_1d.cpython-311-x86_64-linux-gnu.so
-rwxr-xr-x 1 rgommers rgommers 21K  3 mei 14:44 build/scipy/ndimage/_rank_filter_1d.cpython-311-x86_64-linux-gnu.so

That's ~12x smaller than the Cython version, which is great.

(I have seen that Cython appears in .so file name therefore I assume we are using it as the compilation engine)

Not Cython, but CPython: cpython-311-x86_64-linux-gnu.so. The file extension contains info relevant to the ABI used.

I assume that the addition in the file size is another Cython artifact - as far as I see, no boost in performance is seen.

No, just depends on the optimization level etc. Release builds are -O3, dev.py builds are -O2 -g.

py3.extension_module('_rank_filter_1d',
'src/_rank_filter_1d.cpp',
install: true,
dependencies: np_dep,
Member


Missing a line here that is present for all other extension modules:

    link_args: version_link_args,

could you please add that?

origins)
if input.ndim == 1:
rank = int(rank)
origin = int(origin)
Member


These should already be integers. Why was this needed?



My assumption was similar to yours, but I added it after failing some unit tests. This was back when I used Cython as glue, and it reported that my input was np.int64 rather than int. I did not want to look for the origin of the problem, as I did not want to modify anything unrelated to this pull request; I hoped it would be sufficiently visible for someone to correct in the future.
Anyway, I do not see it now (I assume the NumPy C API is not sensitive to it), and therefore I removed the casting.

if mode == 6:
mode = 4
if mode == 5:
mode = 1
Member


Correct indeed, as explained in the user guide: http://scipy.github.io/devdocs/tutorial/ndimage.html. It looks a bit out of place to do this here though. Not sure why it could not be done inside _extend_mode_to_code if the modes are actually identical.
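A sketch of folding the aliased boundary modes in one place, as the review suggests could happen inside `_extend_mode_to_code`. The numeric codes mirror the diff above (6 maps to 4, 5 maps to 1); which string modes these codes denote is not shown in the excerpt, so treat the mapping as illustrative:

```python
# Sketch: collapse equivalent extension-mode codes once, before dispatching
# to the compiled kernel. Codes taken from the diff above; illustrative only.
_MODE_ALIASES = {6: 4, 5: 1}

def collapse_mode(mode):
    # identical boundary modes share one code
    return _MODE_ALIASES.get(mode, mode)
```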

mode = 4
if mode == 5:
mode = 1
casting_cond = input.dtype.name not in ['int64', 'float64', 'float32']
Member


Prefer using the actual dtypes: input.dtype not in (np.int64, np.float64, np.float32)

casting_cond = input.dtype.name not in ['int64', 'float64', 'float32']
if casting_cond:
x = input.astype('int64')
x_out = np.empty_like(x)
Member


It looks to me like this should be using np.result_type to determine what the correct output type is. E.g., for other floating-point dtypes, float32 or float64 will be expected, not int64.
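A sketch of the `np.result_type`-based selection suggested here: instead of sending every unsupported dtype to int64, promote to the nearest supported computation type, so e.g. float16 computes in float32 rather than int64. The function name and the exact promotion rule are illustrative assumptions:

```python
# Sketch: pick a supported computation dtype via promotion rather than
# casting everything to int64. Illustrative; not the PR's actual code.
import numpy as np

def computation_dtype(dt):
    dt = np.dtype(dt)
    if dt in (np.int64, np.float32, np.float64):
        return dt
    if np.issubdtype(dt, np.floating):
        # float16 -> float32; longdouble would need a separate decision
        return np.result_type(dt, np.float32)
    return np.dtype(np.int64)
```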

I optimized by restriction of cases and proper initialization,
also adapted for rank filter rather than the original median filter.
Allowed different boundary conditions.
Moved to C++ for polymorphism and added C-Numpy API.
Member


Could you remove the comments on evolution of the code, and copy exactly the copyright comment that was in the original file?


Is it OK with you? I would not insist on the second line, but some major changes have been made indeed.

//Copyright (c) 2011 ashelly.myopenid.com under http://www.opensource.org/licenses/mit-license
//Modified in 2024 by Gideon Genadi Kogan

Member


Yes, that is fine with me.

Allowed different boundary conditions.
Moved to C++ for polymorphism and added C-Numpy API.
*/

Member


Style comment: could you please run clang-format over this file?

//creates new Mediator: to calculate `nItems` running rank.
Mediator* MediatorNew(int nItems, int rank)
{
Mediator* m = (Mediator*)malloc(sizeof(Mediator));
Member


It would be good to use new instead of malloc, since this is C++ code now.
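For context on what the Mediator computes: it maintains a running rank with heap structures. A hedged Python sketch of the same two-heap idea, shown for a growing stream and the median only (the actual Mediator additionally evicts the oldest element via a fixed-size circular buffer and supports an arbitrary rank, giving O(log size) per element):

```python
# Sketch of the two-heap running-median idea behind the Mediator: a max-heap
# holds the lower half of the values, a min-heap the upper half, so the
# median is always at a heap root. Growing-stream version for illustration.
import heapq

def stream_medians(xs):
    lo, hi = [], []  # lo: max-heap via negation; hi: min-heap
    out = []
    for x in xs:
        heapq.heappush(lo, -x)                  # insert into lower half
        heapq.heappush(hi, -heapq.heappop(lo))  # move largest low item up
        if len(hi) > len(lo):                   # rebalance: lo keeps the extra
            heapq.heappush(lo, -heapq.heappop(hi))
        out.append(-lo[0] if len(lo) > len(hi) else (-lo[0] + hi[0]) / 2)
    return out
```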

@gideonKogan

No, just depends on the optimization level etc. Release builds are -O3, dev.py builds are -O2 -g.

For manual compilation, I did not see a file size change...

@gideonKogan

gideonKogan commented May 5, 2024

@rgommers I have made all the required modifications, hopefully fitting the expectations :)

Development

Successfully merging this pull request may close these issues.

ENH: ndimage: 1D rank filter speed up
4 participants