
np.mod ~3x slower in numpy 1.20 #18607

Closed
sevas opened this issue Mar 12, 2021 · 8 comments
sevas commented Mar 12, 2021

Using np.mod with numpy 1.20 is slower than with 1.19 or 1.18.

I am using a mambaforge environment with NumPy packages from conda-forge on Windows. In all cases I have the same MKL backend (version 2020.4-hb70f87d_311).

Reproducing code example:

timings with numpy 1.19.5

Version info: 1.19.5 3.8.6 | packaged by conda-forge | (default, Dec 26 2020, 04:30:06) [MSC v.1916 64 bit (AMD64)]

In [1]: import numpy as np
In [2]: a = np.random.random((640,480))
In [3]: b = np.random.random((1280,960))
In [4]: c = np.random.random((320,240))
In [5]: %timeit np.mod(a, np.pi)
2.67 ms ± 66.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [6]: %timeit np.mod(b, np.pi)
10.8 ms ± 205 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [7]: %timeit np.mod(c, np.pi)
466 µs ± 6.59 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timings with numpy 1.20.1

Version info: 1.20.1 3.8.6 | packaged by conda-forge | (default, Dec 26 2020, 04:30:06) [MSC v.1916 64 bit (AMD64)]

In [2]: import numpy as np
In [3]: a = np.random.random((640,480))
In [4]: b = np.random.random((1280,960))
In [5]: c = np.random.random((320,240))
In [6]: %timeit np.mod(a, np.pi)
8.69 ms ± 223 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [7]: %timeit np.mod(b, np.pi)
34.6 ms ± 495 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [8]: %timeit np.mod(c, np.pi)
1.9 ms ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

I get similar timings with numpy 1.20.0.
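For reference, the IPython sessions above can be condensed into a standalone benchmark script. This is a sketch I'm adding for convenience; the shapes match the sessions, and the averaging roughly mirrors what %timeit reports:

```python
import timeit

import numpy as np

print("NumPy version:", np.__version__)

for shape in [(640, 480), (1280, 960), (320, 240)]:
    a = np.random.random(shape)
    # Average the cost of a single np.mod call over 100 repetitions,
    # similar to the %timeit figures quoted above.
    t = timeit.timeit(lambda: np.mod(a, np.pi), number=100) / 100
    print(f"np.mod on {shape}: {t * 1e3:.3f} ms per call")
```

Running this under 1.19.5 and 1.20.x in otherwise identical environments should reproduce the ~3x gap.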

charris commented Mar 13, 2021

Possibly related to #16161.

charris commented Mar 13, 2021

@anirudh2290 Thoughts?

@seberg seberg added this to the 1.20.2 release milestone Mar 13, 2021
seberg commented Mar 13, 2021

Milestoning it for now; this seems like it is probably higher on the priority list. Part of that PR was because of clang problems, I think? And that part might just have been the missing -ffpe-error-strict flag (gh-18005), although some more of it has to be reviewed, and we may just have been wrong about the warning flags to begin with.

@anirudh2290

Apologies for the delay in replying. There is a slowdown of around 30% in 1.20.1 because of PR #16161 (the additional isnan and isinf checks), and I think this is expected. Doing some git bisect, I also see an additional slowdown of around 40% because of #17596 (tested with the first timeit mentioned above).

Regarding the flags, the main motivation for the PR was that divmod(1.0, 0.0) and floor_divide(1.0, 0.0) did not raise a divide-by-zero error. The PR therefore forces the invalid and divide-by-zero flags for these operations.
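The motivating behavior can be checked directly. A minimal sketch, assuming NumPy >= 1.20 (both calls should emit a RuntimeWarning under the default errstate; the exact warning text varies by version):

```python
import warnings

import numpy as np

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Since #16161, both of these set the divide-by-zero FPE flag,
    # which surfaces as a RuntimeWarning under the default errstate.
    q, r = np.divmod(1.0, 0.0)
    fd = np.floor_divide(1.0, 0.0)

print([str(w.message) for w in caught])
```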

One thing that hasn't been clearly called out in the PR is the behavior with nans. While adding the tests for nans with divmod and friends, I realized that the nan behavior is not the same across compiler versions. So the behavior was made consistent, matching the later compilers: for example, np.divmod(np.nan, 1) now sets the invalid flag across all compiler versions, not just gcc-8 and later. (I know this is different from what is discussed in #16161 (comment), but I remember having some discussion offline and coming to this conclusion. Maybe I am remembering wrongly :).)
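The nan case being discussed can be illustrated as follows. This is a sketch: whether the invalid FPE flag (and hence a warning) is raised is exactly the NumPy/compiler-version question above, but the returned values are nan either way:

```python
import warnings

import numpy as np

with warnings.catch_warnings():
    # Silence a possible "invalid value encountered in divmod" warning;
    # whether it appears is the cross-compiler difference under discussion.
    warnings.simplefilter("ignore")
    q, r = np.divmod(np.nan, 1)

print(q, r)  # both nan
```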

One option we have right now, if users compiling with earlier gcc versions are running into issues because of the behavior difference, is to remove the nan checks and fall back to the compiler behavior. WDYT?

charris commented May 6, 2021

Pushing this off to 1.22.0 for tracking. There seems to be some discussion about the modifications that led to this slowdown.

seberg commented Jul 19, 2021

This should have been improved by gh-19316, although I am not certain by how much. I think it is probably not quite back to 1.19.x speed yet.

In any case, I am going to close the issue. If it is still much slower, please re-open or comment. For now, I will consider any big remaining slow-down a failure to optimize correctly in MSVC.

Looking at it in godbolt, it seems to me that MSVC fails to inline all the math is... functions (isinf, isnan, isless, ...). We added a few unnecessary calls (which are now removed), but the fix also makes use of isless to ensure correct floating-point error behavior. So it should have improved a lot, but likely not to pre-1.20 levels.
(Unless MSVC relies on global, link-time optimization to get this right. But conda-forge uses that, as far as I know...)

@seberg seberg closed this as completed Jul 19, 2021
seberg commented Jul 19, 2021

A reference in case anyone wants to follow up with MSVC (I am not willing to create an account just to report this): https://developercommunity.visualstudio.com/t/std::isnan-missed-optimization/417076?entry=problem&space=62&q=isnan

It seems to me that they fixed it, but only for C++ and not for C?! Even in C++, I think they really fixed isnan specifically instead of getting all of the is... functions right in one go (they are all exceedingly similar and usually compile to exactly the same instructions, albeit with some differences when it comes to SSE, maybe).


sevas commented Jul 20, 2021

@seberg Alright, thanks a lot for the updates. I'll see about making a Visual Studio bug report with the info you gathered.
