
np.mod ~3x slower in numpy 1.20 #18607

Closed
sevas opened this issue Mar 12, 2021 · 8 comments
sevas commented Mar 12, 2021

Using np.mod with numpy 1.20 is slower than with 1.19 or 1.18.

I am using a mambaforge environment with NumPy packages from conda-forge on Windows. In all cases I have the same MKL backend (version 2020.4-hb70f87d_311).

Reproducing code example:

timings with numpy 1.19.5

Version info: 1.19.5 3.8.6 | packaged by conda-forge | (default, Dec 26 2020, 04:30:06) [MSC v.1916 64 bit (AMD64)]

In [1]: import numpy as np
In [2]: a = np.random.random((640,480))
In [3]: b = np.random.random((1280,960))
In [4]: c = np.random.random((320,240))
In [5]: %timeit np.mod(a, np.pi)
2.67 ms ± 66.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [6]: %timeit np.mod(b, np.pi)
10.8 ms ± 205 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [7]: %timeit np.mod(c, np.pi)
466 µs ± 6.59 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timings with numpy 1.20.1

Version info: 1.20.1 3.8.6 | packaged by conda-forge | (default, Dec 26 2020, 04:30:06) [MSC v.1916 64 bit (AMD64)]

In [2]: import numpy as np
In [3]: a = np.random.random((640,480))
In [4]: b = np.random.random((1280,960))
In [5]: c = np.random.random((320,240))
In [6]: %timeit np.mod(a, np.pi)
8.69 ms ± 223 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [7]: %timeit np.mod(b, np.pi)
34.6 ms ± 495 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [8]: %timeit np.mod(c, np.pi)
1.9 ms ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

I get similar timings with numpy 1.20.0.
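For reference, the IPython sessions above can be condensed into a standalone benchmark script. This is a sketch I'm adding for convenience; the shapes match the sessions, and the averaging roughly mirrors what %timeit reports:

```python
import timeit

import numpy as np

print("NumPy version:", np.__version__)

for shape in [(640, 480), (1280, 960), (320, 240)]:
    a = np.random.random(shape)
    # Average the cost of a single np.mod call over 100 repetitions,
    # similar to the %timeit figures quoted above.
    t = timeit.timeit(lambda: np.mod(a, np.pi), number=100) / 100
    print(f"np.mod on {shape}: {t * 1e3:.3f} ms per call")
```

Running this under 1.19.5 and 1.20.x in otherwise identical environments should reproduce the ~3x gap.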

charris commented Mar 13, 2021

Possibly related to #16161.

charris commented Mar 13, 2021

@anirudh2290 Thoughts?

@seberg seberg added this to the 1.20.2 release milestone Mar 13, 2021
seberg commented Mar 13, 2021

Milestoning it for now; this seems like it is probably higher on the priority list. Part of that PR was because of clang problems, I think? And that part might just have been the missing -ffpe-error-strict flag (gh-18005), although some more of it has to be reviewed, and we may just have been wrong about the warning flags to begin with.

@anirudh2290

Apologies for the delay in replying. There is a slowdown of around 30% in 1.20.1 because of PR #16161 (the additional isnan and isinf checks), and I think this is expected. Doing some git bisect, I also see an additional slowdown of around 40% because of #17596 (tested with the first timeit mentioned above).

Regarding the flags, the main motivation for the PR was that divmod(1.0, 0.0) and floor_divide(1.0, 0.0) did not raise a divide-by-zero error. The PR therefore forces the invalid and divide-by-zero flags for these operations.
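The motivating behavior can be checked directly. A minimal sketch, assuming NumPy >= 1.20 (both calls should emit a RuntimeWarning under the default errstate; the exact warning text varies by version):

```python
import warnings

import numpy as np

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Since #16161, both of these set the divide-by-zero FPE flag,
    # which surfaces as a RuntimeWarning under the default errstate.
    q, r = np.divmod(1.0, 0.0)
    fd = np.floor_divide(1.0, 0.0)

print([str(w.message) for w in caught])
```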

One thing that hasn't been clearly called out in the PR is the behavior with nans. While adding the tests for nans with divmod and friends, I realized that the nan behavior is not the same across compiler versions. So the behavior was made consistent, matching the later compilers: for example, np.divmod(np.nan, 1) now sets the invalid flag across all compiler versions, not just gcc-8 and later. (I know this is different from what is discussed in #16161 (comment), but I remember having some discussion offline and coming to this conclusion. Maybe I am remembering wrongly :).)
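The nan case being discussed can be illustrated as follows. This is a sketch: whether the invalid FPE flag (and hence a warning) is raised is exactly the NumPy/compiler-version question above, but the returned values are nan either way:

```python
import warnings

import numpy as np

with warnings.catch_warnings():
    # Silence a possible "invalid value encountered in divmod" warning;
    # whether it appears is the cross-compiler difference under discussion.
    warnings.simplefilter("ignore")
    q, r = np.divmod(np.nan, 1)

print(q, r)  # both nan
```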

One option we have right now, if users compiling with earlier gcc versions are running into issues because of the behavior difference, is to remove the nan checks and fall back to the compiler behavior. WDYT?

charris commented May 6, 2021

Pushing this off to 1.22.0 for tracking. There seems to be some discussion about the modifications that led to this slowdown.

seberg commented Jul 19, 2021

This should have been improved by gh-19316, although I am not certain by how much. I think it is probably not quite back to 1.19.x speed yet.

In any case, I am going to close the issue. If it is still much slower, please re-open or comment. For now, I will consider any big remaining slow-down a failure to optimize correctly in MSVC.

Looking at it in godbolt, it seems to me that MSVC fails to inline all the math is... functions (isinf, isnan, isless, ...). We added a few unnecessary calls (which are now removed), but the fix also makes use of isless to ensure correct floating-point error behavior. So it should have improved a lot, but likely not to pre-1.20 levels.
(Unless MSVC relies on global, link-time optimization to get this right. But conda-forge uses that, as far as I know...)

@seberg seberg closed this as completed Jul 19, 2021
seberg commented Jul 19, 2021

A reference in case anyone wants to follow up with MSVC (I am not willing to create an account just to report this): https://developercommunity.visualstudio.com/t/std::isnan-missed-optimization/417076?entry=problem&space=62&q=isnan

It seems to me that they fixed it, but only for C++ and not for C?! Even in C++, I think they really fixed isnan specifically instead of getting all of the is... functions right in one go (they are all exceedingly similar and usually compile to exactly the same instructions, albeit with some differences when it comes to SSE, maybe).


sevas commented Jul 20, 2021

@seberg Alright, thanks a lot for the updates. I'll see about making a Visual Studio bug report with the info you gathered.
