Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RuntimeWarning: divide by zero encountered in reciprocal for np.ones(3) ** -1 #18555

Closed
ogrisel opened this issue Mar 5, 2021 · 11 comments · Fixed by #19926
Closed

RuntimeWarning: divide by zero encountered in reciprocal for np.ones(3) ** -1 #18555

ogrisel opened this issue Mar 5, 2021 · 11 comments · Fixed by #19926

Comments

@ogrisel
Copy link
Contributor

ogrisel commented Mar 5, 2021

Reproducing code example:

import numpy as np

with np.errstate(invalid="raise"):
    np.ones(3) ** -1 

raises:

<ipython-input-43-02fd0a095546>:2: RuntimeWarning: divide by zero encountered in reciprocal
  np.ones(3) ** -1

but the result of the evaluation is still correct: array([1., 1., 1.])...

I also tried 1 / np.ones(3), np.ones(2) ** -1, np.ones(4) ** -1 which do not raise the warning.

I get it for any numpy array of 3 items with dtype np.float32 or np.float64 but not np.complex64 for instance.

NumPy/Python version information:

I got the problem with:

1.20.1 3.9.2 | packaged by conda-forge | (default, Feb 21 2021, 05:00:30) 
[Clang 11.0.1 ]

on macos/arm64.

I cannot get it from a linux/aarch64 docker environment running on the same machine numpy installed from PyPI.org with version:

1.20.1 3.9.1 (default, Dec 12 2020, 08:46:48) 
[GCC 8.3.0]

I cannot get it from a linux/aarch64 docker environment running on the same machine numpy installed from conda-forge with version:

1.20.1 3.9.2 | packaged by conda-forge | (default, Feb 21 2021, 05:02:26) 
[GCC 9.3.0]

So maybe this is a bug related to clang or macos/arm64? Could someone with an intel mac try to reproduce it, preferably using the numpy from conda-forge?

@seberg
Copy link
Member

seberg commented Mar 5, 2021

@ogrisel I think we might be running into #18005, i.e. clang has the terrible default of not using -ffp-exception-behavior=strict! There are already a few places in the code working around that I think. Try setting the flag if that fixes it. We really should force that flag, but I am not well versed with distutils or clang so was never sure where to add it when I looked at it once.

@ogrisel
Copy link
Contributor Author

ogrisel commented Mar 5, 2021

I built numpy master with:

CFLAGS="-ffp-exception-behavior=strict" python setup.py install &> /tmp/numpy_build_log.txt

and I still get the RuntimeWarning.

Here is the build log:

https://gist.github.com/ogrisel/d104c8cf1b1c13202c204f3cc0004e38

@seberg
Copy link
Member

seberg commented Mar 5, 2021

Hmm, thought it was worth a shot, since apparently without the flag clang feels free to do optimizations that can make the floating point exceptions all wrong.
Still seems likely to be a bug with clang or glibc on macos/arm64, but I am not certain.

@ogrisel
Copy link
Contributor Author

ogrisel commented Mar 7, 2021

Not a big problem because it's just a warning and the result is correct. Still it's annoying in for libraries that checks that that warnings are not raised in a given context.

@isuruf
Copy link
Contributor

isuruf commented Mar 8, 2021

The fix should be the same as in #18571 for reciprocal

@erykoff
Copy link

erykoff commented Mar 8, 2021

Interesting. Is this also a problem with the latest Xcode compiler or just the conda compiler (which is a version behind). I believe the problem in #17712 was not present with the latest Xcode compiler (clang 12) which presumably fixed that look-ahead exception bug.

@isuruf
Copy link
Contributor

isuruf commented Mar 8, 2021

Is this also a problem with the latest Xcode compiler

Yes.

which is a version behind

Not really. conda compiler is actually ahead. The versioning schemes are different though.

@isuruf
Copy link
Contributor

isuruf commented Apr 28, 2021

This is not the same issue as #18571. It turns out that this is due to branch prediction going wrong in the for loop and floating point exceptions being raised for the wrong branch.
np.ones(120)[:120] ** -1 gives the warning, but np.ones(120)[:119] ** -1 doesn't.

@ogrisel
Copy link
Contributor Author

ogrisel commented Apr 28, 2021

Shall we report the issue upstream to LLVM (I assume) and just mark the failing TestUfuncGenericLoops::test_unary_PyUFunc_O_O_method_full[reciprocal] test (see #18143 (comment)) as XFAIL?

@isuruf
Copy link
Contributor

isuruf commented Apr 29, 2021

I don't have a small testcase to report this issue upstream. Besides I'm not even sure if it's a compiler issue or a hardware bug.

@charris
Copy link
Member

charris commented May 6, 2021

Going to push this off to 1.22.0 for tracking. It seems the problem is not directly connected to NumPy.

@charris charris modified the milestones: 1.20.3 release, 1.22.0 release May 6, 2021
ogrisel added a commit to ogrisel/scikit-learn that referenced this issue Sep 16, 2021
The code in this test can raise:

  RuntimeWarning: divide by zero encountered in reciprocal

but this warning should not be raised. See the upstream report:
numpy/numpy#18555

In the mean time, let's ignore those warnings in this test to
make it pass and avoid crashing the scikit-learn tests on those
systems.
Developer-Ecosystem-Engineering added a commit to Developer-Ecosystem-Engineering/numpy that referenced this issue Sep 22, 2021
clang has an optimization bug where a vector that is only partially loaded / stored will get narrowed to only the lanes used, which can be fine in some cases. However, in numpy's `reciprocal` function a partial load is explicitly extended to the full width of the register (filled with '1's) to avoid divide-by-zero. clang's optimization ignores the explicit filling with '1's.

The changes here insert a dummy `volatile` variable. This convinces clang not to narrow the load and ignore the explicit filling of '1's.

`volatile` can be expensive since it forces loads / stores to refresh contents whenever the variable is used. numpy has its own template / macro system that'll expand the loop function below for sqrt, absolute, square, and reciprocal. Additionally, the loop can be called on a full array if there's overlap between src and dst. Consequently, we try to limit the scope that we need to apply `volatile`. Intention is it should only be needed when compiling with clang, against Apple arm64, and only for the `reciprocal` function. Moreover, `volatile` is only needed when a vector is partially loaded.

Testing:
Beyond fixing the cases mentioned in the GitHub issue, the changes here also resolve several failures in numpy's test suite.

Before:
```
FAILED numpy/core/tests/test_scalarmath.py::TestBaseMath::test_blocked - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/core/tests/test_ufunc.py::TestUfuncGenericLoops::test_unary_PyUFunc_O_O_method_full[reciprocal] - AssertionError: FloatingPointError not raised
FAILED numpy/core/tests/test_umath.py::TestPower::test_power_float - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan
FAILED numpy/core/tests/test_umath.py::TestAVXUfuncs::test_avx_based_ufunc - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormDouble::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormSingle::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormInt64::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
8 failed, 14759 passed, 204 skipped, 1268 deselected, 34 xfailed in 69.90s (0:01:09)
```

After:
```
FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan
1 failed, 14766 passed, 204 skipped, 1268 deselected, 34 xfailed in 70.37s (0:01:10)
```
Developer-Ecosystem-Engineering added a commit to Developer-Ecosystem-Engineering/numpy that referenced this issue Sep 22, 2021
Enhancement on top of workaround for clang bug in reciprocal (numpy#18555)
Numpy's FP unary loops use a partial load / store on every iteration. The partial load / store helpers each insert a switch statement to know how many elements to handle. This causes a lot of unnecessary branches to be inserted in the loops. The partial load / store is only needed on the final iteration of the loop if it isn't a full vector.

The changes here breakout the final iteration to use the partial load / stores. The loop has been changed to use full load / stores. Additionally, this means we don't need to conditionalize the volatile workaround in the loop.
charris pushed a commit that referenced this issue Sep 25, 2021
* Resolve divide by zero in reciprocal #18555

clang has an optimization bug where a vector that is only partially loaded / stored will get narrowed to only the lanes used, which can be fine in some cases. However, in numpy's `reciprocal` function a partial load is explicitly extended to the full width of the register (filled with '1's) to avoid divide-by-zero. clang's optimization ignores the explicit filling with '1's.

The changes here insert a dummy `volatile` variable. This convinces clang not to narrow the load and ignore the explicit filling of '1's.

`volatile` can be expensive since it forces loads / stores to refresh contents whenever the variable is used. numpy has its own template / macro system that'll expand the loop function below for sqrt, absolute, square, and reciprocal. Additionally, the loop can be called on a full array if there's overlap between src and dst. Consequently, we try to limit the scope that we need to apply `volatile`. Intention is it should only be needed when compiling with clang, against Apple arm64, and only for the `reciprocal` function. Moreover, `volatile` is only needed when a vector is partially loaded.

Testing:
Beyond fixing the cases mentioned in the GitHub issue, the changes here also resolve several failures in numpy's test suite.

Before:
```
FAILED numpy/core/tests/test_scalarmath.py::TestBaseMath::test_blocked - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/core/tests/test_ufunc.py::TestUfuncGenericLoops::test_unary_PyUFunc_O_O_method_full[reciprocal] - AssertionError: FloatingPointError not raised
FAILED numpy/core/tests/test_umath.py::TestPower::test_power_float - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan
FAILED numpy/core/tests/test_umath.py::TestAVXUfuncs::test_avx_based_ufunc - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormDouble::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormSingle::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormInt64::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
8 failed, 14759 passed, 204 skipped, 1268 deselected, 34 xfailed in 69.90s (0:01:09)
```

After:
```
FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan
1 failed, 14766 passed, 204 skipped, 1268 deselected, 34 xfailed in 70.37s (0:01:10)
```

* Enhancement on top of workaround for clang bug in reciprocal

Enhancement on top of workaround for clang bug in reciprocal (#18555)
Numpy's FP unary loops use a partial load / store on every iteration. The partial load / store helpers each insert a switch statement to know how many elements to handle. This causes a lot of unnecessary branches to be inserted in the loops. The partial load / store is only needed on the final iteration of the loop if it isn't a full vector.

The changes here breakout the final iteration to use the partial load / stores. The loop has been changed to use full load / stores. Additionally, this means we don't need to conditionalize the volatile workaround in the loop.

* Address Azure CI failures with older versions of clang

- -ftrapping-math is default enabled for Numpy, but support in clang is mainly for x86_64
- Apple Clang and Clang have different, but overlapping versions
- Non-Apple Clang versions come from looking at when they started supporting -ftrapping-math for x86_64

Testing was done against Apple Clang versions
- v11 / x86_64 - failed previously, now passes (azure failure)
- v12+ / x86_64 - passes before and after
- v13 / arm64 - failed before initial patch, passes after
charris pushed a commit to charris/numpy that referenced this issue Sep 25, 2021
…9926)

* Resolve divide by zero in reciprocal numpy#18555

clang has an optimization bug where a vector that is only partially loaded / stored will get narrowed to only the lanes used, which can be fine in some cases. However, in numpy's `reciprocal` function a partial load is explicitly extended to the full width of the register (filled with '1's) to avoid divide-by-zero. clang's optimization ignores the explicit filling with '1's.

The changes here insert a dummy `volatile` variable. This convinces clang not to narrow the load and ignore the explicit filling of '1's.

`volatile` can be expensive since it forces loads / stores to refresh contents whenever the variable is used. numpy has its own template / macro system that'll expand the loop function below for sqrt, absolute, square, and reciprocal. Additionally, the loop can be called on a full array if there's overlap between src and dst. Consequently, we try to limit the scope that we need to apply `volatile`. Intention is it should only be needed when compiling with clang, against Apple arm64, and only for the `reciprocal` function. Moreover, `volatile` is only needed when a vector is partially loaded.

Testing:
Beyond fixing the cases mentioned in the GitHub issue, the changes here also resolve several failures in numpy's test suite.

Before:
```
FAILED numpy/core/tests/test_scalarmath.py::TestBaseMath::test_blocked - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/core/tests/test_ufunc.py::TestUfuncGenericLoops::test_unary_PyUFunc_O_O_method_full[reciprocal] - AssertionError: FloatingPointError not raised
FAILED numpy/core/tests/test_umath.py::TestPower::test_power_float - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan
FAILED numpy/core/tests/test_umath.py::TestAVXUfuncs::test_avx_based_ufunc - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormDouble::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormSingle::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormInt64::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
8 failed, 14759 passed, 204 skipped, 1268 deselected, 34 xfailed in 69.90s (0:01:09)
```

After:
```
FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan
1 failed, 14766 passed, 204 skipped, 1268 deselected, 34 xfailed in 70.37s (0:01:10)
```

* Enhancement on top of workaround for clang bug in reciprocal

Enhancement on top of workaround for clang bug in reciprocal (numpy#18555)
Numpy's FP unary loops use a partial load / store on every iteration. The partial load / store helpers each insert a switch statement to know how many elements to handle. This causes a lot of unnecessary branches to be inserted in the loops. The partial load / store is only needed on the final iteration of the loop if it isn't a full vector.

The changes here breakout the final iteration to use the partial load / stores. The loop has been changed to use full load / stores. Additionally, this means we don't need to conditionalize the volatile workaround in the loop.

* Address Azure CI failures with older versions of clang

- -ftrapping-math is default enabled for Numpy, but support in clang is mainly for x86_64
- Apple Clang and Clang have different, but overlapping versions
- Non-Apple Clang versions come from looking at when they started supporting -ftrapping-math for x86_64

Testing was done against Apple Clang versions
- v11 / x86_64 - failed previously, now passes (azure failure)
- v12+ / x86_64 - passes before and after
- v13 / arm64 - failed before initial patch, passes after
howjmay pushed a commit to howjmay/numpy that referenced this issue Sep 29, 2021
…9926)

* Resolve divide by zero in reciprocal numpy#18555

clang has an optimization bug where a vector that is only partially loaded / stored will get narrowed to only the lanes used, which can be fine in some cases. However, in numpy's `reciprocal` function a partial load is explicitly extended to the full width of the register (filled with '1's) to avoid divide-by-zero. clang's optimization ignores the explicit filling with '1's.

The changes here insert a dummy `volatile` variable. This convinces clang not to narrow the load and ignore the explicit filling of '1's.

`volatile` can be expensive since it forces loads / stores to refresh contents whenever the variable is used. numpy has its own template / macro system that'll expand the loop function below for sqrt, absolute, square, and reciprocal. Additionally, the loop can be called on a full array if there's overlap between src and dst. Consequently, we try to limit the scope that we need to apply `volatile`. Intention is it should only be needed when compiling with clang, against Apple arm64, and only for the `reciprocal` function. Moreover, `volatile` is only needed when a vector is partially loaded.

Testing:
Beyond fixing the cases mentioned in the GitHub issue, the changes here also resolve several failures in numpy's test suite.

Before:
```
FAILED numpy/core/tests/test_scalarmath.py::TestBaseMath::test_blocked - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/core/tests/test_ufunc.py::TestUfuncGenericLoops::test_unary_PyUFunc_O_O_method_full[reciprocal] - AssertionError: FloatingPointError not raised
FAILED numpy/core/tests/test_umath.py::TestPower::test_power_float - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan
FAILED numpy/core/tests/test_umath.py::TestAVXUfuncs::test_avx_based_ufunc - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormDouble::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormSingle::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormInt64::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
8 failed, 14759 passed, 204 skipped, 1268 deselected, 34 xfailed in 69.90s (0:01:09)
```

After:
```
FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan
1 failed, 14766 passed, 204 skipped, 1268 deselected, 34 xfailed in 70.37s (0:01:10)
```

* Enhancement on top of workaround for clang bug in reciprocal

Enhancement on top of workaround for clang bug in reciprocal (numpy#18555)
Numpy's FP unary loops use a partial load / store on every iteration. The partial load / store helpers each insert a switch statement to know how many elements to handle. This causes a lot of unnecessary branches to be inserted in the loops. The partial load / store is only needed on the final iteration of the loop if it isn't a full vector.

The changes here breakout the final iteration to use the partial load / stores. The loop has been changed to use full load / stores. Additionally, this means we don't need to conditionalize the volatile workaround in the loop.

* Address Azure CI failures with older versions of clang

- -ftrapping-math is default enabled for Numpy, but support in clang is mainly for x86_64
- Apple Clang and Clang have different, but overlapping versions
- Non-Apple Clang versions come from looking at when they started supporting -ftrapping-math for x86_64

Testing was done against Apple Clang versions
- v11 / x86_64 - failed previously, now passes (azure failure)
- v12+ / x86_64 - passes before and after
- v13 / arm64 - failed before initial patch, passes after
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants