BUG: Resolve Divide by Zero on Apple silicon + test failures #19926

Developer-Ecosystem-Engineering · 2021-09-22T23:13:53Z

Clang has an optimization bug where a vector that is only partially loaded / stored will get narrowed to only the lanes used, which can be fine in some cases. However, in numpy's reciprocal function a partial load is explicitly extended to the full width of the register (filled with '1's) to avoid divide-by-zero. clang's optimization ignores the explicit filling with '1's.

The changes here insert a dummy volatile variable. This convinces clang not to narrow the load and ignore the explicit filling of '1's.

volatile can be expensive since it forces loads / stores to refresh contents whenever the variable is used. numpy has its own template / macro system that'll expand the loop function below for sqrt, absolute, square, and reciprocal. Additionally, the loop can be called on a full array if there's overlap between src and dst. Consequently, we try to limit the scope that we need to apply volatile. Intention is it should only be needed when compiling with clang, against Apple arm64, and only for the reciprocal function. Moreover, volatile is only needed when a vector is partially loaded.

Testing:
Fixes #18555 and the changes here also resolve several failures in numpy's test suite.

Before:

FAILED numpy/core/tests/test_scalarmath.py::TestBaseMath::test_blocked - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/core/tests/test_ufunc.py::TestUfuncGenericLoops::test_unary_PyUFunc_O_O_method_full[reciprocal] - AssertionError: FloatingPointError not raised
FAILED numpy/core/tests/test_umath.py::TestPower::test_power_float - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan
FAILED numpy/core/tests/test_umath.py::TestAVXUfuncs::test_avx_based_ufunc - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormDouble::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormSingle::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormInt64::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
8 failed, 14759 passed, 204 skipped, 1268 deselected, 34 xfailed in 69.90s (0:01:09)

After:

FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan
1 failed, 14766 passed, 204 skipped, 1268 deselected, 34 xfailed in 70.37s (0:01:10)

After addressing this failure, we've added to it (second commit requires the first)

Numpy's FP unary loops use a partial load / store on every iteration. The partial load / store helpers each insert a switch statement to know how many elements to handle. This causes a lot of unnecessary branches to be inserted in the loops. The partial load / store is only needed on the final iteration of the loop if it isn't a full vector.

The changes here breakout the final iteration to use the partial load / stores. The loop has been changed to use full load / stores. Additionally, this means we don't need to conditionalize the volatile workaround in the loop.

clang has an optimization bug where a vector that is only partially loaded / stored will get narrowed to only the lanes used, which can be fine in some cases. However, in numpy's `reciprocal` function a partial load is explicitly extended to the full width of the register (filled with '1's) to avoid divide-by-zero. clang's optimization ignores the explicit filling with '1's. The changes here insert a dummy `volatile` variable. This convinces clang not to narrow the load and ignore the explicit filling of '1's. `volatile` can be expensive since it forces loads / stores to refresh contents whenever the variable is used. numpy has its own template / macro system that'll expand the loop function below for sqrt, absolute, square, and reciprocal. Additionally, the loop can be called on a full array if there's overlap between src and dst. Consequently, we try to limit the scope that we need to apply `volatile`. Intention is it should only be needed when compiling with clang, against Apple arm64, and only for the `reciprocal` function. Moreover, `volatile` is only needed when a vector is partially loaded. Testing: Beyond fixing the cases mentioned in the GitHub issue, the changes here also resolve several failures in numpy's test suite. Before: ``` FAILED numpy/core/tests/test_scalarmath.py::TestBaseMath::test_blocked - RuntimeWarning: divide by zero encountered in reciprocal FAILED numpy/core/tests/test_ufunc.py::TestUfuncGenericLoops::test_unary_PyUFunc_O_O_method_full[reciprocal] - AssertionError: FloatingPointError not raised FAILED numpy/core/tests/test_umath.py::TestPower::test_power_float - RuntimeWarning: divide by zero encountered in reciprocal FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan FAILED numpy/core/tests/test_umath.py::TestAVXUfuncs::test_avx_based_ufunc - RuntimeWarning: divide by zero encountered in reciprocal FAILED numpy/linalg/tests/test_linalg.py::TestNormDouble::test_axis - RuntimeWarning: divide by zero encountered in reciprocal FAILED numpy/linalg/tests/test_linalg.py::TestNormSingle::test_axis - RuntimeWarning: divide by zero encountered in reciprocal FAILED numpy/linalg/tests/test_linalg.py::TestNormInt64::test_axis - RuntimeWarning: divide by zero encountered in reciprocal 8 failed, 14759 passed, 204 skipped, 1268 deselected, 34 xfailed in 69.90s (0:01:09) ``` After: ``` FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan 1 failed, 14766 passed, 204 skipped, 1268 deselected, 34 xfailed in 70.37s (0:01:10) ```

Enhancement on top of workaround for clang bug in reciprocal (numpy#18555) Numpy's FP unary loops use a partial load / store on every iteration. The partial load / store helpers each insert a switch statement to know how many elements to handle. This causes a lot of unnecessary branches to be inserted in the loops. The partial load / store is only needed on the final iteration of the loop if it isn't a full vector. The changes here breakout the final iteration to use the partial load / stores. The loop has been changed to use full load / stores. Additionally, this means we don't need to conditionalize the volatile workaround in the loop.

Developer-Ecosystem-Engineering · 2021-09-23T03:07:00Z

Not seeing this Azure failure locally, will look into the differences.

It appears to be due to CI using Apple clang version 11.0.0, we will likely need to expand the scope of the workaround for CI to pass, -ftrapping-math is only available on newer clangs for x86_64

seberg · 2021-09-23T15:59:00Z

I am wondering how much effort we should spend on buggy clang (or MacOS math libraries) here, especially if it only affects older versions... Most likely there are far more FPE failures on MacOS anyway and the only reason CI passes is that we don't have systematic unit tests for most functions.

rgommers · 2021-09-23T16:36:47Z

I am wondering how much effort we should spend on buggy clang (or MacOS math libraries) here, especially if it only affects older versions

@seberg this remainder issue is important, it's causing a lot of SciPy failures on macOS arm64 and it is the only such issue left. IIRC there were 121 SciPy failures on master last time I checked, and >110 were caused by this warning.

Most likely there are far more FPE failures on MacOS anyway and the only reason CI passes is that we don't have systematic unit tests for most functions.

Other than one SciPy kernel panic related to ARPACK, we're in pretty good shape for macOS arm64. The NumPy test suite passes in release mode.

rgommers · 2021-09-23T16:42:20Z

If we can upgrade Clang to the minimum required version in CI that would be fine too. Conda Clang compilers are at 11.1 though, so not much higher than 11.0 as in CI here. If we can keep 11.1 as a base without a large amount of extra effort rather than bumping to 12.0, that would be nice.

rgommers · 2021-09-23T19:52:40Z

this remainder issue is important, it's causing a lot of SciPy failures on macOS arm64 and it is the only such issue left. IIRC there were 121 SciPy failures on master last time I checked, and >110 were caused by this warning.

Actually that isn't completely correct. With SciPy master + numpy 1.21.1 there are now 101 failures. About 80 of those are BLAS/LAPACK related. With this PR that's down to 89 test failures, so it resolves about half of the non-linalg failures. What's left is similar in nature:

FAILED lib/python3.9/site-packages/scipy/signal/tests/test_ltisys.py::TestPlacePoles::test_complex - RuntimeWarning: divide by zero encountered in det
FAILED lib/python3.9/site-packages/scipy/stats/tests/test_discrete_distns.py::TestNCH::test_nchypergeom_wallenius_naive - RuntimeWarning: invalid val...
FAILED lib/python3.9/site-packages/scipy/stats/tests/test_continuous_basic.py::test_cont_basic[500-200-beta-arg4] - RuntimeWarning: overflow encounte...
FAILED lib/python3.9/site-packages/scipy/stats/tests/test_distributions.py::TestFrozen::test_pickling - RuntimeWarning: overflow encountered in _beta...
FAILED lib/python3.9/site-packages/scipy/stats/tests/test_distributions.py::TestExpect::test_beta - RuntimeWarning: overflow encountered in _beta_ppf
FAILED lib/python3.9/site-packages/scipy/stats/tests/test_continuous_basic.py::test_cont_basic[500-200-rdist-arg85] - RuntimeWarning: overflow encoun...
FAILED lib/python3.9/site-packages/scipy/stats/tests/test_continuous_basic.py::test_cont_basic[500-200-semicircular-arg89] - RuntimeWarning: overflow...
FAILED lib/python3.9/site-packages/scipy/stats/tests/test_stats.py::TestNumericalInverseHermite::test_basic[rdist-shapes86] - RuntimeWarning: overflo...
FAILED lib/python3.9/site-packages/scipy/stats/tests/test_stats.py::TestNumericalInverseHermite::test_basic[semicircular-shapes90] - RuntimeWarning: ...
FAILED lib/python3.9/site-packages/scipy/stats/tests/test_stats.py::TestNumericalInverseHermite::test_inaccurate_CDF - RuntimeWarning: overflow encou...
FAILED lib/python3.9/site-packages/scipy/stats/tests/test_continuous_basic.py::test_methods_with_lists[beta-args4-ppf] - RuntimeWarning: overflow enc...
FAILED lib/python3.9/site-packages/scipy/stats/tests/test_continuous_basic.py::test_methods_with_lists[beta-args4-isf] - RuntimeWarning: overflow enc...
FAILED lib/python3.9/site-packages/scipy/stats/tests/test_continuous_basic.py::test_methods_with_lists[rdist-args86-ppf] - RuntimeWarning: overflow e...
FAILED lib/python3.9/site-packages/scipy/stats/tests/test_continuous_basic.py::test_methods_with_lists[rdist-args86-isf] - RuntimeWarning: overflow e...
FAILED lib/python3.9/site-packages/scipy/stats/tests/test_continuous_basic.py::test_methods_with_lists[semicircular-args90-ppf] - RuntimeWarning: ove...
FAILED lib/python3.9/site-packages/scipy/stats/tests/test_continuous_basic.py::test_methods_with_lists[semicircular-args90-isf] - RuntimeWarning: ove...
FAILED lib/python3.9/site-packages/scipy/stats/tests/test_discrete_basic.py::test_discrete_basic[nbinom-arg13-False] - RuntimeWarning: overflow encou...
FAILED lib/python3.9/site-packages/scipy/stats/tests/test_discrete_basic.py::test_moments[nbinom-arg13] - RuntimeWarning: overflow encountered in _nb...
=

- -ftrapping-math is default enabled for Numpy, but support in clang is mainly for x86_64 - Apple Clang and Clang have different, but overlapping versions - Non-Apple Clang versions come from looking at when they started supporting -ftrapping-math for x86_64 Testing was done against Apple Clang versions - v11 / x86_64 - failed previously, now passes (azure failure) - v12+ / x86_64 - passes before and after - v13 / arm64 - failed before initial patch, passes after

Developer-Ecosystem-Engineering · 2021-09-24T17:29:33Z

Latest commit should resolve Azure failures now as well, waiting for CI to finish to confirm.

charris · 2021-09-25T22:49:02Z

Thanks @Developer-Ecosystem-Engineering .

…9926) * Resolve divide by zero in reciprocal numpy#18555 clang has an optimization bug where a vector that is only partially loaded / stored will get narrowed to only the lanes used, which can be fine in some cases. However, in numpy's `reciprocal` function a partial load is explicitly extended to the full width of the register (filled with '1's) to avoid divide-by-zero. clang's optimization ignores the explicit filling with '1's. The changes here insert a dummy `volatile` variable. This convinces clang not to narrow the load and ignore the explicit filling of '1's. `volatile` can be expensive since it forces loads / stores to refresh contents whenever the variable is used. numpy has its own template / macro system that'll expand the loop function below for sqrt, absolute, square, and reciprocal. Additionally, the loop can be called on a full array if there's overlap between src and dst. Consequently, we try to limit the scope that we need to apply `volatile`. Intention is it should only be needed when compiling with clang, against Apple arm64, and only for the `reciprocal` function. Moreover, `volatile` is only needed when a vector is partially loaded. Testing: Beyond fixing the cases mentioned in the GitHub issue, the changes here also resolve several failures in numpy's test suite. Before: ``` FAILED numpy/core/tests/test_scalarmath.py::TestBaseMath::test_blocked - RuntimeWarning: divide by zero encountered in reciprocal FAILED numpy/core/tests/test_ufunc.py::TestUfuncGenericLoops::test_unary_PyUFunc_O_O_method_full[reciprocal] - AssertionError: FloatingPointError not raised FAILED numpy/core/tests/test_umath.py::TestPower::test_power_float - RuntimeWarning: divide by zero encountered in reciprocal FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan FAILED numpy/core/tests/test_umath.py::TestAVXUfuncs::test_avx_based_ufunc - RuntimeWarning: divide by zero encountered in reciprocal FAILED numpy/linalg/tests/test_linalg.py::TestNormDouble::test_axis - RuntimeWarning: divide by zero encountered in reciprocal FAILED numpy/linalg/tests/test_linalg.py::TestNormSingle::test_axis - RuntimeWarning: divide by zero encountered in reciprocal FAILED numpy/linalg/tests/test_linalg.py::TestNormInt64::test_axis - RuntimeWarning: divide by zero encountered in reciprocal 8 failed, 14759 passed, 204 skipped, 1268 deselected, 34 xfailed in 69.90s (0:01:09) ``` After: ``` FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan 1 failed, 14766 passed, 204 skipped, 1268 deselected, 34 xfailed in 70.37s (0:01:10) ``` * Enhancement on top of workaround for clang bug in reciprocal Enhancement on top of workaround for clang bug in reciprocal (numpy#18555) Numpy's FP unary loops use a partial load / store on every iteration. The partial load / store helpers each insert a switch statement to know how many elements to handle. This causes a lot of unnecessary branches to be inserted in the loops. The partial load / store is only needed on the final iteration of the loop if it isn't a full vector. The changes here breakout the final iteration to use the partial load / stores. The loop has been changed to use full load / stores. Additionally, this means we don't need to conditionalize the volatile workaround in the loop. * Address Azure CI failures with older versions of clang - -ftrapping-math is default enabled for Numpy, but support in clang is mainly for x86_64 - Apple Clang and Clang have different, but overlapping versions - Non-Apple Clang versions come from looking at when they started supporting -ftrapping-math for x86_64 Testing was done against Apple Clang versions - v11 / x86_64 - failed previously, now passes (azure failure) - v12+ / x86_64 - passes before and after - v13 / arm64 - failed before initial patch, passes after

isuruf · 2021-09-25T22:54:00Z

Conda Clang compilers are at 11.1 though, so not much higher than 11.0 as in CI here.

FYI, Conda clang 11.1 is equivalent to Apple clang 12. See the versioning at https://en.wikipedia.org/wiki/Xcode

BUG: Resolve Divide by Zero on Apple silicon + test failures (#19926)

…9926) * Resolve divide by zero in reciprocal numpy#18555 clang has an optimization bug where a vector that is only partially loaded / stored will get narrowed to only the lanes used, which can be fine in some cases. However, in numpy's `reciprocal` function a partial load is explicitly extended to the full width of the register (filled with '1's) to avoid divide-by-zero. clang's optimization ignores the explicit filling with '1's. The changes here insert a dummy `volatile` variable. This convinces clang not to narrow the load and ignore the explicit filling of '1's. `volatile` can be expensive since it forces loads / stores to refresh contents whenever the variable is used. numpy has its own template / macro system that'll expand the loop function below for sqrt, absolute, square, and reciprocal. Additionally, the loop can be called on a full array if there's overlap between src and dst. Consequently, we try to limit the scope that we need to apply `volatile`. Intention is it should only be needed when compiling with clang, against Apple arm64, and only for the `reciprocal` function. Moreover, `volatile` is only needed when a vector is partially loaded. Testing: Beyond fixing the cases mentioned in the GitHub issue, the changes here also resolve several failures in numpy's test suite. Before: ``` FAILED numpy/core/tests/test_scalarmath.py::TestBaseMath::test_blocked - RuntimeWarning: divide by zero encountered in reciprocal FAILED numpy/core/tests/test_ufunc.py::TestUfuncGenericLoops::test_unary_PyUFunc_O_O_method_full[reciprocal] - AssertionError: FloatingPointError not raised FAILED numpy/core/tests/test_umath.py::TestPower::test_power_float - RuntimeWarning: divide by zero encountered in reciprocal FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan FAILED numpy/core/tests/test_umath.py::TestAVXUfuncs::test_avx_based_ufunc - RuntimeWarning: divide by zero encountered in reciprocal FAILED numpy/linalg/tests/test_linalg.py::TestNormDouble::test_axis - RuntimeWarning: divide by zero encountered in reciprocal FAILED numpy/linalg/tests/test_linalg.py::TestNormSingle::test_axis - RuntimeWarning: divide by zero encountered in reciprocal FAILED numpy/linalg/tests/test_linalg.py::TestNormInt64::test_axis - RuntimeWarning: divide by zero encountered in reciprocal 8 failed, 14759 passed, 204 skipped, 1268 deselected, 34 xfailed in 69.90s (0:01:09) ``` After: ``` FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan 1 failed, 14766 passed, 204 skipped, 1268 deselected, 34 xfailed in 70.37s (0:01:10) ``` * Enhancement on top of workaround for clang bug in reciprocal Enhancement on top of workaround for clang bug in reciprocal (numpy#18555) Numpy's FP unary loops use a partial load / store on every iteration. The partial load / store helpers each insert a switch statement to know how many elements to handle. This causes a lot of unnecessary branches to be inserted in the loops. The partial load / store is only needed on the final iteration of the loop if it isn't a full vector. The changes here breakout the final iteration to use the partial load / stores. The loop has been changed to use full load / stores. Additionally, this means we don't need to conditionalize the volatile workaround in the loop. * Address Azure CI failures with older versions of clang - -ftrapping-math is default enabled for Numpy, but support in clang is mainly for x86_64 - Apple Clang and Clang have different, but overlapping versions - Non-Apple Clang versions come from looking at when they started supporting -ftrapping-math for x86_64 Testing was done against Apple Clang versions - v11 / x86_64 - failed previously, now passes (azure failure) - v12+ / x86_64 - passes before and after - v13 / arm64 - failed before initial patch, passes after

- test_sincos_float32 taken care of in numpy/numpy#20274 - TestF77Mismatch removed upstream in numpy/numpy#20457 - test_unary_PyUFunc_O_O_method_full fixed in 1.22 (numpy/numpy#19926)

Developer-Ecosystem-Engineering added 2 commits September 22, 2021 16:07

github-actions bot added the 00 - Bug label Sep 22, 2021

seberg added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Sep 22, 2021

charris added the 09 - Backport-Candidate PRs tagged should be backported label Sep 23, 2021

rgommers added this to the 1.22.0 release milestone Sep 23, 2021

charris merged commit 0b01b48 into numpy:main Sep 25, 2021

charris removed the 09 - Backport-Candidate PRs tagged should be backported label Sep 25, 2021

charris added a commit that referenced this pull request Sep 26, 2021

Merge pull request #19955 from charris/backport-19926

13c259e

BUG: Resolve Divide by Zero on Apple silicon + test failures (#19926)

Developer-Ecosystem-Engineering mentioned this pull request Sep 27, 2021

BUG: np.tan(np.inf) test failure in Apple silicon #19974

Merged

ogrisel mentioned this pull request Oct 22, 2021

RuntimeWarning "invalid value encountered in reciprocal" when running tests on macOS scikit-learn/scikit-learn#21395

Closed

h-vetinari mentioned this pull request Nov 2, 2021

numpy 1.20 on MacOSX: spurious RuntimeWarning: invalid value encountered in reciprocal conda-forge/numpy-feedstock#229

Closed

1 task

seberg mentioned this pull request Dec 21, 2021

BUG: missing exception in test_unary_PyUFunc_O_O_method_full[reciprocal] on osx with 1.21.5 #20637

Closed

This was referenced Dec 25, 2021

ENH: BLD: enable building SciPy with Meson scipy/scipy#14847

Merged

BUG: stats: distribution methods emit unnecessary warnings from Boost scipy/scipy#14901

Closed

jakirkham mentioned this pull request Jan 11, 2022

Random CI failure on MacOS: RuntimeWarning: invalid value encountered in reciprocal dask/dask#7189

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Resolve Divide by Zero on Apple silicon + test failures #19926

BUG: Resolve Divide by Zero on Apple silicon + test failures #19926

Developer-Ecosystem-Engineering commented Sep 22, 2021 •

edited

Developer-Ecosystem-Engineering commented Sep 23, 2021 •

edited

seberg commented Sep 23, 2021

rgommers commented Sep 23, 2021

rgommers commented Sep 23, 2021

rgommers commented Sep 23, 2021

Developer-Ecosystem-Engineering commented Sep 24, 2021

charris commented Sep 25, 2021

isuruf commented Sep 25, 2021

BUG: Resolve Divide by Zero on Apple silicon + test failures #19926

BUG: Resolve Divide by Zero on Apple silicon + test failures #19926

Conversation

Developer-Ecosystem-Engineering commented Sep 22, 2021 • edited

Developer-Ecosystem-Engineering commented Sep 23, 2021 • edited

seberg commented Sep 23, 2021

rgommers commented Sep 23, 2021

rgommers commented Sep 23, 2021

rgommers commented Sep 23, 2021

Developer-Ecosystem-Engineering commented Sep 24, 2021

charris commented Sep 25, 2021

isuruf commented Sep 25, 2021

Developer-Ecosystem-Engineering commented Sep 22, 2021 •

edited

Developer-Ecosystem-Engineering commented Sep 23, 2021 •

edited