BUG: Resolve Divide by Zero on Apple silicon + test failures (#19926) #19955

charris · 2021-09-25T22:59:03Z

Resolve divide by zero in reciprocal RuntimeWarning: divide by zero encountered in reciprocal for np.ones(3) ** -1 #18555

clang has an optimization bug where a vector that is only partially loaded / stored will get narrowed to only the lanes used, which can be fine in some cases. However, in numpy's reciprocal function a partial load is explicitly extended to the full width of the register (filled with '1's) to avoid divide-by-zero. clang's optimization ignores the explicit filling with '1's.

The changes here insert a dummy volatile variable. This convinces clang not to narrow the load and ignore the explicit filling of '1's.

volatile can be expensive since it forces loads / stores to refresh contents whenever the variable is used. numpy has its own template / macro system that'll expand the loop function below for sqrt, absolute, square, and reciprocal. Additionally, the loop can be called on a full array if there's overlap between src and dst. Consequently, we try to limit the scope that we need to apply volatile. Intention is it should only be needed when compiling with clang, against Apple arm64, and only for the reciprocal function. Moreover, volatile is only needed when a vector is partially loaded.

Testing:
Beyond fixing the cases mentioned in the GitHub issue, the changes here also resolve several failures in numpy's test suite.

Before:

FAILED numpy/core/tests/test_scalarmath.py::TestBaseMath::test_blocked - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/core/tests/test_ufunc.py::TestUfuncGenericLoops::test_unary_PyUFunc_O_O_method_full[reciprocal] - AssertionError: FloatingPointError not raised
FAILED numpy/core/tests/test_umath.py::TestPower::test_power_float - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan
FAILED numpy/core/tests/test_umath.py::TestAVXUfuncs::test_avx_based_ufunc - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormDouble::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormSingle::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormInt64::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
8 failed, 14759 passed, 204 skipped, 1268 deselected, 34 xfailed in 69.90s (0:01:09)

After:

FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan
1 failed, 14766 passed, 204 skipped, 1268 deselected, 34 xfailed in 70.37s (0:01:10)

Enhancement on top of workaround for clang bug in reciprocal

Enhancement on top of workaround for clang bug in reciprocal (#18555)
Numpy's FP unary loops use a partial load / store on every iteration. The partial load / store helpers each insert a switch statement to know how many elements to handle. This causes a lot of unnecessary branches to be inserted in the loops. The partial load / store is only needed on the final iteration of the loop if it isn't a full vector.

The changes here breakout the final iteration to use the partial load / stores. The loop has been changed to use full load / stores. Additionally, this means we don't need to conditionalize the volatile workaround in the loop.

Address Azure CI failures with older versions of clang

-ftrapping-math is default enabled for Numpy, but support in clang is mainly for x86_64
Apple Clang and Clang have different, but overlapping versions
Non-Apple Clang versions come from looking at when they started supporting -ftrapping-math for x86_64

Testing was done against Apple Clang versions

v11 / x86_64 - failed previously, now passes (azure failure)
v12+ / x86_64 - passes before and after
v13 / arm64 - failed before initial patch, passes after

…9926) * Resolve divide by zero in reciprocal numpy#18555 clang has an optimization bug where a vector that is only partially loaded / stored will get narrowed to only the lanes used, which can be fine in some cases. However, in numpy's `reciprocal` function a partial load is explicitly extended to the full width of the register (filled with '1's) to avoid divide-by-zero. clang's optimization ignores the explicit filling with '1's. The changes here insert a dummy `volatile` variable. This convinces clang not to narrow the load and ignore the explicit filling of '1's. `volatile` can be expensive since it forces loads / stores to refresh contents whenever the variable is used. numpy has its own template / macro system that'll expand the loop function below for sqrt, absolute, square, and reciprocal. Additionally, the loop can be called on a full array if there's overlap between src and dst. Consequently, we try to limit the scope that we need to apply `volatile`. Intention is it should only be needed when compiling with clang, against Apple arm64, and only for the `reciprocal` function. Moreover, `volatile` is only needed when a vector is partially loaded. Testing: Beyond fixing the cases mentioned in the GitHub issue, the changes here also resolve several failures in numpy's test suite. Before: ``` FAILED numpy/core/tests/test_scalarmath.py::TestBaseMath::test_blocked - RuntimeWarning: divide by zero encountered in reciprocal FAILED numpy/core/tests/test_ufunc.py::TestUfuncGenericLoops::test_unary_PyUFunc_O_O_method_full[reciprocal] - AssertionError: FloatingPointError not raised FAILED numpy/core/tests/test_umath.py::TestPower::test_power_float - RuntimeWarning: divide by zero encountered in reciprocal FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan FAILED numpy/core/tests/test_umath.py::TestAVXUfuncs::test_avx_based_ufunc - RuntimeWarning: divide by zero encountered in reciprocal FAILED numpy/linalg/tests/test_linalg.py::TestNormDouble::test_axis - RuntimeWarning: divide by zero encountered in reciprocal FAILED numpy/linalg/tests/test_linalg.py::TestNormSingle::test_axis - RuntimeWarning: divide by zero encountered in reciprocal FAILED numpy/linalg/tests/test_linalg.py::TestNormInt64::test_axis - RuntimeWarning: divide by zero encountered in reciprocal 8 failed, 14759 passed, 204 skipped, 1268 deselected, 34 xfailed in 69.90s (0:01:09) ``` After: ``` FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan 1 failed, 14766 passed, 204 skipped, 1268 deselected, 34 xfailed in 70.37s (0:01:10) ``` * Enhancement on top of workaround for clang bug in reciprocal Enhancement on top of workaround for clang bug in reciprocal (numpy#18555) Numpy's FP unary loops use a partial load / store on every iteration. The partial load / store helpers each insert a switch statement to know how many elements to handle. This causes a lot of unnecessary branches to be inserted in the loops. The partial load / store is only needed on the final iteration of the loop if it isn't a full vector. The changes here breakout the final iteration to use the partial load / stores. The loop has been changed to use full load / stores. Additionally, this means we don't need to conditionalize the volatile workaround in the loop. * Address Azure CI failures with older versions of clang - -ftrapping-math is default enabled for Numpy, but support in clang is mainly for x86_64 - Apple Clang and Clang have different, but overlapping versions - Non-Apple Clang versions come from looking at when they started supporting -ftrapping-math for x86_64 Testing was done against Apple Clang versions - v11 / x86_64 - failed previously, now passes (azure failure) - v12+ / x86_64 - passes before and after - v13 / arm64 - failed before initial patch, passes after

github-actions bot added the 00 - Bug label Sep 25, 2021

charris added 08 - Backport Used to tag backport PRs component: SIMD Issues in SIMD (fast instruction sets) code or machinery labels Sep 25, 2021

charris added this to the 1.21.3 release milestone Sep 25, 2021

charris merged commit 13c259e into numpy:maintenance/1.21.x Sep 26, 2021

charris deleted the backport-19926 branch September 26, 2021 01:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Resolve Divide by Zero on Apple silicon + test failures (#19926) #19955

BUG: Resolve Divide by Zero on Apple silicon + test failures (#19926) #19955

charris commented Sep 25, 2021

BUG: Resolve Divide by Zero on Apple silicon + test failures (#19926) #19955

BUG: Resolve Divide by Zero on Apple silicon + test failures (#19926) #19955

Conversation

charris commented Sep 25, 2021