Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Resolve Divide by Zero on Apple silicon + test failures (#19926) #19955

Merged
merged 1 commit into from Sep 26, 2021

Commits on Sep 25, 2021

  1. BUG: Resolve Divide by Zero on Apple silicon + test failures (numpy#1…

    …9926)
    
    * Resolve divide by zero in reciprocal numpy#18555
    
    clang has an optimization bug where a vector that is only partially loaded / stored will get narrowed to only the lanes used, which can be fine in some cases. However, in numpy's `reciprocal` function a partial load is explicitly extended to the full width of the register (filled with '1's) to avoid divide-by-zero. clang's optimization ignores the explicit filling with '1's.
    
    The changes here insert a dummy `volatile` variable. This convinces clang not to narrow the load and ignore the explicit filling of '1's.
    
    `volatile` can be expensive since it forces loads / stores to refresh contents whenever the variable is used. numpy has its own template / macro system that'll expand the loop function below for sqrt, absolute, square, and reciprocal. Additionally, the loop can be called on a full array if there's overlap between src and dst. Consequently, we try to limit the scope that we need to apply `volatile`. Intention is it should only be needed when compiling with clang, against Apple arm64, and only for the `reciprocal` function. Moreover, `volatile` is only needed when a vector is partially loaded.
    
    Testing:
    Beyond fixing the cases mentioned in the GitHub issue, the changes here also resolve several failures in numpy's test suite.
    
    Before:
    ```
    FAILED numpy/core/tests/test_scalarmath.py::TestBaseMath::test_blocked - RuntimeWarning: divide by zero encountered in reciprocal
    FAILED numpy/core/tests/test_ufunc.py::TestUfuncGenericLoops::test_unary_PyUFunc_O_O_method_full[reciprocal] - AssertionError: FloatingPointError not raised
    FAILED numpy/core/tests/test_umath.py::TestPower::test_power_float - RuntimeWarning: divide by zero encountered in reciprocal
    FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan
    FAILED numpy/core/tests/test_umath.py::TestAVXUfuncs::test_avx_based_ufunc - RuntimeWarning: divide by zero encountered in reciprocal
    FAILED numpy/linalg/tests/test_linalg.py::TestNormDouble::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
    FAILED numpy/linalg/tests/test_linalg.py::TestNormSingle::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
    FAILED numpy/linalg/tests/test_linalg.py::TestNormInt64::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
    8 failed, 14759 passed, 204 skipped, 1268 deselected, 34 xfailed in 69.90s (0:01:09)
    ```
    
    After:
    ```
    FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan
    1 failed, 14766 passed, 204 skipped, 1268 deselected, 34 xfailed in 70.37s (0:01:10)
    ```
    
    * Enhancement on top of workaround for clang bug in reciprocal
    
    Enhancement on top of workaround for clang bug in reciprocal (numpy#18555)
    Numpy's FP unary loops use a partial load / store on every iteration. The partial load / store helpers each insert a switch statement to know how many elements to handle. This causes a lot of unnecessary branches to be inserted in the loops. The partial load / store is only needed on the final iteration of the loop if it isn't a full vector.
    
    The changes here breakout the final iteration to use the partial load / stores. The loop has been changed to use full load / stores. Additionally, this means we don't need to conditionalize the volatile workaround in the loop.
    
    * Address Azure CI failures with older versions of clang
    
    - -ftrapping-math is default enabled for Numpy, but support in clang is mainly for x86_64
    - Apple Clang and Clang have different, but overlapping versions
    - Non-Apple Clang versions come from looking at when they started supporting -ftrapping-math for x86_64
    
    Testing was done against Apple Clang versions
    - v11 / x86_64 - failed previously, now passes (azure failure)
    - v12+ / x86_64 - passes before and after
    - v13 / arm64 - failed before initial patch, passes after
    Developer-Ecosystem-Engineering authored and charris committed Sep 25, 2021
    Copy the full SHA
    c91443b View commit details
    Browse the repository at this point in the history