Enhancement on top of workaround for clang bug in reciprocal (numpy#18555)

NumPy's FP unary loops use a partial load/store on every iteration. Each partial load/store helper contains a switch statement on the number of elements to handle, which inserts many unnecessary branches into the loop body. A partial load/store is only needed on the final iteration, and only when the remaining elements do not fill a full vector. These changes break out that final iteration so that it alone uses the partial load/store helpers, while the main loop now uses full loads/stores. Additionally, this means the volatile workaround no longer needs to be conditionalized inside the loop.
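The loop restructuring described above can be illustrated with a scalar C sketch. It assumes a hypothetical 4-lane `vec_f32` type standing in for a real SIMD register, and hypothetical helpers `load_full`, `store_full`, `load_partial`, `store_partial` and `reciprocal_f32`. These are not NumPy's actual universal-intrinsic helpers, and the sketch does not model the volatile workaround itself. The key point is that the switch-based partial helpers run at most once, after the main loop, instead of on every iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 4-lane "vector" standing in for a real SIMD register. */
#define VLEN 4
typedef struct { float lane[VLEN]; } vec_f32;

/* Full load/store: no branching on the element count. */
static vec_f32 load_full(const float *src) {
    vec_f32 v;
    for (int i = 0; i < VLEN; i++) v.lane[i] = src[i];
    return v;
}
static void store_full(float *dst, vec_f32 v) {
    for (int i = 0; i < VLEN; i++) dst[i] = v.lane[i];
}

/* Partial load: switches on the count and fills unused lanes with `fill`. */
static vec_f32 load_partial(const float *src, size_t count, float fill) {
    assert(count > 0 && count < VLEN);
    vec_f32 v = {{fill, fill, fill, fill}};
    switch (count) {
        case 3: v.lane[2] = src[2]; /* fall through */
        case 2: v.lane[1] = src[1]; /* fall through */
        case 1: v.lane[0] = src[0];
    }
    return v;
}

/* Partial store: writes only the first `count` lanes. */
static void store_partial(float *dst, size_t count, vec_f32 v) {
    assert(count > 0 && count < VLEN);
    switch (count) {
        case 3: dst[2] = v.lane[2]; /* fall through */
        case 2: dst[1] = v.lane[1]; /* fall through */
        case 1: dst[0] = v.lane[0];
    }
}

static vec_f32 vec_reciprocal(vec_f32 v) {
    vec_f32 r;
    for (int i = 0; i < VLEN; i++) r.lane[i] = 1.0f / v.lane[i];
    return r;
}

void reciprocal_f32(const float *src, float *dst, size_t n) {
    size_t i = 0;
    /* Main loop: full vectors only, so no switch statements execute here. */
    for (; i + VLEN <= n; i += VLEN) {
        store_full(dst + i, vec_reciprocal(load_full(src + i)));
    }
    /* Tail: at most one partial vector, handled once outside the loop. */
    size_t rem = n - i;
    if (rem > 0) {
        /* Filling unused lanes with 1.0f keeps the discarded lanes from dividing by zero. */
        vec_f32 v = load_partial(src + i, rem, 1.0f);
        store_partial(dst + i, rem, vec_reciprocal(v));
    }
}
```

For `n = 7`, the main loop processes one full vector of 4 elements and the tail handles the remaining 3 with a single partial load/store. In the original design, every iteration would have gone through the switch-based helpers.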