Enhancement on top of workaround for clang bug in reciprocal (numpy#18555)
NumPy's FP unary loops use a partial load / store on every iteration. The partial load / store helpers each insert a switch statement to determine how many elements to handle, which adds many unnecessary branches to the loops. A partial load / store is only needed on the final iteration, and only when it isn't a full vector.

The changes here break out the final iteration so that only it uses the partial load / stores; the main loop now uses full load / stores. As a result, the volatile workaround inside the loop no longer needs to be conditional.
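Below is a minimal, self-contained C sketch of that restructuring (an illustrative example, not the NumPy source): the main loop touches only full vector-width blocks, so no per-iteration length check is needed, and a single partial block at the end covers the remainder. The names VSTEP, recip_block_full, recip_block_partial, and recip_loop are hypothetical stand-ins for the npyv_* helpers in the real kernel.

/*
 * Sketch of the loop restructuring: full blocks in the main loop,
 * one partial block (with a length check) only for the tail.
 */
#include <stdio.h>

#define VSTEP 4  /* hypothetical stand-in for npyv_nlanes_@sfx@ */

/* full-width block: always VSTEP elements, no switch on the length */
static void recip_block_full(const float *src, float *dst)
{
    for (int i = 0; i < VSTEP; i++) {
        dst[i] = 1.0f / src[i];
    }
}

/* partial block: used at most once, for the remainder */
static void recip_block_partial(const float *src, float *dst, int n)
{
    for (int i = 0; i < n; i++) {
        dst[i] = 1.0f / src[i];
    }
}

static void recip_loop(const float *src, float *dst, int len)
{
    /* vector-sized iterations: full blocks only */
    for (; len >= VSTEP; len -= VSTEP, src += VSTEP, dst += VSTEP) {
        recip_block_full(src, dst);
    }
    /* last partial iteration, if needed */
    if (len > 0) {
        recip_block_partial(src, dst, len);
    }
}

int main(void)
{
    float in[7] = {1, 2, 4, 8, 16, 32, 64};
    float out[7];
    recip_loop(in, out, 7);
    for (int i = 0; i < 7; i++) {
        printf("%g ", out[i]);
    }
    printf("\n");
    return 0;
}

In the actual kernel in the diff below, the full-block path maps to the npyv_load_@sfx@ / npyv_store_@sfx@ calls and the tail maps to the *_till variants.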
Developer-Ecosystem-Engineering committed Sep 22, 2021
1 parent 7227b49 commit a3d1a00
Showing 1 changed file with 21 additions and 4 deletions.
25 changes: 21 additions & 4 deletions numpy/core/src/umath/loops_unary_fp.dispatch.c.src
@@ -121,6 +121,8 @@ static void simd_@TYPE@_@kind@_@STYPE@_@DTYPE@

     const int vstep = npyv_nlanes_@sfx@;
     const int wstep = vstep * @unroll@;
+
+    // unrolled iterations
     for (; len >= wstep; len -= wstep, src += ssrc*wstep, dst += sdst*wstep) {
         /**begin repeat3
          * #N = 0, 1, 2, 3#
@@ -147,7 +149,23 @@ static void simd_@TYPE@_@kind@_@STYPE@_@DTYPE@
         /**end repeat3**/
     }
 
-    for (; len > 0; len -= vstep, src += ssrc*vstep, dst += sdst*vstep) {
+    // vector-sized iterations
+    for (; len >= vstep; len -= vstep, src += ssrc*vstep, dst += sdst*vstep) {
+    #if @STYPE@ == CONTIG
+        npyv_@sfx@ v_src0 = npyv_load_@sfx@(src);
+    #else
+        npyv_@sfx@ v_src0 = npyv_loadn_@sfx@(src, ssrc);
+    #endif
+        npyv_@sfx@ v_unary0 = npyv_@intr@_@sfx@(v_src0);
+    #if @DTYPE@ == CONTIG
+        npyv_store_@sfx@(dst, v_unary0);
+    #else
+        npyv_storen_@sfx@(dst, sdst, v_unary0);
+    #endif
+    }
+
+    // last partial iteration, if needed
+    if(len > 0){
     #if @STYPE@ == CONTIG
     #if @repl_0w1@
         npyv_@sfx@ v_src0 = npyv_load_till_@sfx@(src, len, 1);
@@ -168,9 +186,7 @@ static void simd_@TYPE@_@kind@_@STYPE@_@DTYPE@
      * want to do this for the last iteration / partial load-store of
      * the loop since 'volatile' forces a refresh of the contents.
      */
-        if(len < vstep){
-            volatile npyv_@sfx@ unused_but_workaround_bug = v_src0;
-        }
+        volatile npyv_@sfx@ unused_but_workaround_bug = v_src0;
     #endif // @RECIP_WORKAROUND@
         npyv_@sfx@ v_unary0 = npyv_@intr@_@sfx@(v_src0);
     #if @DTYPE@ == CONTIG
@@ -179,6 +195,7 @@ static void simd_@TYPE@_@kind@_@STYPE@_@DTYPE@
         npyv_storen_till_@sfx@(dst, sdst, len, v_unary0);
     #endif
     }
+
     npyv_cleanup();
 }
 /**end repeat2**/
