Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The int128 implementation of simde_mm_testz_si128() is incorrect #1119

Open
markos opened this issue Dec 20, 2023 · 1 comment
Open

The int128 implementation of simde_mm_testz_si128() is incorrect #1119

markos opened this issue Dec 20, 2023 · 1 comment

Comments

@markos
Copy link

markos commented Dec 20, 2023

The int128 implementation of simde_mm_testz_si128() is incorrect, caught in vectoscan CI with SIMDe backend:

https://buildbot-ci.vectorcamp.gr/#/builders/119/builds/14

After investigation I found that for x86 the int128 implementation was used and only these tests were failing. The other architectures used the u64 method and they had correct results.

if ((a_.u128[0] & b_.u128[0]) == 0) {

The following fix seems to work and it makes sense as it seems to be consistent with the u64 implementation below.

diff --git a/simde/x86/sse4.1.h b/simde/x86/sse4.1.h
index 15a197b9..ad583bd1 100644
--- a/simde/x86/sse4.1.h
+++ b/simde/x86/sse4.1.h
@@ -2341,10 +2341,10 @@ simde_mm_testz_si128 (simde__m128i a, simde__m128i b) {
       v128_t m = wasm_v128_and(a_.wasm_v128, b_.wasm_v128);
       return (wasm_i64x2_extract_lane(m, 0) | wasm_i64x2_extract_lane(m, 1)) == 0;
     #elif defined(SIMDE_HAVE_INT128_)
-      if ((a_.u128[0] & b_.u128[0]) == 0) {
-        return 1;
+      if ((a_.u128[0] & b_.u128[0]) > 0) {
+        return 0;
       }
-      return 0;
+      return 1;
     #else
       for (size_t i = 0 ; i < (sizeof(a_.u64) / sizeof(a_.u64[0])) ; i++) {
         if ((a_.u64[i] & b_.u64[i]) > 0)
markos added a commit to VectorCamp/vectorscan that referenced this issue Dec 21, 2023
SIMDe on Clang needs SIMDE_NO_CHECK_IMMEDIATE_CONSTANT defined and other SIMDe related fixes now that SIMDe is part of the CI pipeline.

Some issue with SIMDe on x86 still remains because of an upstream bug:

simd-everywhere/simde#1119

Similarly SIMDe native with clang on Arm also poses a non-high priority build failure:

https://buildbot-ci.vectorcamp.gr/#/builders/129/builds/11

Possibly a SIMDe issue as well, need to investigate but will merge this PR as these are non-blockers.
@mr-c
Copy link
Collaborator

mr-c commented Dec 21, 2023

To document our conversation from another place:

The int128 implementation was introduced in
f132275#diff-c34749104c5a84a98b2e0af443fc6df77276e235abd1cff17a74d285c4fddd93R2344

From what I see, it still matches the guidance from Intel for _mm_testz_si128: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_testz_si128&ig_expand=6857

Compute the bitwise AND of 128 bits (representing integer data) in a and b, and set ZF to 1 if the result is zero, otherwise set ZF to 0. [Something about CF, but that is not used in this function] Return the ZF value.

The other pathway has reversed structure to enable fast failing, as it compares 64 bits at a time; so if the bitwise AND of the first 64 bits is not zero, we return 0 immediately

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants