Couple of build issues on 32bit MSVC v143 build with /arch:SSE #1111

Open
Epixu opened this issue Nov 21, 2023 · 17 comments

Comments

@Epixu
Contributor

Epixu commented Nov 21, 2023

Hi again, I'm experiencing build errors with the current master (e651ec3).
Many of the problems from my prior issue have been fixed, including many of the failing tests involving precision discrepancies.
Here's the summary:

All my issues apply to MSVC v143 x86 builds - I have 5 such builds, each with a different /arch:XXX option:

1. The two build issues apply only to MSVC v143 x86 with /arch:SSE enabled

a) The first build issue involves reaching line 4774 here:

simde/simde/x86/sse2.h

Lines 4770 to 4774 in e651ec3

simde_mm_pause (void) {
  #if defined(SIMDE_X86_SSE2_NATIVE)
    _mm_pause();
  #elif defined(SIMDE_ARCH_X86)
    __asm__ __volatile__("pause");

The line uses GCC inline-assembly syntax, so it should be guarded. I can fix this build issue by doing:

simde_mm_pause (void) {
  #if defined(SIMDE_X86_SSE2_NATIVE)
    _mm_pause();
  #elif defined(SIMDE_ARCH_X86)
    #if defined(_MSC_VER)
      _mm_pause();
    #else
      __asm__ __volatile__("pause");
    #endif
  #elif defined(SIMDE_ARCH_ARM_NEON)

However, I'm not sure whether _mm_pause() could trigger an invalid-instruction fault at runtime when SIMDE_X86_SSE2_NATIVE isn't available. Is it available with SIMDE_X86_SSE_NATIVE?

b) The second build issue involves SVML

I can circumvent it by simply guarding the include of svml.h, though I'm not sure that is intended. Shouldn't there be fallback alternatives in SIMDe? These errors seemingly involve only 'simde__m128d' to '__m128d' conversion issues.

2. The test issues apply to all /arch options on MSVC v143 x86 builds:

They seemingly affect only the precision of the pow functions, but my tests aren't exhaustive, so I'm not sure.

@Epixu
Contributor Author

Epixu commented Nov 21, 2023

I was able to fix error 1.b by changing this check, which reenables native SVML for MSVC, while ignoring potential /arch:SSE:

#if !defined(SIMDE_X86_SVML_NATIVE) && !defined(SIMDE_X86_SVML_NO_NATIVE) && !defined(SIMDE_NO_NATIVE)
  #if defined(SIMDE_ARCH_X86) && (defined(__INTEL_COMPILER) || (HEDLEY_MSVC_VERSION_CHECK(14, 20, 0) && !defined(__clang__)))
    #define SIMDE_X86_SVML_NATIVE
  #endif
#endif

to

#if !defined(SIMDE_X86_SVML_NATIVE) && !defined(SIMDE_X86_SVML_NO_NATIVE) && !defined(SIMDE_NO_NATIVE)
  #if defined(SIMDE_ARCH_X86) && (defined(__INTEL_COMPILER) || (HEDLEY_MSVC_VERSION_CHECK(14, 20, 0) && !defined(__clang__) && defined(SIMDE_X86_SSE2_NATIVE)))
    #define SIMDE_X86_SVML_NATIVE
  #endif
#endif

Only __m128d (because it comes with SSE2?) is troublesome, though. SIMDe should probably provide the single-precision SVML functions either way, so I suppose this is not the optimal solution.

@mr-c
Collaborator

mr-c commented Mar 26, 2024

Hello @Epixu ; do you still have issues with 32bit MSVC builds using the latest SIMDe: v0.8.0?

@Epixu
Contributor Author

Epixu commented Mar 26, 2024

Hi, I will bump the version and check it in a couple of days, thanks for the notification!

@Epixu
Contributor Author

Epixu commented Mar 31, 2024

No change for this issue with the current master (517da84)
Both the build and test issues remain the same

@mr-c
Collaborator

mr-c commented Apr 1, 2024

However, I'm not sure whether _mm_pause() could trigger an invalid-instruction fault at runtime when SIMDE_X86_SSE2_NATIVE isn't available. Is it available with SIMDE_X86_SSE_NATIVE?

According to Intel, _mm_pause is SSE2 only: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_paus&ig_expand=4897
and Microsoft agrees: https://learn.microsoft.com/en-us/cpp/intrinsics/x86-intrinsics-list?view=msvc-170

So a guard can be added to not use the GCC asm syntax on MSVC

@mr-c
Collaborator

mr-c commented Apr 1, 2024

@Epixu can you test __asm pause; as an MSVC-friendly alternative to __asm__ __volatile__("pause");?
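A minimal sketch of what such a guard could look like, written as a standalone helper rather than as SIMDe's actual code. The function name `cpu_relax_hint` is hypothetical, and the MSVC branch is untested in this thread: it assumes the 32-bit MSVC inline assembler accepts the `pause` mnemonic. For the runtime concern raised earlier, note that PAUSE is encoded as F3 90 (REP NOP), which Intel documents as behaving like a plain NOP on processors that predate SSE2.

```c
/* Hypothetical helper: emits a spin-wait hint where the compiler and
 * target allow it, and reports whether a hint was emitted. */
static int cpu_relax_hint(void) {
#if defined(_MSC_VER) && defined(_M_IX86)
  /* MSVC inline assembler, 32-bit x86 only (assumed to accept `pause`). */
  __asm { pause }
  return 1;
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
  /* GCC/Clang extended asm, the form currently used at sse2.h line 4774. */
  __asm__ __volatile__("pause");
  return 1;
#else
  /* Unknown compiler or non-x86 target: do nothing. */
  return 0;
#endif
}
```

The 64-bit MSVC compiler has no inline assembler, so this sketch deliberately limits the `__asm` branch to `_M_IX86`.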

@mr-c
Collaborator

mr-c commented Apr 1, 2024

With 2d498c8

I'm seeing the following SIMDe x86 test errors on 32-bit MSVC:

 473/2002 x86/sse/native/c                        TIMEOUT        30.24s   exit status 1
../test/x86/sse.c:2094: assertion failed: r[0] ~= test_vec[i].r[0] (-836.000000 ~= -836.000000)
../test/x86/sse.c:2167: assertion failed: r[0] ~= test_vec[i].r[0] (-172.000000 ~= -172.000000)
../test/x86/sse.c:2246: assertion failed: r[0] ~= test_vec[i].r[0] (25075.000000 ~= 25075.000000)
../test/x86/sse.c:2287: assertion failed: r[0] ~= test_vec[i].r[0] (-200.000000 ~= -200.000000)
../test/x86/sse.c:2333: assertion failed: r[0] ~= e[0] (46384.000000 ~= 46384.000000)
../test/x86/sse.c:2375: assertion failed: r[0] ~= test_vec[i].r[0] (113.000000 ~= 113.000000)
../test/x86/sse.c:2522: assertion failed: r[0] ~= simde_mm_loadu_ps(test_vec[i].r)[0] (50326.000000 ~= 50326.000000)
../test/x86/sse.c:2555: assertion failed: r[0] ~= simde_mm_loadu_ps(test_vec[i].r)[0] (101.000000 ~= 101.000000)
../test/x86/sse.c:2899: assertion failed: r[0] ~= test_vec[i].r[0] (-3.607422 ~= -3.610000)
../test/x86/sse.c:4429: assertion failed: r[0] ~= simde_mm_loadu_ps(test_vec[i].r)[0] (-609.119995 ~= -609.119995)

Which is pretty weird, the numbers look exactly the same to me except for line 2899; something about the float comparator isn't working?

SSE2, no timeout, so we get a list of the failing function tests:

|  475/2002 sse2/mm_setr_pd                       FAIL          
|  475/2002 sse2/x_mm_abs_pd                      FAIL          
|  475/2002 sse2/mm_store_pd                      FAIL          
|  475/2002 sse2/mm_min_pd                        FAIL          
|  475/2002 sse2/mm_cmpord_pd                     FAIL          
|  475/2002 sse2/mm_cvtpd_ps                      FAIL          
|  475/2002 sse2/mm_cvtpi32_pd                    FAIL          
|  475/2002 sse2/mm_div_pd                        FAIL          
|  475/2002 sse2/mm_or_pd                         FAIL   
../test/x86/sse2.c:7132: assertion failed: r[0] ~= test_vec[i].r[0] (0.740000 ~= 0.740000)
../test/x86/sse2.c:59: assertion failed: r[0] ~= simde_mm_loadu_pd(test_vec[i].r)[0] (147.280000 ~= 147.280000)
../test/x86/sse2.c:8244: assertion failed: r[0] ~= test_vec[i].r[0] (176.750000 ~= 176.750000)
../test/x86/sse2.c:5020: assertion failed: r[0] ~= test_vec[i].r[0] (-514.320000 ~= -514.320000)
../test/x86/sse2.c:2558: assertion failed: r[1] ~= simde_mm_loadu_pd(test_vec[i].r)[1] (0.000000 ~= 0.000000)
../test/x86/sse2.c:3362: assertion failed: r[0] ~= test_vec[i].r[0] (689.409973 ~= 689.409973)
../test/x86/sse2.c:3394: assertion failed: r[0] ~= test_vec[i].r[0] (-579.000000 ~= -579.000000)
../test/x86/sse2.c:4152: assertion failed: r[0] ~= test_vec[i].r[0] (1.158700 ~= 1.160000)
../test/x86/sse2.c:5919: assertion failed: r[0] ~= test_vec[i].r[0] (128062.238280 ~= 128062.240000)

Again, all visually identical except for one (line 4152), yet the assertions still failed.

@Epixu
Contributor Author

Epixu commented Apr 1, 2024

Does ~= stand for some approximate comparison based on ULPs? Could these look the same because of how they are printed? Maybe printing them in scientific notation would help?

I will try __asm pause in a couple of days

@mr-c
Collaborator

mr-c commented Apr 1, 2024

Does ~= stand for some approximate comparison based on ULPs? Could these look the same because of how they are printed? Maybe printing them in scientific notation would help?

Could be, I didn't write that code and it feels well outside my area of expertise:

simde/test/test.h

Lines 729 to 748 in 8639fef

static int
simde_test_equal_f32(simde_float32 a, simde_float32 b, simde_float32 slop) {
  if (simde_math_isnan(a)) {
    return simde_math_isnan(b);
  } else if (simde_math_isinf(a)) {
    return !((a < b) || (a > b));
  } else if (slop == SIMDE_FLOAT32_C(0.0)) {
    return !simde_memcmp(&a, &b, sizeof(simde_float32));
  } else {
    simde_float32 lo = a - slop;
    if (HEDLEY_UNLIKELY(lo == a))
      lo = simde_math_nextafterf(a, -SIMDE_MATH_INFINITYF);
    simde_float32 hi = a + slop;
    if (HEDLEY_UNLIKELY(hi == a))
      hi = simde_math_nextafterf(a, SIMDE_MATH_INFINITYF);
    return ((b >= lo) && (b <= hi));
  }
}

simde/test/test.h

Lines 866 to 867 in 8639fef

simde_test_debug_printf_("%s:%d: assertion failed: %s ~= %s (%f ~= %f)\n",
filename, line, astr, bstr, HEDLEY_STATIC_CAST(double, a), HEDLEY_STATIC_CAST(double, b));

I will try __asm pause in a couple of days

Thanks!

@Epixu
Contributor Author

Epixu commented Apr 1, 2024

The single-precision comparison function is not ULP-based. It uses a tolerance range (the slop), and when the slop is zero it does an exact bitwise compare. The failing functions, however, seem to involve double precision, so two possible points of failure come to mind: is the proper double-precision compare being used, and is the precision specified so that the slop is not zero? The internal CPU float representation should always be treated as a black box: while values reside in registers they often carry additional bits of precision, which get truncated later, and this can differ between CPU architectures.
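For contrast with the range-based compare quoted above, here is a rough sketch of what a ULP-based comparison usually looks like. This is not SIMDe's comparator; the names `ordered_bits` and `within_ulps` are made up for illustration, and the sketch ignores infinities for brevity.

```c
#include <stdint.h>
#include <string.h>

/* Map a float's sign-magnitude bit pattern onto a monotonic integer line,
 * so that adjacent representable floats differ by exactly 1.
 * Both +0.0f and -0.0f map to 0. */
static int64_t ordered_bits(float f) {
  uint32_t u;
  memcpy(&u, &f, sizeof u);
  return (u & 0x80000000u) ? -(int64_t)(u & 0x7FFFFFFFu) : (int64_t)u;
}

/* Returns 1 when a and b are at most max_ulps representable floats apart.
 * NaN is never considered close to anything. */
static int within_ulps(float a, float b, int64_t max_ulps) {
  if (a != a || b != b) return 0;
  int64_t d = ordered_bits(a) - ordered_bits(b);
  if (d < 0) d = -d;
  return d <= max_ulps;
}
```

Unlike a fixed slop, the tolerance here scales with the magnitude of the operands, which is why the two approaches can disagree on the same pair of values.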

Also, it would probably be better to use (%e ~= %e) to see whether the trailing digits of the floats differ.
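To illustrate why the format matters, the sketch below builds two doubles that differ by one unit in the last place and checks which printf formats can tell them apart. The helper names `next_up_positive` and `same_text` are hypothetical, and the value 0.74 is only borrowed from the log above as a convenient example.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Next representable double above a positive finite x, via its bit pattern. */
static double next_up_positive(double x) {
  uint64_t bits;
  memcpy(&bits, &x, sizeof bits);
  bits += 1u;
  memcpy(&x, &bits, sizeof bits);
  return x;
}

/* Returns 1 when a and b produce identical text under the given format. */
static int same_text(const char *fmt, double a, double b) {
  char sa[64], sb[64];
  snprintf(sa, sizeof sa, fmt, a);
  snprintf(sb, sizeof sb, fmt, b);
  return strcmp(sa, sb) == 0;
}

/* Prints one pair under the three formats so the outputs can be compared. */
static void show_pair(double a, double b) {
  printf("%%f:    %f ~= %f\n", a, b);
  printf("%%.17e: %.17e ~= %.17e\n", a, b);
  printf("%%a:    %a ~= %a\n", a, b);
}
```

With `a = 0.74` and `b = next_up_positive(a)`, `%f` renders both values identically, while `%.17e` and `%a` render them differently. If two values still match under `%a`, the printed values really are the same bit pattern and the mismatch has to come from somewhere else.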

@mr-c
Collaborator

mr-c commented Apr 2, 2024

Thanks for the explanation @Epixu , here are the same test failures using hexadecimal exponent notation. All affected tests do the float comparison using a precision of 1 == a slop of 10^-1 (which is rather generous, if you ask me).

While all of these affected tests use simde_test_x86_assert_equal_f32x{2,4} or simde_assert_equal_vf32 there are other tests that use those comparators which don't fail.

../test/x86/sse.c:2094: assertion failed: r[0] ~= test_vec[i].r[0] (-0x1.a200000000000p+9 ~= -0x1.a200000000000p+9)
../test/x86/sse.c:2167: assertion failed: r[0] ~= test_vec[i].r[0] (-0x1.5800000000000p+7 ~= -0x1.5800000000000p+7)
../test/x86/sse.c:2246: assertion failed: r[0] ~= test_vec[i].r[0] (0x1.87cc000000000p+14 ~= 0x1.87cc000000000p+14)
../test/x86/sse.c:2287: assertion failed: r[0] ~= test_vec[i].r[0] (-0x1.9000000000000p+7 ~= -0x1.9000000000000p+7)
../test/x86/sse.c:2333: assertion failed: r[0] ~= e[0] (0x1.6a60000000000p+15 ~= 0x1.6a60000000000p+15)
../test/x86/sse.c:2375: assertion failed: r[0] ~= test_vec[i].r[0] (0x1.c400000000000p+6 ~= 0x1.c400000000000p+6)
../test/x86/sse.c:2522: assertion failed: r[0] ~= simde_mm_loadu_ps(test_vec[i].r)[0] (0x1.892c000000000p+15 ~= 0x1.892c000000000p+15)
../test/x86/sse.c:2555: assertion failed: r[0] ~= simde_mm_loadu_ps(test_vec[i].r)[0] (0x1.9400000000000p+6 ~= 0x1.9400000000000p+6)
../test/x86/sse.c:2899: assertion failed: r[0] ~= test_vec[i].r[0] (-0x1.cdc0020000000p+1 ~= -0x1.ce147a0000000p+1)
../test/x86/sse.c:4429: assertion failed: r[0] ~= simde_mm_loadu_ps(test_vec[i].r)[0] (-0x1.308f5c0000000p+9 ~= -0x1.308f5c0000000p+9)
../test/x86/sse.c:4982: assertion failed: r[0] ~= test_vec[i].r[0] (0x1.b80f5c0000000p+7 ~= 0x1.b80f5c0000000p+7)
../test/x86/sse.c:5015: assertion failed: r[0] ~= test_vec[i].r[0] (0x1.8c00000000000p+9 ~= 0x1.8c00000000000p+9)
../test/x86/sse.c:5048: assertion failed: r[0] ~= test_vec[i].r[0] (0x1.62fc280000000p+9 ~= 0x1.62fc280000000p+9)

https://ci.appveyor.com/project/nemequ/simde/build/job/uhkyqxm6fogyagv1?fullLog=true

@Epixu
Contributor Author

Epixu commented Apr 2, 2024

This is super weird. Is it possible that the error somehow happens at the compare, but is fine by the time data is printed?
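One way to test that hypothesis is to force every operand through a real 32-bit store before comparing, so the compare sees the same narrowed values that later get printed. The sketch below is a diagnostic idea only, not a confirmed cause or a proposed SIMDe patch; `narrow_f32`, `in_range_narrowed`, and `report_eval_method` are hypothetical names.

```c
#include <float.h>
#include <stdio.h>

/* The volatile store forces a round trip through a 32-bit memory slot,
 * discarding any extra precision the value might carry in a register. */
static float narrow_f32(float x) {
  volatile float stored = x;
  return stored;
}

/* Range compare in the style of the quoted test helper, but with every
 * intermediate value narrowed first. */
static int in_range_narrowed(float a, float b, float slop) {
  float lo = narrow_f32(a - slop);
  float hi = narrow_f32(a + slop);
  float nb = narrow_f32(b);
  return (nb >= lo) && (nb <= hi);
}

/* Reports how the compiler says it evaluates float expressions, when the
 * macro is available (0 means no excess precision). */
static void report_eval_method(void) {
#ifdef FLT_EVAL_METHOD
  printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
#else
  printf("FLT_EVAL_METHOD is not defined by this compiler\n");
#endif
}
```

If the failing assertions pass with a narrowed compare like this on the 32-bit MSVC build, that would point toward excess intermediate precision; if they still fail, the generated code for the compare itself is the more likely suspect.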

Also, just to note, this wouldn't be the first time I've encountered nasty generated assembly bugs from MSVC. To give an example, I've even documented one involving AVX here: https://github.com/Epixu/msvc_but_repro

I guess our last beacon of hope is to compare some healthy ASM with what the compiler generates...

@mr-c
Collaborator

mr-c commented Apr 2, 2024

I guess our last beacon of hope is to compare some healthy ASM with what the compiler generates...

Agreed, but that is outside of my abilities and time availability :-(

@Epixu
Contributor Author

Epixu commented Apr 2, 2024

I will investigate more when I too get the time. Still, thanks a lot for narrowing it down.

@mr-c
Collaborator

mr-c commented Apr 2, 2024

This is super weird. Is it possible that the error somehow happens at the compare, but is fine by the time data is printed?

In fc52a85 I tried crafting the test data for a single test differently, but I got the same error: ./test/x86/sse.c:2095: assertion failed: r[0] ~= simde_mm_loadu_ps(test_vec[i].r)[0] (-0x1.a200000000000p+9 ~= -0x1.a200000000000p+9)

Interestingly, the SSE test failures occur only with the "native" code paths, not the emulated ones. The SSE2 tests fail for both (this is for a build without /arch:SSE or similar).

https://ci.appveyor.com/project/nemequ/simde/build/job/u3cn62vsdvvua23o?fullLog=true

I will investigate more when I too get the time

Thanks!

Still, thanks a lot for narrowing it down

You are welcome! I think there are several fixes from this branch worth merging already, so I'm glad for that.

@mr-c
Collaborator

mr-c commented Apr 2, 2024

One more question for you: are these issues a regression from a previous SIMDe release / commit? Or have they always been present from what you have seen?

@Epixu
Contributor Author

Epixu commented Apr 2, 2024

Honestly, I have no idea. It seems that the issues emerged in later versions, after more features were introduced, and the only reason the older version still works for me is that those features are simply not available in it. So probably a regression. I can't really be sure, though.

I still stick to one particular commit (5e7c4d4), because it passes all my tests and lets me continue work on more important stuff.
