Drop FMA intrinsic #445

jserv · 2021-06-04T19:31:50Z

Danila Kutenin pointed out:

Technically speaking, _mm_fmadd_ps is not an SSE extension, this was
introduced with fma extension which took place even after AVX.

To clarify the purpose of SSE2NEON, this patch drops the existing
FMA implementation.

Related: #82

marktwtn · 2021-06-04T23:17:38Z

sse2neon.h

+#if defined(__aarch64__)
+    return vreinterpretq_m128_f32(vfmaq_f32(vreinterpretq_f32_m128(a),
+                                            vreinterpretq_f32_m128(mask),
+                                            vreinterpretq_f32_m128(b)));
+#else
+    return _mm_add_ps(_mm_mul_ps(b, mask), a);
+#endif


Remove the ARM 32-bit implementation since vfmaq_f32 is also supported in A32.

vfmaq_f32 is only available for VFPv4+. Thus, for Armv7-A targets, we have to take the following cases into consideration:

* VFPv3, which is implemented on Cortex-R4 and R5 processors and the Tegra 2 (Cortex-A9).

* VFPv4, which is implemented on Cortex-A15 and Cortex-A7, or later.

Reference: https://embeddedartistry.com/blog/2017/10/11/demystifying-arm-floating-point-compiler-options/

Danila Kutenin pointed out:

> Technically speaking, _mm_fmadd_ps is not an SSE extension, this was
> introduced with fma extension which took place even after AVX.

To clarify the purpose of SSE2NEON, this patch would drop the existing FMA implementation.

The intrinsic vfmaq_f32, standing for "fused floating-point multiply-accumulate", is only available for VFPv4+. Thus, for Armv7-A targets, we have to take the following cases into consideration:

* VFPv3, which is implemented on Cortex-R4, R5, Cortex-A9
* VFPv4, which is implemented on the A15 and Cortex-A7, or later

According to the ACLE spec[1], "__ARM_FEATURE_FMA" is defined to 1 if the hardware floating-point architecture supports fused floating-point multiply-accumulate.

Related: #82

[1] https://developer.arm.com/architectures/system-architectures/software-standards/acle
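For illustration only, here is a minimal sketch of how the `__ARM_FEATURE_FMA` check described in the commit message could gate the use of vfmaq_f32. The helper name `madd4_f32` is hypothetical and is not part of sse2neon, and the scalar branch is an assumption added so that the snippet also builds on non-Arm hosts.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not part of sse2neon): out[i] = a[i] + b[i] * c[i]
 * for four single-precision lanes. */
static inline void madd4_f32(const float a[4],
                             const float b[4],
                             const float c[4],
                             float out[4])
{
#if (defined(__ARM_NEON) || defined(__ARM_NEON__)) && defined(__ARM_FEATURE_FMA)
    /* VFPv4+ or AArch64: fused multiply-accumulate, a + b * c with a
     * single rounding step. */
    vst1q_f32(out, vfmaq_f32(vld1q_f32(a), vld1q_f32(b), vld1q_f32(c)));
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* NEON without FMA (e.g. VFPv3-class targets): separate multiply and
     * add, each rounded on its own. */
    vst1q_f32(out, vaddq_f32(vld1q_f32(a),
                             vmulq_f32(vld1q_f32(b), vld1q_f32(c))));
#else
    /* Assumed fallback for non-Arm hosts: plain scalar arithmetic. */
    for (size_t i = 0; i < 4; i++)
        out[i] = a[i] + b[i] * c[i];
#endif
}
```

With GCC or Clang on Armv7-A, whether `__ARM_FEATURE_FMA` is defined depends on the selected FPU (for example `-mfpu=neon-vfpv4` versus `-mfpu=neon`), so the same source picks the fused or the unfused path at compile time. The two paths can differ in the last bit of the result because the fused form rounds only once.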

jserv requested a review from marktwtn as a code owner June 4, 2021 19:31

howjmay mentioned this pull request Jun 4, 2021

feat: Implement FMA function _mm_fmadd_pd #245

Closed

marktwtn requested changes Jun 4, 2021

jserv force-pushed the drop-fma-intrinsic branch from c7af009 to ab1ceea Compare June 5, 2021 02:34

jserv requested a review from marktwtn June 5, 2021 02:38

marktwtn approved these changes Jun 5, 2021

jserv merged commit 3f36fa6 into master Jun 5, 2021

marktwtn deleted the drop-fma-intrinsic branch June 5, 2021 04:42