Drop FMA intrinsic #445

Merged
merged 1 commit into from
Jun 5, 2021
jserv (Member) commented on Jun 4, 2021

Danila Kutenin pointed out:

Technically speaking, _mm_fmadd_ps is not an SSE extension, this was
introduced with fma extension which took place even after AVX.

To clarify the purpose of SSE2NEON, this patch drops the existing
FMA implementation.

Related: #82

sse2neon.h Outdated
Comment on lines 6026 to 6041
#if defined(__aarch64__)
return vreinterpretq_m128_f32(vfmaq_f32(vreinterpretq_f32_m128(a),
vreinterpretq_f32_m128(mask),
vreinterpretq_f32_m128(b)));
#else
return _mm_add_ps(_mm_mul_ps(b, mask), a);
#endif
Collaborator


Remove the ARM 32-bit implementation since vfmaq_f32 is also supported in A32.

Member Author


vfmaq_f32 is only available with VFPv4 or later. Thus, for Armv7-A targets, we have to consider the following cases:

  • VFPv3, which is implemented on the Cortex-R4 and Cortex-R5 processors and on the Tegra 2 (Cortex-A9).
  • VFPv4, which is implemented on the Cortex-A15, the Cortex-A7, and later processors.

Reference: https://embeddedartistry.com/blog/2017/10/11/demystifying-arm-floating-point-compiler-options/

Danila Kutenin pointed out:
> Technically speaking, _mm_fmadd_ps is not an SSE extension, this was
> introduced with fma extension which took place even after AVX.

To clarify the purpose of SSE2NEON, this patch drops the existing
FMA implementation.

The intrinsic vfmaq_f32, which performs a fused floating-point
multiply-accumulate, is only available with VFPv4 or later. Thus, for
Armv7-A targets, we have to consider the following cases:
* VFPv3, which is implemented on Cortex-R4, Cortex-R5, and Cortex-A9
* VFPv4, which is implemented on Cortex-A15, Cortex-A7, and later

According to the ACLE spec[1], "__ARM_FEATURE_FMA" is defined to 1 if
the hardware floating-point architecture supports fused floating-point
multiply-accumulate.
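
As a rough illustration of how that macro could gate the fused path, here
is a minimal sketch. The helper name madd4 is hypothetical and is not part
of sse2neon; the fallback branch is an assumption about how a target
without fused multiply-accumulate (e.g. VFPv3) might be handled, not the
code in this patch.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] * c[i] for four float lanes. */
static void madd4(const float *a, const float *b, const float *c, float *out)
{
#if (defined(__ARM_NEON) || defined(__ARM_NEON__)) && defined(__ARM_FEATURE_FMA)
    /* Fused path: vfmaq_f32(a, b, c) computes a + b * c with one rounding. */
    vst1q_f32(out, vfmaq_f32(vld1q_f32(a), vld1q_f32(b), vld1q_f32(c)));
#else
    /* Fallback path: separate multiply and add, used when the target
     * does not advertise fused multiply-accumulate (or is not Arm). */
    for (size_t i = 0; i < 4; i++)
        out[i] = a[i] + b[i] * c[i];
#endif
}
```

The two paths can differ in the last bit of the result, because the fused
form rounds once while the fallback may round twice.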

Related: #82

[1] https://developer.arm.com/architectures/system-architectures/software-standards/acle
jserv merged commit 3f36fa6 into master on Jun 5, 2021
marktwtn deleted the drop-fma-intrinsic branch on June 5, 2021 04:42