ARM architecture support #43

Open
ghost opened this issue Oct 18, 2018 · 4 comments

Comments

@ghost

ghost commented Oct 18, 2018

You can try https://github.com/nemequ/simde for easy transfer of SSE / AVX instructions to ARM.

@bact

bact commented Feb 18, 2020

Would Arm Neon help?

Arm Neon technology is an advanced Single Instruction Multiple Data (SIMD)
architecture extension for the Arm Cortex-A and Cortex-R series processors.

https://developer.arm.com/architectures/instruction-sets/simd-isas/neon

@qhaas

qhaas commented Aug 5, 2020

I'm not familiar with that library, but if it does what it claims well, it might also allow Power ISA AltiVec/VMX support in pillow-simd for those of us on ppc64le systems.

@AWSjswinney

I managed to use SSE2Neon to get a build working on aarch64. I'm planning to open a pull request soon. Would you be open to including such changes?

@homm

homm commented Jun 5, 2021

@AWSjswinney

This SIMD code is heavily optimized for SSE and AVX instructions. Of course, you can translate the SSE instructions to NEON and get a "NEON" version. But will it come anywhere close to the speedup of the original SSE version?

For example, one of the most frequently used instructions is _mm_madd_epi16, which performs eight multiplications and four additions at once. If you take a look at its implementation in SSE2NEON, you'll see four vget_low_s16, four vget_high_s16, two vmull_s16, two vpadd_s32 and one vcombine_s32 instructions: 13 instructions in total.
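
To make the arithmetic concrete, here is a rough scalar sketch in C of what a single _mm_madd_epi16 computes on two vectors of eight signed 16-bit lanes. The name madd_epi16_ref is hypothetical and used only for illustration. This is a portable reference model, not the pillow-simd code and not the SSE2NEON implementation.

```c
#include <stdint.h>

/* Hypothetical scalar reference model of _mm_madd_epi16 (illustration only).
 * Inputs: two vectors of eight signed 16-bit lanes.
 * Output: four signed 32-bit lanes, each the sum of two adjacent products. */
void madd_epi16_ref(const int16_t a[8], const int16_t b[8], int32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        /* Two widening 16x16 -> 32-bit multiplications per output lane,
         * eight multiplications in total. */
        int32_t even = (int32_t)a[2 * i]     * (int32_t)b[2 * i];
        int32_t odd  = (int32_t)a[2 * i + 1] * (int32_t)b[2 * i + 1];

        /* One addition per output lane, four in total. The sum is formed in
         * unsigned arithmetic so that the single overflowing case (all four
         * inputs equal to -32768) wraps instead of being undefined; the
         * conversion back to int32_t assumes a two's-complement target. */
        out[i] = (int32_t)((uint32_t)even + (uint32_t)odd);
    }
}
```

On SSE2 the whole loop is one instruction, which is why a translation that needs several NEON instructions per call can lose much of the original speedup in hot loops.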

I bet what you are trying to achieve is not just some "NEON" version, but an optimized NEON version. And I believe this is not the right way to get there.
