Implement compress for NEON #369

silvanshade · 2024-01-07T00:33:05Z

This implements the compress functionality for Neon so that it has feature parity with the other SIMD backends.

Unfortunately, as suggested in the comment in the source, it seems to be no faster than the scalar implementation.

However, I figured it would at least be worth creating the PR in case someone can figure out a way to improve it.

I tried a few different techniques for various parts of the implementation, but none of them helped, and some were slower. I'm no Neon expert, so this was mostly a naive attempt to see what the end result would be.

EDIT: I should note that I only tested this on an M3 Max MacBook Pro, so performance may differ on other platforms. It would help if someone could try it elsewhere.

Implement compress for NEON

4e81bb9

silvanshade force-pushed the feature/neon-compress branch from 6f2d724 to 4e81bb9 on January 7, 2024 17:47

silvanshade mentioned this pull request Jan 18, 2024

Implement RVV backend #372

Closed
