
Use runtime CPU feature detection to select which SIMD instruction set to use #125

Open
james7132 opened this issue Mar 23, 2024 · 0 comments

Comments

@james7132
Collaborator

There are options like `std::arch::is_x86_feature_detected!` that can detect at runtime which instruction sets are available. Unfortunately, the detection itself is expensive enough that the check cannot be repeated inside every function call.
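A minimal sketch of running the detection once, ordered from most to least capable; the `SimdLevel` enum and `detect_simd_level` function are hypothetical names, not part of this crate:

```rust
/// Hypothetical: the detected instruction set, with discriminants chosen
/// to match the two-bit tag mapping proposed below.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum SimdLevel {
    Scalar = 0b00,
    Sse2 = 0b01,
    Sse41 = 0b10,
    Avx = 0b11,
}

#[cfg(target_arch = "x86_64")]
fn detect_simd_level() -> SimdLevel {
    // Each `is_x86_feature_detected!` invocation has a runtime cost, so
    // this should run once during initialization and the result cached.
    if std::arch::is_x86_feature_detected!("avx") {
        SimdLevel::Avx
    } else if std::arch::is_x86_feature_detected!("sse4.1") {
        SimdLevel::Sse41
    } else if std::arch::is_x86_feature_detected!("sse2") {
        SimdLevel::Sse2
    } else {
        SimdLevel::Scalar
    }
}
```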

One potential way around this is to do feature detection once during initialization and store the result in a tagged pointer. Since any SIMD-supporting platform is at least 32-bit, every pointer to a backing allocation has at least two low bits that are always zero. If the default block size is increased to anywhere from 8 to 64 bytes, the resulting alignment frees up three to six tag bits instead of two. An example mapping for x86 may include:

  • 00 - Default, none detected
  • 01 - SSE2 detected
  • 10 - SSE4.1 detected
  • 11 - AVX detected

These bits can then be zeroed out on access in a branchless way. This should have a slight negative performance impact on point queries (contains, insert, etc.), but allows the most performant instructions to be used without explicitly compiling for a particular target feature set.
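A sketch of what that could look like, reusing the hypothetical `SimdLevel` enum above (`TaggedPtr` and `TAG_MASK` are likewise made-up names): the level is ORed into the low bits on construction, and a single AND recovers the real pointer on access.

```rust
/// The two low bits hold the tag; the allocation must be at least
/// 4-byte aligned so they are otherwise always zero.
const TAG_MASK: usize = 0b11;

struct TaggedPtr(usize);

impl TaggedPtr {
    fn new(ptr: *mut u8, level: SimdLevel) -> Self {
        // The low bits must already be zero for the tag to fit.
        debug_assert_eq!(ptr as usize & TAG_MASK, 0);
        TaggedPtr(ptr as usize | level as usize)
    }

    /// Recover the real pointer with a single branchless AND.
    fn ptr(&self) -> *mut u8 {
        (self.0 & !TAG_MASK) as *mut u8
    }

    /// Read the detected SIMD level back out of the tag bits.
    fn level(&self) -> SimdLevel {
        match self.0 & TAG_MASK {
            0b00 => SimdLevel::Scalar,
            0b01 => SimdLevel::Sse2,
            0b10 => SimdLevel::Sse41,
            _ => SimdLevel::Avx,
        }
    }
}
```

Point queries could then match on `level()` once per call and dispatch to a kernel compiled with the corresponding `#[target_feature]` attribute; the only overhead on the hot path is the AND in `ptr()`.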
