Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix "simd_support" feature #1056

Merged
merged 3 commits into from Oct 15, 2020
Merged

Fix "simd_support" feature #1056

merged 3 commits into from Oct 15, 2020

Conversation

vks
Copy link
Collaborator

@vks vks commented Oct 8, 2020

No description provided.

This is necessary, because support for `__m64` was removed from nighly
Rust [1].

Fixes rust-random#1050.

[1] rust-lang/stdarch#823
Copy link
Member

@josephlr josephlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

src/distributions/utils.rs Outdated Show resolved Hide resolved
The specialized instructions are already available with `sse2`.
#[cfg(not(target_feature = "sse4.2"))]
#[cfg(not(target_feature = "sse2"))]
wmul_impl! { (u16x8, u32x8),, 16 }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this is a 256-bit op and thus requires AVX?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think packed_simd provides a fallback implementation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it?

Can you at least clarify on what basis you decided to make this change?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without 128-bit XMM multiplications, packed_simd falls back to a scalar implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, why do we bother with these cfg selectors at all?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is to provide a specialized implementation that uses less instructions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you're saying the use of a SSE4.2 target originally was a mistake?

Maybe it was, since the insrtuctions appear to be supported on SSE2: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_mullo_epi16&expand=3967,3990

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appears to be a mistake. I guess that was my bad

@vks
Copy link
Collaborator Author

vks commented Oct 15, 2020

Assuming @dhardy is satisfied, I'm going ahead and merge this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants