
Block RNGs: remove unaligned memory cast #783

Merged: 2 commits into rust-random:master from the block branch, Apr 23, 2019

Conversation

@dhardy (Member) commented Apr 22, 2019

Fixes #779. Review please, @RalfJung.

The point of the removed specialisations was to avoid one copy. Since the destination slice may not be aligned for the RNG's word type, and we wish to copy bytes in the same order, we have no choice but to use a buffer anyway. (We could complicate things by checking the alignment at run-time, but I don't think it's worthwhile given the very small performance cost of the extra copy.)
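
For illustration, the buffered path amounts to something like the sketch below (simplified from rand_core's fill_via_u32_chunks helper, which additionally reports how many words it consumed):

```rust
// Simplified sketch of the buffered approach (not the exact rand_core
// code). `results` holds the RNG's native u32 output; `dest` is the
// caller's byte slice, which may have any alignment.
fn fill_via_u32_chunks(results: &[u32], dest: &mut [u8]) -> usize {
    let mut filled = 0;
    for (word, out) in results.iter().zip(dest.chunks_mut(4)) {
        // Go through a 4-byte stack buffer instead of casting `dest` to
        // `&mut [u32]`; copy_from_slice makes no alignment assumption.
        let buf = word.to_le_bytes();
        out.copy_from_slice(&buf[..out.len()]);
        filled += out.len();
    }
    filled // number of bytes written
}
```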

Benchmarking with a 10*1024-byte buffer does show a small impact:

```
# before
test gen_bytes_chacha20             ... bench:  17,775,692 ns/iter (+/- 290,563) = 576 MB/s
test gen_bytes_hc128                ... bench:   4,243,890 ns/iter (+/- 69,032) = 2412 MB/s
test gen_bytes_isaac                ... bench:   7,238,757 ns/iter (+/- 123,797) = 1414 MB/s
test gen_bytes_isaac64              ... bench:   3,900,653 ns/iter (+/- 47,162) = 2625 MB/s
# after
test gen_bytes_chacha20             ... bench:  18,007,036 ns/iter (+/- 384,358) = 568 MB/s
test gen_bytes_hc128                ... bench:   4,634,505 ns/iter (+/- 55,119) = 2209 MB/s
test gen_bytes_isaac                ... bench:   7,338,592 ns/iter (+/- 90,396) = 1395 MB/s
test gen_bytes_isaac64              ... bench:   4,066,367 ns/iter (+/- 55,543) = 2518 MB/s
```

The second commit cleans up some warnings in the benches.

These specialisations relied on casting a u8 byte slice to a u32 or u64 slice, which is UB due to alignment requirements.
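
For reference, the removed specialisations did, in spirit, the following (an illustrative sketch, not the literal deleted code):

```rust
// Sketch of the removed pattern (do NOT do this): reinterpreting a byte
// slice as a u32 slice asserts 4-byte alignment that the caller's `dest`
// may not actually have.
fn cast_bytes_to_u32(dest: &mut [u8]) -> &mut [u32] {
    unsafe {
        core::slice::from_raw_parts_mut(
            dest.as_mut_ptr() as *mut u32, // UB if this pointer is misaligned
            dest.len() / 4,
        )
    }
}
```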
@burdges (Contributor) commented Apr 22, 2019

Is it not worth using read_unaligned, as you mentioned previously?

@dhardy (Member, Author) commented Apr 22, 2019

This isn't about unaligned reads; it's about unaligned writes masquerading as aligned ones (because of the cast to &[u32] or &[u64]). I guess we could change BlockRngCore::generate to take a *mut u32 argument or some such, but I really don't think these small performance losses are worth pushing unsafe code into all block-RNG implementations.
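
For concreteness, that rejected alternative would look roughly like the sketch below; write_words_unaligned is a hypothetical name, not an actual rand API:

```rust
use core::ptr;

// Hypothetical sketch of the rejected alternative: hand implementations a
// raw pointer and rely on unaligned writes. Correct, but it forces unsafe
// code into every BlockRngCore implementation.
//
// Safety: `dest` must be valid for writes of `4 * words.len()` bytes.
unsafe fn write_words_unaligned(words: &[u32], dest: *mut u8) {
    for (i, &w) in words.iter().enumerate() {
        // write_unaligned makes no alignment assumption about `dest`;
        // to_le() keeps the byte order reproducible across platforms.
        ptr::write_unaligned(dest.add(i * 4) as *mut u32, w.to_le());
    }
}
```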

@RalfJung (Contributor) commented:

Is there a way that align_to could be useful here to find the "aligned" part of the buffer? That would work on all platforms then, not just x86.

The change itself seems fine; this is what Miri is testing in #781.
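
For reference, the align_to route would look roughly like this sketch; as explained in the next comment, the alignment-dependent head/tail split is what makes it unsuitable here:

```rust
// Sketch of the align_to idea: split `dest` into an unaligned head, an
// aligned u32 body, and an unaligned tail. Viewing u8 memory as u32 is
// sound (every bit pattern is a valid u32), but where the split falls
// depends on the run-time alignment of `dest`.
fn split_for_aligned_fill(dest: &mut [u8]) -> (&mut [u8], &mut [u32], &mut [u8]) {
    unsafe { dest.align_to_mut::<u32>() }
}
```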

@dhardy (Member, Author) commented Apr 22, 2019

Finding the aligned subset of dest doesn't help unless we allow bytes of the RNG to be skipped. We don't wish to do this arbitrarily (or based on something as tenuous as input alignment) because in some cases reproducibility is important, and this would be hard to document and a breaking change from past behaviour.

I'll leave this until tomorrow for any further review, then we can merge the PRs.

@vks (Collaborator) commented Apr 23, 2019

> Finding the aligned subset of dest doesn't help unless we allow bytes of the RNG to be skipped.

Maybe we should add an API that requires aligned slices?

@dhardy (Member, Author) commented Apr 23, 2019

As in fill_aligned_bytes? Possible, I guess (a sketch follows below).

What I don't understand is why everyone keeps suggesting complicated ways to keep this (very) small, highly specific optimisation.
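
For reference, such an API could look something like the hypothetical sketch below; nothing like it was actually added:

```rust
use rand_core::RngCore;

// Hypothetical extension trait only -- never part of rand. It moves the
// alignment requirement onto the caller, so a word-at-a-time fast path
// could be offered without any unsafe casts inside the RNG.
pub trait FillAlignedBytes: RngCore {
    /// Like `fill_bytes`, but `dest` must be 4-byte aligned.
    fn fill_aligned_bytes(&mut self, dest: &mut [u8]) {
        assert_eq!(dest.as_ptr() as usize % 4, 0, "dest must be 4-byte aligned");
        self.fill_bytes(dest); // default impl: no fast path, just delegate
    }
}

impl<R: RngCore + ?Sized> FillAlignedBytes for R {}
```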

@dhardy merged commit 40b8eb9 into rust-random:master on Apr 23, 2019
@burdges (Contributor) commented Apr 23, 2019

It's not worth complicating the trait over this. lol

@dhardy deleted the block branch on May 16, 2019
@LukasKalbertodt commented:
Would it be possible to backport this bugfix to rand_core 0.4? Currently it's not possible to run quickcheck tests through Miri because this bug always makes the tests fail. And I think quite a few crates have not switched to the new rand version yet.

But I can also understand if this is too much work. I just wanted to ask ^_^

@dhardy (Member, Author) commented Jul 27, 2019

Sure, I guess it's possible.

Linked issue (closed by this PR): rand performs unaligned memory access (and invalid raw ptr usage) (#779)