Large structs constructed on stack #817

mark-i-m · 2019-06-06T04:18:33Z

I am trying to use StdRng in a no_std context where the stack is rather small. I found that the stack is overflowed when the RNG is constructed.

Basically, rand_core::SeedableRng::seed_from_u64 calls <rand::rngs::std::StdRng as rand_core::SeedableRng>::from_seed, which calls <rand_hc::hc128::Hc128Rng as rand_core::SeedableRng>::from_seed, which calls BlockRng::<Hc128Core>::from_seed ... etc

And each of these allocates a huge amount of space on the stack for the return value, which leads to a stack overflow. I imagine it is also quite slow on a normal platform too, since a lot of values will be copied on the stack.

TBH, I'm not really sure what the solution is. It seems like allocating the RNG on the heap is the right way to avoid all of this copying on the stack. On the other, perhaps marking some of the constructors #[inline(always)] would solve the problem?

The text was updated successfully, but these errors were encountered:

vks · 2019-06-06T07:16:14Z

This will hopefully get better once we release Rand 0.7, which uses ChaCha20 for StdRng, requiring much less memory than HC128 (see the Rand book). For now, I have the following suggestions:

Try using rand_chacha::ChaChaRng.
Try using Box<StdRng>. However, I'm not sure whether this avoids the stack allocation.
Consider using a RNG that is not cryptographically secure but has a tiny state, see rand::SmallRng, rand_xoshiro and rand_pcg.

burdges · 2019-06-06T07:48:27Z

We should add some #[inline(always)] lines or similar, probably in BlockRng, but..

Reusing stack space tightly sounds vital CPU cache performance. Yet, Rust code often returns large structures on the stack. We've RFCs for returning DSTs and VLAs that encourage this. We'll gain const generics this year, which should make methods like fn foo<const u: usize>(...) -> [T; n] common place.

I'd therefore consider this an ABI bug or code generation bug in rustc. I'd expect the caller to pass the called method a pointer to memory where it may construct the returned type, but even if it simply returned the value on the stack, then you'd double, or quadruple in this case, you cache efficiency by doing an overlaping copy instead of the non-overlaping copy they obviously do.

Can you create an issue on https://github.com/rust-lang/rust/ ?

dhardy · 2019-06-06T07:51:31Z

We should add some #[inline(always)] lines or similar, probably in BlockRng, but..

Agreed; I'll add to #815.

Without testing the full application, I can only guess where these will be most effective however.

We declare methods `#[inline(always)]` if they return a 200 byte Transcript because rustc does not handle large returns as efficently as one might like. See rust-random/rand#817

See rust-random#817

burdges · 2019-06-06T09:06:32Z

OT: Is there any "hot spot" analyzer for rust code? Or LLVM? You'd put tests into it and it identifies better inlining or whatever?

dhardy · 2019-06-06T09:14:56Z

Don't know. @burdges would you take a quick look at the last commit in #815? After I get a review I'll make the release (yes, it's largely guess-work at this point).

burdges · 2019-06-06T09:38:41Z

There are no BlockRngCore for .. appears in that diff, so maybe they were already marked #[inline(...)]? It's those that really need inlining I'd think, unless they're big maybe. And some inherent methods on BlockRng*. Is their already inlined then rustc ignore the directives. I've heard of that happening.

I've no idea about inlining methods from impl<R: BlockRngCore<Item=u32>> RngCore for BlockRng<R> myself.

dhardy · 2019-06-06T11:24:26Z

The BlockRngCore implementations tend to be the big ones; I inlined everything else (some with always and some without). Since all of these are pure or nearly-pure wrappers I think this is okay.

At any rate, benchmarks are either a little better or similar, so I think this is going in the right direction.

mark-i-m · 2019-06-06T14:55:29Z

Thanks all! That was a swift response!

I agree that the best solution would be some sort of language support (e.g. move elision/optimization or emplacement construction), but that doesn't seem to be a near-term thing, and I think it is already a recognized problem (e.g. there have been emplacement RFCs, and this, and others).

@vks Thanks for the suggestions

Try using rand_chacha::ChaChaRng

I will give that a try. I will also try after the release and see...

Try using Box. However, I'm not sure whether this avoids the stack allocation.

This doesn't solve the problem because the struct is created on the stack first, and then copied to the heap.

Consider using a RNG that is not cryptographically secure but has a tiny state, see rand::SmallRng, rand_xoshiro and rand_pcg

Unfortunately, I need a crypto RNG :/

vks · 2019-06-06T15:06:53Z

Unfortunately, I need a crypto RNG :/

ChaCha only needs 136 bytes, so it hopefully works for you. We could probably use the Gimli permutation as a RNG, which only has 48 bytes of state, but this is not yet implemented.

burdges · 2019-06-06T15:11:49Z

It's an interesting point that box Something::new() with Something large enough should be optimized to allocate before invoking new, but then box Something::new()? gets tricky, so maybe new(..) -> Something should be optimized to new<F>(.., f: F) -> Result<Something,..> where F: unsafe FnOnce(..) -> *mut Something.. tricky. And those DST discussion you link would basically turn fns into Generators.. very tricky.

dhardy · 2019-06-06T15:45:41Z

Good point @burdges but somewhat off-topic. #[inline] on constructors should help work around this.

I'll make those releases...

eddyb · 2019-06-10T05:47:56Z

I'd therefore consider this an ABI bug or code generation bug in rustc. I'd expect the caller to pass the called method a pointer to memory where it may construct the returned type,

This is exactly what it happens, but for soundness reasons, a copy is performed in the body of the function.
You need an optimization (which we call NRVO) to remove the copy in a sound manner.

(oops, I forgot to hit send 4 days ago...)

mark-i-m · 2019-06-10T14:09:19Z

@eddyb are there any plans to implement NRVO in rustc? would that be a breaking change? or alternately, do you think an explicit syntactic method of invoking NRVO would be better (e.g. some attribute)?

nagisa · 2019-06-10T14:16:21Z

Implementation/enablement of optimisations is, by the definition of an optimisation, not a breaking change.

eddyb · 2019-06-10T15:36:42Z

@mark-i-m Yes, see rust-lang/rust#47954 (which I first tried to get into Rust last year), it's mostly a matter of adding better debuginfo support (rust-lang/rust#56231, which I was working earlier this year before a few other things came up) before we can start doing that sort of optimizations (and IMO the need for debuginfo support would also apply if it were more manual).

And no, it wouldn't be a breaking change, as @nagisa pointed out. LLVM already can technically do it sometimes, it's just not as aggressive as we'd need it to be for Rust.

dhardy · 2019-06-12T09:48:30Z

@mark-i-m please test the new 0.7.0-pre.1 release. Hopefully the in-lining works around the issue sufficiently well.

I think then we can close this?

mark-i-m · 2019-06-15T22:19:17Z

@dhardy Thanks! Unfortunately, I'm not able to build 😕

It looks like the most recent build doesn't pass a no_std feature for ChaCha? Or am I missing a feature? I am getting the following:

$ make runtext
make -C kernel
make[1]: Entering directory '/nobackup/os2/kernel'
gcc -m64 -O0 -g -c -o start.o start.S
gcc -m64 -O0 -g -c -o mbr.o mbr.S
cargo xbuild  -Z unstable-options --out-dir `pwd`  --target x86_64-unknown-elf.json
    Updating crates.io index
   Compiling core v0.0.0 (/nobackup/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore)
   Compiling compiler_builtins v0.1.16
   Compiling rustc-std-workspace-core v1.0.0 (/nobackup/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/tools/rustc-std-workspace-core)
   Compiling alloc v0.0.0 (/tmp/xargo.0gpCipQ7XgVt)
    Finished release [optimized] target(s) in 21.57s
   Compiling semver-parser v0.7.0
   Compiling cc v1.0.35
   Compiling autocfg v0.1.2
   Compiling ppv-lite86 v0.2.5
   Compiling lazy_static v1.3.0
error[E0463]: can't find crate for `std`
 --> /nobackup/.cargo/registry/src/github.com-1ecc6299db9ec823/lazy_static-1.3.0/src/inline_lazy.rs:9:1
  |
9 | extern crate std;
  | ^^^^^^^^^^^^^^^^^ can't find crate
  |
  = note: the `x86_64-unknown-elf-1121863173107299811` target may not be installed

error: aborting due to previous error

For more information about this error, try `rustc --explain E0463`.
error: Could not compile `lazy_static`.
warning: build failed, waiting for other jobs to finish...
error: build failed
Makefile:44: recipe for target 'libkernel.a' failed
make[1]: *** [libkernel.a] Error 101
make[1]: Leaving directory '/nobackup/os2/kernel'
Makefile:8: recipe for target 'kernel' failed
make: *** [kernel] Error 2

and my Cargo.toml:

[package]
name = "kernel"
version = "0.1.0"
authors = ["mark"]

[lib]
name = "kernel"
path = "lib.rs"
crate-type = ["staticlib"]

[profile.dev]
panic = "abort"

[profile.release]
panic = "abort"

[dependencies]
rlibc = "1.0.0"
spin = "0.4.8"
smallheap = { git = "https://github.com/mark-i-m/smallheap", features = ["no_std"] }
buddy = { git = "https://github.com/mark-i-m/buddy" }
x86_64 = "0.5.2"
os_bootinfo = "0.2.1"
rand = { version = "0.7.0-pre.1", default-features = false, features = ["alloc"] }

dhardy · 2019-06-16T09:11:04Z

You're right, we should depend on rand_chacha with default-features = false, and opt-in to the std when used.

@kazcw how should we handle the simd feature? We do have a simd_support feature in Rand, but this is nightly-only. We could add a new simd feature for stable SIMD support, or is it safe just to enable simd for c2-chacha in all cases?

kazcw · 2019-06-16T17:30:37Z

@dhardy The option to disable simd is sort of a historical accident. It's part of the interface now and I use it for testing, but I don't expect users to want to toggle it.

@vks Chacha can be as light as 138 bytes, but a simd implementation can do several rounds at a time before making full use of available ILP even on a sse2 machine. In the mode rand_chacha uses, c2-chacha fills a 4-round buffer so it has around 312 bytes of state.

dhardy · 2019-06-16T17:38:53Z

Unfortunately I think forwarding the std feature to rand_chacha will interfere with the fallback on Emscripten.

Do we want to use the new rand_chacha code without std? Would it be better to keep rand_hc in this case?

Also, should we drop the simd flag from rand_chacha and just always enable simd on c2-chacha @kazcw?

See rust-random#817

dhardy · 2019-09-13T15:14:49Z

@mark-i-m would you like to re-test with rand 0.7.1? I believe this should have solved the no_std issues.

mark-i-m · 2019-09-14T18:48:57Z

Hmm... I am getting the following with 0.7.1

$ make
cargo xbuild  -Z unstable-options --out-dir `pwd`  --target x86_64-os2.json
    Updating crates.io index
   Compiling ppv-lite86 v0.2.5
   Compiling rand_core v0.5.1
   Compiling c2-chacha v0.2.2
   Compiling rand_chacha v0.2.1
   Compiling rand v0.7.1
   Compiling kernel v0.1.0 (/home/mark/nobackup/os2/kernel)
warning: method is never used: `free`
   --> memory/paging/mod.rs:115:9
    |
115 |         pub fn free(&mut self, val: usize, n: usize) {
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: `#[warn(dead_code)]` on by default

LLVM ERROR: Do not know how to split the result of this operator!

I've never seen this error before. Some googling suggests that this is related to the target. My target file is:

$ cat x86_64-os2.json 
{
  "arch": "x86_64",
  "cpu": "x86-64",
  "data-layout": "e-m:e-i64:64-f80:128-n8:16:32:64-S128",
  "dynamic-linking": false,
  "target-env": "gnu",
  "linker-flavor": "ld",
  "linker-is-gnu": true,
  "llvm-target": "x86_64-unknown-none",
  "os": "none",
  "position-independent-executables": true,
  "target-endian": "little",
  "target-pointer-width": "64",
  "target-c-int-width": "64",
  "no-compiler-rt": true,
  "archive-format": "gnu",
  "pre-link-args": {
      "gcc": [
          "-Wl,--nmagic",
          "-nostartfiles"
      ]
  },
  "disable-redzone": true,
  "eliminate-frame-pointer": false,
  "features": "-mmx,-sse,+soft-float"
}

EDIT: to clarify, I'm referring to the LLVM ERROR part. The unused function is on me.

dhardy · 2019-09-27T15:43:56Z

Is this issue still about large structs on the stack or is it about debugging your build? I tried reproducing but got the following, similar to your previous error.

error[E0463]: can't find crate for `std`
 --> /home/dhardy/.cargo/registry/src/github.com-1ecc6299db9ec823/lazy_static-1.4.0/src/inline_lazy.rs:9:1
  |
9 | extern crate std;
  | ^^^^^^^^^^^^^^^^^ can't find crate
  |
  = note: the `x86_64-os2-13976306759915653580` target may not be installed

However, I don't believe this has anything to do with the rand project so will now close this. I guess we can re-open if you can reproduce your original issue using the latest rand version.

mark-i-m · 2019-09-27T16:43:41Z

@dhardy yeah sorry about that. I suspect that it is probably fixed but i don't have a good way to check. Thanks for your help!

burdges · 2019-09-27T21:44:06Z

It's an open rustc issue since 2014, not fixed yet. According to rust-lang/rust#32966 you should understand and try to improve upon rust-lang/rust#47954 if you really want this sooner rather than later. Yes, this issue should be closed in favor of those.

dhardy added a commit to dhardy/rand that referenced this issue Jun 6, 2019

Adjust inlining

7c8284f

See rust-random#817

dhardy mentioned this issue Jun 6, 2019

Prepare new releases, from_rng Endianness and inlining #815

Merged

burdges mentioned this issue Jun 6, 2019

Define a Rust ABI rust-lang/rfcs#600

Closed

dhardy added a commit to dhardy/rand that referenced this issue Jul 27, 2019

Adjust inlining

eda7ac5

See rust-random#817

dhardy closed this as completed Sep 27, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Large structs constructed on stack #817

Large structs constructed on stack #817

mark-i-m commented Jun 6, 2019

vks commented Jun 6, 2019

burdges commented Jun 6, 2019 •

edited

dhardy commented Jun 6, 2019 •

edited

burdges commented Jun 6, 2019

dhardy commented Jun 6, 2019

burdges commented Jun 6, 2019

dhardy commented Jun 6, 2019

mark-i-m commented Jun 6, 2019

vks commented Jun 6, 2019

burdges commented Jun 6, 2019 •

edited

dhardy commented Jun 6, 2019

eddyb commented Jun 10, 2019

mark-i-m commented Jun 10, 2019

nagisa commented Jun 10, 2019

eddyb commented Jun 10, 2019 •

edited

dhardy commented Jun 12, 2019

mark-i-m commented Jun 15, 2019

dhardy commented Jun 16, 2019

kazcw commented Jun 16, 2019

dhardy commented Jun 16, 2019

dhardy commented Sep 13, 2019

mark-i-m commented Sep 14, 2019 •

edited

dhardy commented Sep 27, 2019

mark-i-m commented Sep 27, 2019

burdges commented Sep 27, 2019

Large structs constructed on stack #817

Large structs constructed on stack #817

Comments

mark-i-m commented Jun 6, 2019

vks commented Jun 6, 2019

burdges commented Jun 6, 2019 • edited

dhardy commented Jun 6, 2019 • edited

burdges commented Jun 6, 2019

dhardy commented Jun 6, 2019

burdges commented Jun 6, 2019

dhardy commented Jun 6, 2019

mark-i-m commented Jun 6, 2019

vks commented Jun 6, 2019

burdges commented Jun 6, 2019 • edited

dhardy commented Jun 6, 2019

eddyb commented Jun 10, 2019

mark-i-m commented Jun 10, 2019

nagisa commented Jun 10, 2019

eddyb commented Jun 10, 2019 • edited

dhardy commented Jun 12, 2019

mark-i-m commented Jun 15, 2019

dhardy commented Jun 16, 2019

kazcw commented Jun 16, 2019

dhardy commented Jun 16, 2019

dhardy commented Sep 13, 2019

mark-i-m commented Sep 14, 2019 • edited

dhardy commented Sep 27, 2019

mark-i-m commented Sep 27, 2019

burdges commented Sep 27, 2019

burdges commented Jun 6, 2019 •

edited

dhardy commented Jun 6, 2019 •

edited

burdges commented Jun 6, 2019 •

edited

eddyb commented Jun 10, 2019 •

edited

mark-i-m commented Sep 14, 2019 •

edited