Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rt: add rng_seed option to runtime::Builder #4910

Merged
merged 33 commits into from Sep 16, 2022
Merged

Conversation

hds
Copy link
Contributor

@hds hds commented Aug 12, 2022

Motivation

In certain circumstances (e.g. running tests with loom), it may be desirable
to have deterministic behavior when running tokio.

Partly this may be achieved by seeding the random number generator used
by the tokio::select! macro.

This PR is more a proof of concept that a real proposed change, it needs work
before it could be merged.

Refs: #4879

Open Questions

There are a number of open questions (or rather things that I need help with):

  • Currently we seed all threads with the same value. This is important because
    when using a multi-threaded runtime, the thread that a task will be scheduled
    on is not deterministic. Is this acceptable?
  • Answer: Threads are now each seeded with a value generated by a seed generator.
    This makes the values deterministic, but not the same.
  • Is there a way to make the thread that a task is run on deterministic?
  • Answer: This is outside the scope of this change.
  • The API provided takes a u64 as a seed. This is not very flexible, but avoids
    the need to hash or otherwise reduce some other value to a u64 to seed the
    actual RNG. This was done deliberately, because it's the easiest thing to change
    in this PR before merging. Opinions?
  • Answer: Changed the value to an opaque RngSeed type which can be constructed
    from a byte slice.
  • Starting multiple runtimes from a single thread will overwrite the RNG seed. This
    feels ugly to me, but I don't see a better way without sacrificing performance by,
    e.g. making the RNG global. Suggestions?
  • Answer: The Handle::enter pattern is used to set the seed when a runtime is entered
    from a thread, when the thread leaves the runtime, the previous thread local value is
    restored.

Solution

The tokio::select! macro polls branches in a random order. While this
is desirable in production, for testing purposes a more deterministic
approach can be useul.

This change adds an additional parameter to the runtime Builder to set
the random number generator seed. This value is then used to reset the
seed on the current thread when the runtime is entered into (restoring the
previous value when the thread leaves the runtime). All threads created
explicitly by the runtime also have a seed set as the runtime is built. Each
thread is set with a seed from a deterministic sequence.

This guarantees that calls to the tokio::select! macro which are
performed in the same order on the same thread will poll branches in the
same order.

Both the builder parameter as well as the RngSeed struct are marked
unstable initially.

The `tokio::select!` macro polls branches in a random order. While this
is desirable in production, for testing purposes a more deterministic
approach can be useul.

This change adds an additional parameter to the runtime `Builder` to set
the random number generator seed. This value is then used to reset the
seed on all threads associated with the runtime being built.

This guarantees that calls to the `tokio::select!` macro which are
performed in the same order on the same thread will poll branches in the
same order.
@github-actions github-actions bot added the R-loom Run loom tests on this PR label Aug 12, 2022
@hds hds requested a review from carllerche August 12, 2022 22:32
@Darksonn Darksonn added A-tokio Area: The main tokio crate M-runtime Module: tokio/runtime labels Aug 14, 2022
@carllerche
Copy link
Member

Thanks for taking this on! I will look shortly, but first I will answer some of the questions.

Currently we seed all threads with the same value. This is important because
when using a multi-threaded runtime, the thread that a task will be scheduled
on is not deterministic. Is this acceptable?

We probably don't want to do this as it will make the order of rands identical on each thread. What we can do is use the seed for a "seed random generator" and use that to generate a per-thread seed.

Pseudo code:

let seed_rng = Rng::new(seed_from_builder);

for _ in thread_to_spawn.iter() {
    let thread_seed = seed.rng.next_seed();
   spawn_thread_with_seed(thread_seed);
}
```

Hopefully, this makes some sense.

> Is there a way to make the thread that a task is run on deterministic?

Not entirely, but this is out of scope. For the use cases in question, they control enough to make sure it is deterministic. In theory, if you use the current_thread scheduler and strickly control I/O and other input, you can make it deterministic. Even better is to mock out I/O.

@carllerche
Copy link
Member

The API provided takes a u64 as a seed. This is not very flexible, but avoids
the need to hash or otherwise reduce some other value to a u64 to seed the
actual RNG. This was done deliberately, because it's the easiest thing to change
in this PR before merging. Opinions?

Hmm, I don't think we want to expose u64 there as each rng algorithm has a different seed format. I am not sure what we do want to expose yet, but what we can do is start by flagging this setting as tokio_unstable and punt :). My guess is we will want to define an opaque type RngSeed or something and conversions to it.

@carllerche
Copy link
Member

Starting multiple runtimes from a single thread will overwrite the RNG seed. This
feels ugly to me, but I don't see a better way without sacrificing performance by,
e.g. making the RNG global. Suggestions?

Can you use the Handle::enter pattern here? When one "enters" a runtime, the rng is set in the thread local. Any prior rng is stored and replaced when the enter guard drops.

NOTE: This change doesn't correct the handling of the RngSeed on the
thread the runtime is started from, so it's all likely to change. Just
pushing for safety.

Instead of exposing the width of the seed we're currently using, expose
an opaque struct which can generate a seed from a byte slice.

Additionally, each thread is given with a unique seed based on its `id`
(`worker_thread_index`). This should enable more deterministic behavior,
while ensuring that each thread does not begin with the same seed.
In order to properly clean up after ourselves, we set a specific seed
into the thread local RNG when entering into a runtime context. The
previous seed (RNG state) is stored in the `EnterGuard` together with
the previous context (runtime handle). Upon dropping the guard, the
previously stored seed is returned to the thread local RNG.

To achieve this in a deterministic, but fair way, we now store a seed
generator in the runtime handle, and another in the blocking thread
spawner. These seed generators are thread safe (as the one in the handle
may be passed across thread boundries) and will produce a deterministic
series of seeds when the initial seed provided to the seed generator is
the same.
@hds
Copy link
Contributor Author

hds commented Aug 18, 2022

OK, that was a fun rabbit hole I ended up running down following the Handle:enter pattern. Thanks for that. (-;

We now have a public RngSeed which can be created from a byte slice, and also a RngSeedGenerator which is stored on the runtime handle and the spawner for the BlockingPool in order to seed the RNG for the runtime and the blocking threads respectively.

For the runtime, we do set the seed when entering the runtime and keep the old seed in the EnterGuard (not sure how happy everyone is to extend the size and functionality of that struct). When the guard is dropped we reset the RNG back to the state it had prior to entering the runtime. Because the runtime may be accessed in parallel from multiple threads, we make no attempt to update its handle with the state of local RNG upon dropping the enter guard. Instead, each time a thread "enters" a runtime, the next seed from the seed generator is used. This should provide the necessary deterministic behavior, although there are caveats such as that calling select! twice within a single call to block_on is not equivalent to calling select! within two sequential calls to block_on.

Some questions:

  • Do we want to make this API unstable initially (perhaps not a bad idea)?
  • Should this functionality be dependent on the macros feature (the only place its being used currently - this affects how I fix the failing CI jobs)?
  • Do we want to extend the deterministic seeding to the RNG used to pick a random neighbor worker to steal work from (would not be a big change)?

@hds hds marked this pull request as ready for review August 19, 2022 21:31
@hds
Copy link
Contributor Author

hds commented Sep 6, 2022

@carllerche I think I've covered everything in your comments and added the necessary documentation, so this PR is ready for review again when you've got a moment.

Copy link
Member

@carllerche carllerche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skimmed it, and it looks fine to me. I would suggest keeping the new APIs as unstable initially as we let consumers try out the API. To do this, I would just make the public APIs unstable, the implementation can stay as it is.

tokio/src/runtime/builder.rs Outdated Show resolved Hide resolved
tokio/src/runtime/mod.rs Outdated Show resolved Hide resolved
In order to test out the new API before fixing on it, make it unstable
first.
@hds
Copy link
Contributor Author

hds commented Sep 7, 2022

I've made the new API unstable, now just waiting to see whether I've got the right visibility everywhere to satisfy clippy on stable and unstable builds.

@carllerche
Copy link
Member

Is this ready to go? It looks like there are a few merge conflicts now.

@hds
Copy link
Contributor Author

hds commented Sep 15, 2022

Yep, it's ready to go once approved (and now once I fix the merge conflicts). I'll try to merge master in it this afternoon.

Copy link
Member

@carllerche carllerche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, a couple of suggestions that you can apply if you want.

tokio/src/util/mod.rs Outdated Show resolved Hide resolved
tokio/src/runtime/builder.rs Outdated Show resolved Hide resolved
@hds hds merged commit b5709ba into master Sep 16, 2022
@hds hds deleted the hds/runtime-builder-rng-seed branch September 16, 2022 15:41
dbischof90 pushed a commit to dbischof90/tokio that referenced this pull request Oct 1, 2022
The `tokio::select!` macro polls branches in a random order. While this
is desirable in production, for testing purposes a more deterministic
approach can be useul.

This change adds an additional parameter `rng_seed` to the runtime
`Builder` to set the random number generator seed. This value is then
used to reset the seed on the current thread when the runtime is entered
into (restoring the previous value when the thread leaves the runtime). All
threads created explicitly by the runtime also have a seed set as the
runtime is built. Each thread is set with a seed from a deterministic
sequence.

This guarantees that calls to the `tokio::select!` macro which are
performed in the same order on the same thread will poll branches in the
same order.

Additionally, the peer chosen to attempt to steal work from also uses a
deterministic sequence if `rng_seed` is set.

Both the builder parameter as well as the `RngSeed` struct are marked
unstable initially.
hds added a commit that referenced this pull request Oct 4, 2022
The original tests for the `Builder::rng_seed` added in #4910 were a bit
fragile. There have already been a couple of instances where internal
refactoring caused the tests to fail and need to be modified. While it
is expected that internal refactoring may cause the random values to
change, this shouldn't cause the tests to break.

The tests should be more robust and not be affected by internal
refactoring or changes in the Rust compiler version. The tests are
changed to perform the same operation in 2 runtimes created with the
same seed, the expectation is that the values that result from each
runtime are the same.
hds added a commit that referenced this pull request Oct 5, 2022
The original tests for the `Builder::rng_seed` added in #4910 were a bit
fragile. There have already been a couple of instances where internal
refactoring caused the tests to fail and need to be modified. While it
is expected that internal refactoring may cause the random values to
change, this shouldn't cause the tests to break.

The tests should be more robust and not be affected by internal
refactoring or changes in the Rust compiler version. The tests are
changed to perform the same operation in 2 runtimes created with the
same seed, the expectation is that the values that result from each
runtime are the same.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate M-runtime Module: tokio/runtime R-loom Run loom tests on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants