
Test case broken by migration to axum 0.6 #1966

Closed
nsunderland1 opened this issue Apr 27, 2023 · 2 comments


- [x] I have looked for existing issues (including closed) about this

Bug Report

Version

Working in 0.5.17, broken in 0.6.0

Platform

Amazon Linux 2

Crates

axum

Description

I work on a service built on axum. It uses tower concurrency limiting and load shedding to throttle incoming requests. We have a unit test that validates that the throttling layer is doing its job properly. When I tried migrating our package to axum 0.6, this test broke.

I've constructed a minimal repro of this issue here. This includes an annotated reproduction of our failing test. But basically this is the problem:

We have a piece of state in our tests that holds an atomic counter, shared across requests as an Extension. The counter is incremented at the start of each request and decremented at the end, and if it ever exceeds our configured concurrency limit, we panic, as we interpret this as the concurrency limit layer not having done its job.

The test sets a concurrency limit of 2 and spawns 10 threads that make requests concurrently. We verify that 2 of the requests succeed and the remaining 8 are throttled. In axum 0.5 this test passes, but in axum 0.6 we instead get this:

---- tests::admission_control stdout ----
thread 'tests::admission_control' panicked at 'assertion failed: cur < CONCURRENCY', src/main.rs:70:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'tests::admission_control' panicked at 'assertion failed: cur < CONCURRENCY', src/main.rs:70:17
thread 'tests::admission_control' panicked at 'assertion failed: cur < CONCURRENCY', src/main.rs:70:17
thread 'tests::admission_control' panicked at 'assertion failed: cur < CONCURRENCY', src/main.rs:70:17
thread 'tests::admission_control' panicked at 'assertion failed: cur < CONCURRENCY', src/main.rs:70:17
thread 'tests::admission_control' panicked at 'assertion failed: cur < CONCURRENCY', src/main.rs:70:17
thread 'tests::admission_control' panicked at 'assertion failed: cur < CONCURRENCY', src/main.rs:70:17
thread 'tests::admission_control' panicked at 'assertion failed: cur < CONCURRENCY', src/main.rs:70:17
thread 'tests::admission_control' panicked at 'called `Result::unwrap()` on an `Err` value: JoinError::Panic(Id(1), ...)', src/main.rs:107:41

You can see the code for more details.
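For illustration, the shape of the test can be sketched with only the standard library (names like `try_acquire` and the 100 ms sleep are illustrative, not from the actual repro): a shared atomic counter plays the role of the Extension state, and a compare-and-set acquire stands in for tower's concurrency-limit layer.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Barrier};
use std::thread;
use std::time::Duration;

const CONCURRENCY: usize = 2;

// Stand-in for tower's concurrency-limit layer: take a permit if one is
// free, otherwise shed the request.
fn try_acquire(in_flight: &AtomicUsize) -> bool {
    in_flight
        .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |n| {
            (n < CONCURRENCY).then_some(n + 1)
        })
        .is_ok()
}

fn main() {
    let in_flight = Arc::new(AtomicUsize::new(0));
    let barrier = Arc::new(Barrier::new(10));

    let handles: Vec<_> = (0..10)
        .map(|_| {
            let in_flight = Arc::clone(&in_flight);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                barrier.wait(); // all 10 "requests" arrive together
                if try_acquire(&in_flight) {
                    // The repro's handler asserts at this point that the
                    // observed count never exceeds CONCURRENCY.
                    assert!(in_flight.load(Ordering::SeqCst) <= CONCURRENCY);
                    thread::sleep(Duration::from_millis(100)); // do "work"
                    in_flight.fetch_sub(1, Ordering::SeqCst);
                    true
                } else {
                    false // throttled
                }
            })
        })
        .collect();

    let results: Vec<bool> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    let succeeded = results.iter().filter(|&&ok| ok).count();
    println!("succeeded: {succeeded}, throttled: {}", 10 - succeeded);
}
```

With a single shared counter this typically prints `succeeded: 2, throttled: 8`; if each thread instead ends up with its own counter, none of the requests is shed and the assertion in the real handler fires.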

What I think is happening here: when we dispatch the 10 threads, we clone our Router and hand a clone to each thread. It seems that in axum 0.5, cloning a router produced several instances that shared internal state for concurrency_limit (due to internal Arcing?), but in 0.6 this no longer holds: cloning the router produces entirely separate instances. That's just my best guess, though; I don't know how to verify this hypothesis.
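The hypothesized difference can be illustrated in plain Rust (an analogy, not axum's actual internals, and the limiter types here are made up): a clone that shares its counter through an Arc versus a clone that rebuilds its state.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Analogy for the hypothesized 0.5 behavior: clones share one counter.
#[derive(Clone)]
struct SharedLimiter {
    in_flight: Arc<AtomicUsize>,
}

// Analogy for the hypothesized 0.6 behavior: each clone rebuilds its state.
struct IndependentLimiter {
    in_flight: Arc<AtomicUsize>,
}

impl Clone for IndependentLimiter {
    fn clone(&self) -> Self {
        // A fresh counter per clone: the limit is enforced per clone,
        // not across all of them.
        IndependentLimiter {
            in_flight: Arc::new(AtomicUsize::new(0)),
        }
    }
}

fn main() {
    let shared = SharedLimiter { in_flight: Arc::new(AtomicUsize::new(0)) };
    let shared_clone = shared.clone();
    shared_clone.in_flight.fetch_add(1, Ordering::SeqCst);
    // The original observes the clone's request: the limit applies globally.
    assert_eq!(shared.in_flight.load(Ordering::SeqCst), 1);

    let indep = IndependentLimiter { in_flight: Arc::new(AtomicUsize::new(0)) };
    let indep_clone = indep.clone();
    indep_clone.in_flight.fetch_add(1, Ordering::SeqCst);
    // The original sees nothing: each clone limits independently.
    assert_eq!(indep.in_flight.load(Ordering::SeqCst), 0);
}
```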

This may not be a bug in axum, but it does seem like there was a breaking change. I couldn't find anything related to this in the changelog.


nsunderland1 commented May 4, 2023

Did a git bisect which suggests this is the change that broke it: #1552

When I check that commit out, my repro test fails. When I check out the axum commit immediately prior and apply the below diff to my test, the test passes.

diff --git a/src/main.rs b/src/main.rs
index d7825b4..ca7411f 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -76,7 +76,7 @@ mod tests {
 
         let backend = MockBackend::default();
 
-        let app = app(Arc::new(backend), CONCURRENCY);
+        let app = app(Arc::new(backend), CONCURRENCY).into_service();
         let barrier = Arc::new(Barrier::new(10));
 
         // Spawn 10 concurrent tasks, each of which will send a `foo` request


nsunderland1 commented May 4, 2023

Using .with_state(()) on the router fixes this. Looks like we're hitting this caveat from the PR:

The only downside is that calling a router where with_state has not been called (because it doesn't need any state) doesn't give us a chance to convert everything into Routes ahead of time. That means we need to do it for each request. I haven't measured the perf impact of this but it will at least cause some additional allocations. Router::into_make_service works around this so it shouldn't impact most users but I've still documented it.

My understanding is that this means that we're creating a new instance of each middleware per request. We don't hit this outside tests because we call into_make_service.
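As an analogy in plain Rust (not axum's real API; `build_state` is a hypothetical stand-in for a middleware's internal state): if the stack is rebuilt per request, no state is shared across requests, whereas building it once up front restores sharing.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Hypothetical stand-in for middleware state (e.g. a concurrency counter).
fn build_state() -> Arc<AtomicUsize> {
    Arc::new(AtomicUsize::new(0))
}

fn main() {
    // Per-request construction: every request rebuilds the middleware
    // stack, so each gets a fresh counter and no limit is shared.
    let req1 = build_state();
    let req2 = build_state();
    req1.fetch_add(1, Ordering::SeqCst);
    assert_eq!(req2.load(Ordering::SeqCst), 0);

    // Ahead-of-time construction: the state is built once and every
    // request sees the same counter.
    let built_once = build_state();
    let (req3, req4) = (Arc::clone(&built_once), Arc::clone(&built_once));
    req3.fetch_add(1, Ordering::SeqCst);
    assert_eq!(req4.load(Ordering::SeqCst), 1);
}
```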

Looks like this is documented and there's a fix we can apply (we should really be moving our Extension to State anyway), so I'll resolve this.
