Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure Stop hooks are executed when Start times out on a long-running Start hook #1061

Merged
merged 3 commits into from
Mar 28, 2023

Conversation

tchung1118
Copy link
Contributor

Currently, none of Stop hooks are run if app.Start fails with a timeout, and app's lifecycle
is stuck on waiting for a long-running Start hook to finish executing. While we cannot run Stop
hook for the long-running Start hook that caused the timeout, we can still run rest of the Stop
hooks for the Start hooks that finished running.

In this PR, this is done by just easing up the requirement for running Stop hooks to include
lifecycle's starting state. Ideally the app's lifecycle state should transition into
incompleteStart when a long-running Start hook causes the context to timeout and does
not respect the context expiration. However, this requires a bigger refactor so that all
lifecycle hook invocations are made asynchronous.

Fixes #1035

@codecov
Copy link

codecov bot commented Mar 24, 2023

Codecov Report

Merging #1061 (0e25fe4) into master (99d5ce6) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #1061   +/-   ##
=======================================
  Coverage   98.52%   98.52%           
=======================================
  Files          39       39           
  Lines        2989     2989           
=======================================
  Hits         2945     2945           
  Misses         37       37           
  Partials        7        7           
Impacted Files Coverage Δ
internal/lifecycle/lifecycle.go 93.54% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@sywhang sywhang merged commit 8b301d4 into uber-go:master Mar 28, 2023
6 checks passed
sywhang added a commit to sywhang/fx that referenced this pull request Mar 28, 2023
In uber-go#1061, a fix was made to ensure all Stop hooks whose corresponding
Start hooks ran successfully were run.

This does not fix a case where it's possible for the Start hooks to
race with the Stop hooks. In particular, it's possible for a Start
hook to be in the middle of execution when Fx App bails out of the
execution loop due to Start lifecycle context expiration. If the
Stop hook happens to be running when the Start hook finishes running,
the same hook will be run twice because the l.hooks and l.numStarted
can change in the middle of the Stop hook execution loop.

The fix is quite simple - we need to snapshot this state before going
into the Stop hook execution loop so that the changed state due to the
Start hook that updated the state does not affect the Stop hook execution
loop.

Verified that the attached test fails without the fix made in this PR
with the following error:
```
--- FAIL: TestAppStart (0.00s)
    --- FAIL: TestAppStart/race_test (0.30s)
        app_test.go:1252:
            	Error Trace:	/Users/sungyoon/go/src/github.com/uber-go/fx/app_test.go:1252
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/lifecycle.go:338
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/lifecycle.go:300
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/app.go:700
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/app.go:781
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/asm_arm64.s:1172
            	Error:      	expected each hook to run exactly once only
            	Test:       	TestAppStart/race_test
FAIL

```
sywhang added a commit to sywhang/fx that referenced this pull request Mar 28, 2023
In uber-go#1061, a fix was made to ensure all Stop hooks whose corresponding
Start hooks ran successfully were run.

This does not fix a case where it's possible for the Start hooks to
race with the Stop hooks. In particular, it's possible for a Start
hook to be in the middle of execution when Fx App bails out of the
execution loop due to Start lifecycle context expiration. If the
Stop hook happens to be running when the Start hook finishes running,
the same hook will be run twice because the l.hooks and l.numStarted
can change in the middle of the Stop hook execution loop.

The fix is quite simple - we need to snapshot this state before going
into the Stop hook execution loop so that the changed state due to the
Start hook that updated the state does not affect the Stop hook execution
loop.

Verified that the attached test fails without the fix made in this PR
with the following error:
```
--- FAIL: TestAppStart (0.00s)
    --- FAIL: TestAppStart/race_test (0.30s)
        app_test.go:1252:
            	Error Trace:	/Users/sungyoon/go/src/github.com/uber-go/fx/app_test.go:1252
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/lifecycle.go:338
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/lifecycle.go:300
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/app.go:700
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/app.go:781
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/asm_arm64.s:1172
            	Error:      	expected each hook to run exactly once only
            	Test:       	TestAppStart/race_test
FAIL

```
sywhang added a commit that referenced this pull request Mar 28, 2023
In #1061, a fix was made to ensure all Stop hooks whose corresponding
Start hooks ran successfully were run.

This does not fix a case where it's possible for the Start hooks to race
with the Stop hooks. In particular, it's possible for a Start hook to be
in the middle of execution when Fx App bails out of the execution loop
due to Start lifecycle context expiration. If the Stop hook happens to
be running when the Start hook finishes running, the same hook will be
run twice because the l.hooks and l.numStarted can change in the middle
of the Stop hook execution loop.

The fix is quite simple - we need to snapshot this state before going
into the Stop hook execution loop so that the changed state due to the
Start hook that updated the state does not affect the Stop hook
execution loop.

Verified that the attached test fails without the fix made in this PR
with the following error:
```
--- FAIL: TestAppStart (0.00s)
    --- FAIL: TestAppStart/race_test (0.30s)
        app_test.go:1252:
            	Error Trace:	/Users/sungyoon/go/src/github.com/uber-go/fx/app_test.go:1252
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/lifecycle.go:338
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/lifecycle.go:300
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/app.go:700
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/app.go:781
            	            				/Users/sungyoon/go/src/github.com/uber-go/fx/asm_arm64.s:1172
            	Error:      	expected each hook to run exactly once only
            	Test:       	TestAppStart/race_test
FAIL

```
JacobOaks added a commit that referenced this pull request Mar 30, 2023
#1062 introduced a test for fixing a race condition introduced by #1061.
#1063 tuned this race condition because it was failing on Windows
machines. This PR changes the test to use channels instead of relying on
timing to provide more deterministic reproduction of the race, with the
caveat being that it's not triggered by context timeout or cancel
anymore.

I have verified that this consistently fails pre-#1062 and consistently
passes post-#1062.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Stop hooks are not called when Start times out
2 participants