Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: cancel all operations if c.V1Alpha2().Run fails. #8256

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

DmitriyMV
Copy link
Member

The current code contains a deadlock if c.V1Alpha2().Run fails with an error because EnforceKSPPRequirements will indefinitely wait for a condition to happen, but the controller runtime is not running. This commit addresses this by replacing the error channel with context.WithCancelCause and canceling the context if the controller runner fails.

@DmitriyMV DmitriyMV force-pushed the fix-controller-fail branch 3 times, most recently from 2036645 to b035f5b Compare February 4, 2024 17:45
@DmitriyMV DmitriyMV force-pushed the fix-controller-fail branch 2 times, most recently from 1f97be3 to 766b06f Compare February 5, 2024 12:14
defer system.Services(c.Runtime()).Shutdown(ctx)
s := system.Services(c.Runtime())
defer func() {
shutdownCtx, shutdownCtxCancel := context.WithTimeout(context.Background(), time.Minute)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required, because when we cancel the context, Shutdown(context) sequence below will never run properly.

@smira
Copy link
Member

smira commented Feb 5, 2024

we should rather fix the indefinite wait if we want to, but this place requires too much testing to get it right in all cases

@DmitriyMV
Copy link
Member Author

we should rather fix the indefinite wait if we want to, but this place requires too much testing to get it right in all cases

@smira are there any valid scenarios where running of sequencer should continue if the runtime failed?

DmitriyMV added a commit to DmitriyMV/talos that referenced this pull request Feb 5, 2024
While we decide what to do with siderolabs#8263 and siderolabs#8256 this quickfix at least allows us to
see what went wrong

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
@smira
Copy link
Member

smira commented Feb 5, 2024

we should rather fix the indefinite wait if we want to, but this place requires too much testing to get it right in all cases

@smira are there any valid scenarios where running of sequencer should continue if the runtime failed?

probably not, but runtime failing correctly stops Talos except for the initialize sequence

DmitriyMV added a commit to DmitriyMV/talos that referenced this pull request Feb 5, 2024
While we decide what to do with siderolabs#8263 and siderolabs#8256 this quickfix at least allows us to
see what went wrong

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
The current code contains a deadlock if `c.V1Alpha2().Run` fails with an error because `EnforceKSPPRequirements`
will indefinitely wait for a condition to happen, but the controller runtime is not running.
This commit addresses this by replacing the error channel with `context.WithCancelCause` and
canceling the context if the controller runner fails.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants