AkkaSystem unclean termination #6948

Open
Zetanova opened this issue Oct 5, 2023 · 8 comments

@Zetanova
Contributor

Zetanova commented Oct 5, 2023

Version Information
Akka.NET 1.5.13

Describe the bug
When Akka shuts down or terminates in a fatal situation, the final stop signals and callbacks are not triggered.
This includes AkkaSystem.WhenTerminated and System.RegisterOnTermination().

As a result, AkkaSystem.WhenTerminated never completes (LivenessHealthCheck: Healthy),
and no application stop can be triggered through System.RegisterOnTermination(StopApplication)
(downed node with ReadinessHealthCheck: Unhealthy).
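
For readers unfamiliar with the wiring described above, here is a minimal sketch of it. It assumes the Akka.NET `ActorSystem` API (`RegisterOnTermination`, `WhenTerminated`, `Terminate`). The system name `demo`, the `isLive` probe, and the console message standing in for `StopApplication` are illustrative assumptions, not code from the report.

```csharp
using System;
using System.Threading.Tasks;
using Akka.Actor;

public static class TerminationWiring
{
    public static async Task Main()
    {
        var system = ActorSystem.Create("demo");

        // Callback that is expected to run once the system has stopped.
        // The console message is a hypothetical stand-in for StopApplication.
        system.RegisterOnTermination(() => Console.WriteLine("StopApplication"));

        // Liveness probe in the style described above:
        // reported as healthy for as long as WhenTerminated is still pending.
        Func<bool> isLive = () => !system.WhenTerminated.IsCompleted;
        Console.WriteLine($"live before terminate: {isLive()}");

        // On a clean shutdown this task completes and the probe flips.
        // The report is about the case where Terminate() throws instead.
        await system.Terminate();
        Console.WriteLine($"live after terminate: {isLive()}");
    }
}
```

If `Terminate()` throws and `WhenTerminated` is never completed, a probe like `isLive` keeps reporting healthy, which matches the stuck state described in this report.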

To Reproduce
Terminate and/or CoordinatedShutdown a system in an OOM situation.

Expected behavior
The AkkaSystem.RegisterOnTermination callbacks should run even after an unsuccessful CoordinatedShutdown or AkkaSystem.Terminate().

Actual behavior
CoordinatedShutdown and/or AkkaSystem.Terminate() throw, no AkkaSystem.RegisterOnTermination callbacks are executed,
and AkkaSystem.WhenTerminated does not complete.

Environment
ubuntu-jammy
Docker Desktop and Kubernetes (k8s)

@Aaronontheweb Aaronontheweb added this to the 1.5.14 milestone Oct 11, 2023
Arkatufus added a commit to Arkatufus/akka.net that referenced this issue Oct 23, 2023
Arkatufus added a commit to Arkatufus/akka.net that referenced this issue Oct 23, 2023
@Arkatufus
Contributor

I can't really reproduce the bug. Maybe there are other things that cause the actor system to fail?

@Aaronontheweb Aaronontheweb removed this from the 1.5.14 milestone Oct 24, 2023
@Aaronontheweb
Member

Terminate and/or CoordinatedShutdown a system in an OOM situation.

So I missed this - but this is largely an unhandleable situation and CoordinatedShutdown won't run correctly because processes are aborted when this occurs. Catastrophic runtime failures can't be handled gracefully through the normal pathways we use to handle graceful terminations. The solve here is to fix the OOM.

@Aaronontheweb Aaronontheweb closed this as not planned Oct 24, 2023
@Aaronontheweb
Member

See https://learn.microsoft.com/en-us/dotnet/api/system.outofmemoryexception?view=net-7.0 for a fuller explanation on what you can do to log this type of error (Environment.FailFast), but there are no tools to handle it once it gets going.

@Zetanova
Contributor Author

It's not that the OOM exception should be handled explicitly,
but system.WhenTerminated should be completed even after an exception in system.Terminate() itself.
The callbacks registered in System.RegisterOnTermination(StopApplication) could/should be executed even after an exception in CoordinatedShutdown

Otherwise, a failing system.Terminate() and/or CoordinatedShutdown breaks the system state: system.WhenTerminated never completes and no System.RegisterOnTermination(StopApplication) callbacks get executed.

My end result was that my LiveHealthCheck on system.WhenTerminated was successful,
but the ReadyHealthCheck on the Akka cluster was in a failure state (Kubernetes).

@Zetanova
Contributor Author

In my opinion, system.Terminate() should behave like Dispose(): even when it throws internally, the instance still ends up disposed.
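
The Dispose-like behavior requested here could look roughly like the sketch below. `TerminationSignal` is a hypothetical type written for illustration only. It is not Akka.NET code and not the change proposed in any PR. It only shows the pattern: complete the termination task and run the callbacks in a `finally` block, so they happen whether or not the shutdown logic throws.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical type for illustration only; not part of Akka.NET.
public sealed class TerminationSignal
{
    private readonly TaskCompletionSource<bool> _terminated =
        new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
    private readonly List<Action> _callbacks = new List<Action>();

    // Completes once Terminate has finished, successfully or not.
    public Task WhenTerminated => _terminated.Task;

    public void RegisterOnTermination(Action callback) => _callbacks.Add(callback);

    // Runs the supplied shutdown logic; the finally block guarantees that
    // callbacks run and WhenTerminated completes even if shutdown throws.
    public async Task Terminate(Func<Task> shutdown)
    {
        try
        {
            await shutdown();
        }
        finally
        {
            foreach (var callback in _callbacks)
            {
                try { callback(); }
                catch (Exception) { /* one failing callback must not block the others */ }
            }
            _terminated.TrySetResult(true);
        }
    }
}
```

With this shape the caller still observes the shutdown exception from `Terminate`, while `WhenTerminated` completes either way. It does not help when the runtime aborts the process outright, as can happen with a real OOM, because in that case the `finally` block may never run.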

@Aaronontheweb
Member

but system.WhenTerminated should be completed even after an exception in system.Terminate() itself.
The callbacks registered in System.RegisterOnTermination(StopApplication) could/should be executed even after an exception in CoordinatedShutdown

Ok, that is fixable. We can do that.

@Aaronontheweb Aaronontheweb reopened this Oct 24, 2023
@Aaronontheweb Aaronontheweb added this to the 1.5.14 milestone Oct 24, 2023
@Arkatufus
Contributor

@Zetanova can you look at #6967 and check whether I missed anything? I could not reproduce your error.

@Aaronontheweb Aaronontheweb modified the milestones: 1.5.14, 1.5.15 Nov 29, 2023
@Aaronontheweb
Member

Any updates on this, @Zetanova?

@Aaronontheweb Aaronontheweb modified the milestones: 1.5.15, 1.5.16 Jan 10, 2024
@Aaronontheweb Aaronontheweb modified the milestones: 1.5.16, 1.5.17 Jan 31, 2024
@Aaronontheweb Aaronontheweb modified the milestones: 1.5.17, 1.5.18 Mar 5, 2024
@Aaronontheweb Aaronontheweb modified the milestones: 1.5.18, 1.5.19 Mar 12, 2024
@Aaronontheweb Aaronontheweb removed this from the 1.5.19 milestone Apr 15, 2024
@Aaronontheweb Aaronontheweb added this to the 1.5.20 milestone Apr 15, 2024
@Aaronontheweb Aaronontheweb modified the milestones: 1.5.20, 1.5.21 Apr 29, 2024