Cluster startup hard block #5515

Zetanova · 2022-01-15T13:53:18Z

Fixes #5498
Fix startup when the channel-executor config is missing

Changes

During cluster startup, the thread running the Cluster constructor was being reused by the default TaskScheduler that the ChannelExecutor uses.

to11mtm · 2022-01-16T18:17:35Z

src/core/Akka.Cluster/Cluster.cs

@@ -140,7 +140,11 @@ public Cluster(ActorSystemImpl system)
            _readView = new ClusterReadView(this);

            // force the underlying system to start
-            _clusterCore = GetClusterCoreRef().Result;
+            // and hard block the current thread
+            var clusterCoreTask = Task.Run(GetClusterCoreRef);


Are there any flags we should consider adding here? For example, would it be better to use `Task.Factory.StartNew` with `LongRunning` and `DenyChildAttach`?
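A minimal sketch of the two options. It assumes a hypothetical `GetClusterCoreRefAsync` standing in for `GetClusterCoreRef`: it only yields and returns a string, and it is not the Akka.NET method. `Task.Run` already unwraps an async delegate. `Task.Factory.StartNew` returns a `Task<Task<T>>` that needs `Unwrap()`. `LongRunning` only covers the synchronous part of the delegate before its first `await`, so for an async Ask it mostly buys a dedicated thread to start the call.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class OffloadSketch
{
    // Hypothetical stand-in for Cluster.GetClusterCoreRef(); not the real Akka.NET method.
    public static async Task<string> GetClusterCoreRefAsync()
    {
        await Task.Yield(); // force an asynchronous continuation
        return "cluster-core";
    }

    // Option 1: Task.Run queues the delegate on TaskScheduler.Default (the thread pool)
    // and unwraps the inner Task<string> for us.
    public static string BlockViaTaskRun()
    {
        return Task.Run(() => GetClusterCoreRefAsync()).GetAwaiter().GetResult();
    }

    // Option 2: StartNew with LongRunning | DenyChildAttach on TaskScheduler.Default.
    // StartNew does not understand async delegates, so the result is Task<Task<string>>
    // and must be unwrapped. LongRunning only applies up to the first await; the
    // continuation after Task.Yield() still runs on the thread pool.
    public static string BlockViaStartNew()
    {
        Task<Task<string>> outer = Task.Factory.StartNew(
            () => GetClusterCoreRefAsync(),
            CancellationToken.None,
            TaskCreationOptions.LongRunning | TaskCreationOptions.DenyChildAttach,
            TaskScheduler.Default);
        return outer.Unwrap().GetAwaiter().GetResult();
    }
}
```

In both cases the calling thread still blocks in `GetResult()`. The only difference is where the delegate starts running.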

`_clusterCore = GetClusterCoreRef().Result` always blocked, and I think it still does.

I think what happened is that, after the Ask implementation was improved, the current thread started being used by the ActorCell dispatcher.
I don't know exactly how or where, but the child actors of the cluster daemon end up running on it.

`Task.Run(GetClusterCoreRef)` should force the Ask onto a different thread.
That is the only thing that matters: the current thread must not leak.

I think what happened is that after the Ask got improved the current thread gets used for the actorcell dispatcher.

Interesting. Would there be other implications if this is true?

cc: @Aaronontheweb

The actor system extensions implementation and the blocking cluster constructor are both anti-patterns,
and we need to rework them in the future (see #5447).

Maybe it's just the `ConfigureAwait(false)` at line 155 of akka.net/src/core/Akka.Cluster/Cluster.cs (commit 44c51f6):

return await _clusterDaemons.Ask<IActorRef>(new InternalClusterAction.GetClusterCoreRef(this), timeout).ConfigureAwait(false);

The Cluster extension is requested by the ClusterActorRefProvider, and it is created during ActorSystem startup.
It therefore runs on the same thread as the code that creates the ActorSystem. With the normal ForkJoinExecutor, the two never mix.
But with the ChannelExecutor, which uses the regular .NET ThreadPool, the waiting thread can end up being used to run an ActorCell.

var task = Task.Run(...); task.Wait();

I hope that simply resolves the problem.

`task.Result` certainly blocked on .NET 4.5;
I don't know why or how the thread now gets reused.
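A sketch of the suggested "`Task.Run(...)`, then `Wait()`" pattern. It uses a hypothetical `FailingAskAsync` in place of the cluster-core Ask, so the failure path is visible. Blocking via `Wait()` or `.Result` wraps a failure in an `AggregateException`. `GetAwaiter().GetResult()` blocks the same way but rethrows the original exception, which may make startup failures easier to read.

```csharp
using System;
using System.Threading.Tasks;

public static class BlockingStartup
{
    // Hypothetical async operation standing in for the cluster-core Ask; it always times out.
    static async Task<int> FailingAskAsync()
    {
        await Task.Yield();
        throw new TimeoutException("ask timed out");
    }

    // Pattern from the comment: start on the thread pool, then block with Wait().
    // Wait() (like .Result) surfaces the failure as an AggregateException.
    public static Exception ViaWait()
    {
        var task = Task.Run(() => FailingAskAsync());
        try { task.Wait(); return null; }
        catch (Exception ex) { return ex; }
    }

    // Same offload, but GetAwaiter().GetResult() rethrows the original exception.
    public static Exception ViaGetResult()
    {
        var task = Task.Run(() => FailingAskAsync());
        try { task.GetAwaiter().GetResult(); return null; }
        catch (Exception ex) { return ex; }
    }
}
```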

Aaronontheweb · 2022-01-28T13:54:40Z

I have not had time to review this - hopefully I'll be able to come up for air next week.

Arkatufus · 2022-02-03T15:24:16Z

src/core/Akka/Dispatch/AbstractDispatcher.cs

+                {
+                    case "internal-dispatcher":
+                    case "default-remote-dispatcher":
+                        Priority = TaskSchedulerPriority.High;


These default values should be baked into the default HOCON config, not into code. They have to be transparent to the user: a user should only need to read the config file to find the default value of a setting, not dig through the source code.

They are the defaults in the config,
but the TestKit does not load the default config.
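A sketch of why the defaults went missing in the spec. It shows a layered lookup: user config first, then a reference layer holding the shipped defaults. In Akka.NET this kind of layering is what `Config.WithFallback` provides. The key names and the `LayeredSettings` type are illustrative assumptions. If a test builds its config without the reference layer, the lookup comes back empty, and the code has to supply its own default. That is what the hard-coded priorities in the diff work around.

```csharp
using System.Collections.Generic;

public static class LayeredSettings
{
    // Hypothetical reference defaults, standing in for the defaults that ship in the
    // HOCON reference config. Key names are illustrative assumptions.
    public static readonly IReadOnlyDictionary<string, string> ReferenceDefaults =
        new Dictionary<string, string>
        {
            ["internal-dispatcher.channel-executor.priority"] = "high",
            ["default-remote-dispatcher.channel-executor.priority"] = "high",
        };

    // Look up a key in the user config first, then in the reference defaults,
    // mirroring a fallback chain. Returns null when neither layer defines the key.
    public static string Get(IReadOnlyDictionary<string, string> userConfig, string key)
    {
        if (userConfig != null && userConfig.TryGetValue(key, out var value))
            return value;
        return ReferenceDefaults.TryGetValue(key, out var fallback) ? fallback : null;
    }
}
```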

Arkatufus · 2022-02-03T15:33:24Z

src/core/Akka/Dispatch/AbstractDispatcher.cs

@@ -120,8 +119,26 @@ internal sealed class ChannelExecutorConfigurator : ExecutorServiceConfigurator
    {
        public ChannelExecutorConfigurator(Config config, IDispatcherPrerequisites prerequisites) : base(config, prerequisites)
        {
-            var cfg = config.GetConfig("channel-executor");
-            Priority = (TaskSchedulerPriority)Enum.Parse(typeof(TaskSchedulerPriority), cfg.GetString("priority", "normal"), true);
+            var priorityName = config.GetString("channel-executor.priority", "None") ?? "None";


Unless you're accessing ActorSystem.Settings.Config directly, always assume that a Config object passed into a method can be null. We need to check for a null config instance here.
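A minimal sketch of a null-safe read of the priority setting. It models the config section as a possibly-null `IReadOnlyDictionary<string, string>` rather than the real `Config` type. `SchedulerPriority` is a hypothetical stand-in for `TaskSchedulerPriority`, whose real members may differ. The method falls back to `None`, matching the `"None"` default in the diff, when the config, the key, or a valid value is missing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Akka.Dispatch.TaskSchedulerPriority; the real enum's members may differ.
public enum SchedulerPriority { None, Idle, Background, Low, BelowNormal, Normal, AboveNormal, High, Realtime }

public static class ChannelExecutorSettings
{
    // 'config' models an optional config section as a flat key/value map (an assumption;
    // the real code reads an Akka.Configuration.Config). It may be null.
    public static SchedulerPriority ReadPriority(IReadOnlyDictionary<string, string> config)
    {
        var name = "None";
        if (config != null
            && config.TryGetValue("channel-executor.priority", out var configured)
            && !string.IsNullOrWhiteSpace(configured))
        {
            name = configured;
        }

        // Case-insensitive parse; IsDefined rejects numeric strings such as "42"
        // that Enum.TryParse would otherwise accept as undefined values.
        return Enum.TryParse(name, true, out SchedulerPriority priority)
               && Enum.IsDefined(typeof(SchedulerPriority), priority)
            ? priority
            : SchedulerPriority.None;
    }
}
```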

Arkatufus · 2022-02-03T15:38:14Z

src/core/Akka/Dispatch/AbstractDispatcher.cs

@@ -120,8 +119,26 @@ internal sealed class ChannelExecutorConfigurator : ExecutorServiceConfigurator
    {
        public ChannelExecutorConfigurator(Config config, IDispatcherPrerequisites prerequisites) : base(config, prerequisites)
        {
-            var cfg = config.GetConfig("channel-executor");
-            Priority = (TaskSchedulerPriority)Enum.Parse(typeof(TaskSchedulerPriority), cfg.GetString("priority", "normal"), true);
+            var priorityName = config.GetString("channel-executor.priority", "None") ?? "None";


These changes don't have anything to do with making cluster startup block, do they? Can you remove them and move them to a new PR?

This is required to use the channel-executor in the spec.

I added a spec that simply starts the cluster with the channel-executor, and that is how I discovered these problems.

The spec itself will not trigger the original failure, because the TestKit uses a SynchronizationContext
for the Akka system startup.

Yes, but this PR can easily be split into two PRs, one depending on the other.

This should not be a problem anymore, fixed in #5568

Arkatufus · 2022-02-03T15:38:50Z

src/core/Akka/Dispatch/ChannelSchedulerExtension.cs

-                        config.GetInt("parallelism-min"),
-                        config.GetDouble("parallelism-factor", 1.0D), // the scalar-based factor to scale the threadpool size to 
-                        config.GetInt("parallelism-max"));
+                        config?.GetInt("parallelism-min", 4) ?? 4,
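The original code already defaulted `parallelism-factor` to `1.0`, and the diff adds a fallback of `4` for `parallelism-min`. Below is a sketch of how min/factor/max settings are commonly combined into a pool size. It assumes the usual formula: scale `Environment.ProcessorCount` by the factor, round up, then clamp into `[min, max]`. `PoolSizing.ScaledPoolSize` is a hypothetical name, and this is not necessarily what `ChannelSchedulerExtension` does internally.

```csharp
using System;

public static class PoolSizing
{
    // Assumed sizing rule: ceil(ProcessorCount * factor), clamped to [parallelismMin, parallelismMax].
    public static int ScaledPoolSize(int parallelismMin, double parallelismFactor, int parallelismMax)
    {
        var scaled = (int)Math.Ceiling(Environment.ProcessorCount * parallelismFactor);
        return Math.Min(Math.Max(scaled, parallelismMin), parallelismMax);
    }
}
```

With a factor of `1.0`, the result tracks the core count. The `parallelism-min` floor only matters on small machines or with small factors.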


Same thing here: these should be placed in a different PR.

Zetanova added 4 commits January 15, 2022 14:37

fix missing config (b94ed29)
spec cluster startup channel executor (a1e284e)
hard block cluster startup (360a88d)
Merge branch 'dev' into cluster-startup-hard-lock (4c3021f)

to11mtm reviewed Jan 16, 2022


Zetanova mentioned this pull request Jan 27, 2022: Channel executor not injected #5544 (Closed)

Aaronontheweb assigned Arkatufus Feb 1, 2022

Merge branch 'dev' into cluster-startup-hard-lock (72ee78e)

Arkatufus requested changes Feb 3, 2022


Merge branch 'dev' into cluster-startup-hard-lock (2619643)
