
.NET 6 Managed ThreadPool / Thread performance and reliability degradations #62967

Closed
Aaronontheweb opened this issue Dec 17, 2021 · 20 comments

@Aaronontheweb

Description

Since upgrading Akka.NET's test suite to run on .NET 6, we've noticed two major classes of problems that occur on .NET 6 only: large performance drops and long delays in execution. We run the exact same tests and benchmarks on .NET Core 3.1 and .NET Framework 4.7.1, and neither of those platforms exhibits these symptoms.

Configuration

Machine that was used to generate our benchmark figures:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19041.1348 (2004/May2020Update/20H1)                             
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores                                                  
.NET SDK=6.0.100   

Machine that generates our test failures: windows-2019 shared agent on Azure DevOps.

Regression?

.NET Core 3.1 numbers:

[benchmark results screenshot]

.NET 6 numbers:

[benchmark results screenshot]

We've observed similar behavior in some of our non-public repositories as well.

Relevant benchmark:

https://github.com/akkadotnet/akka.net/tree/dev/src/benchmark/RemotePingPong

Managed ThreadPool-only benchmark (most relevant for this issue):

akkadotnet/akka.net#5386

Analysis

We're performing some testing now using the COMPlus_ThreadPool_UsePortableThreadPool environment variable to see how it affects our benchmark and testing behavior (see akkadotnet/akka.net#5441), but we believe that these two issues may be related to the problems we've been observing on .NET 6:

@Aaronontheweb added the tenet-performance label Dec 17, 2021


@mangod9 removed the untriaged label Dec 17, 2021
@mangod9 added this to the 6.0.x milestone Dec 17, 2021
@mangod9
Member

mangod9 commented Dec 17, 2021

Thanks for reporting these issues @Aaronontheweb. We will take a look soon (but might be delayed due to holidays).

cc: @kouvel

@Aaronontheweb
Author

@mangod9 @kouvel 100% understood. We've been trying to research this problem on our end before we brought it to the CoreCLR team's attention since we know your time is valuable, scarce, and spread across many projects. We will continue to do that and try to provide helpful information to you and your team.

@mangod9
Member

mangod9 commented Dec 17, 2021

Sounds good. Yeah, please update this issue if you find interesting info. With the managed ThreadPool changes in .NET 6 some pattern differences are expected, but hoping we can remedy them, especially the reliability concerns.

@Aaronontheweb
Author

What I would ideally like to get out of this issue:

  1. "This is a CLR issue that needs to be investigated"
  2. "Your code is out of date with the direction the CLR is headed; here is how you can fix it"

Either outcome would be fine.

@kouvel
Member

kouvel commented Jan 10, 2022

We're performing some testing now using the COMPlus_ThreadPool_UsePortableThreadPool environment variable to see how it affects our benchmark and testing behavior (see akkadotnet/akka.net#5441)

@Aaronontheweb, did you get a chance to see if disabling that config option changes anything in your tests?

The portable thread pool implementation is not intended to change any thread pool heuristics; it mostly swaps a portion of the native implementation for a managed one. As such there could be expected timing differences, and perhaps unexpected behavior differences, but disabling the portable thread pool with that config option would be a good first step to narrow things down. With it disabled, the thread pool works in pretty much exactly the same way as before; not much at all has changed there since 3.1.
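
If it helps, here is a minimal sketch of one way to run that A/B comparison. The switch is read at process startup, so the example launches the benchmark as a child process with the variable applied; the `dotnet RemotePingPong.dll` command line is a placeholder, not the project's actual launch script:

```csharp
using System.Diagnostics;

// Launch the benchmark with the portable thread pool disabled. The config
// switch is read at startup, so it is passed via ProcessStartInfo rather
// than set in the current process. The command line below is a placeholder.
var psi = new ProcessStartInfo("dotnet", "RemotePingPong.dll")
{
    UseShellExecute = false
};
psi.Environment["COMPlus_ThreadPool_UsePortableThreadPool"] = "0";

using var child = Process.Start(psi)!;
child.WaitForExit();
```

Simply setting the environment variable in the shell before starting the benchmark accomplishes the same thing.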

@Aaronontheweb
Author

We did gather some additional evidence that the rate of thread injection might be interfering with the dedicated thread we use for scheduling (a hashed wheel timer running on a fixed interval), and this issue becomes more prevalent on machines with low vCPU counts (such as AzDo agents), but we need to pick up research on this again. Things slowed down a bit over the holidays.

Is there a specific PerfCounter or ETW event I can look for during profiling to compare the rate of thread injection on .NET Core 3.1 vs. .NET 6 with some of our affected benchmarks?

@kouvel
Member

kouvel commented Jan 12, 2022

I see. One difference is that the portable thread pool implementation uses Environment.ProcessorCount for the default min worker/IOCP threads, and in .NET 6 on Windows Environment.ProcessorCount takes into account CPU limits (see #52690). A quick check between .NET 3.1 and .NET 6 to see what Environment.ProcessorCount is returning might indicate the difference. With fewer vCPUs, the min thread counts may be lower by default, and that may be a cause, perhaps leading to starvation due to blocking work in thread pool threads (depending on what kind of blocking work is occurring, #53471 may or may not have any effect).

The value returned by Environment.ProcessorCount can be overridden in .NET 6 by setting the following environment variable to the desired value before starting the process:

  • DOTNET_PROCESSOR_COUNT=8
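
To make that comparison concrete, here is a minimal sketch (not from the original thread) that can be dropped into the start of the benchmark on both runtimes and compared:

```csharp
using System;
using System.Threading;

// Print the values the thread pool bases its defaults on. Compare the
// output between .NET Core 3.1 and .NET 6, with and without
// DOTNET_PROCESSOR_COUNT set.
ThreadPool.GetMinThreads(out int minWorker, out int minIocp);
ThreadPool.GetMaxThreads(out int maxWorker, out int maxIocp);

Console.WriteLine($"Environment.ProcessorCount: {Environment.ProcessorCount}");
Console.WriteLine($"Min threads (worker/IOCP):  {minWorker}/{minIocp}");
Console.WriteLine($"Max threads (worker/IOCP):  {maxWorker}/{maxIocp}");
```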

It may be useful to see the following event to see if starvation is occurring:

  • Microsoft-Windows-DotNETRuntime/ThreadPoolWorkerThreadAdjustment/Adjustment

It indicates starvation if the event data contains Reason="Starvation".
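
For an in-process alternative to a full ETW trace, something along these lines should surface the same adjustment events. This is only a sketch, and it assumes the runtime's "Microsoft-Windows-DotNETRuntime" event source name and the Threading keyword (0x10000):

```csharp
using System;
using System.Diagnostics.Tracing;

// Minimal in-process listener for the runtime's thread pool adjustment
// events; look for a Reason of "Starvation" in the dumped payloads.
sealed class ThreadPoolAdjustmentListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name == "Microsoft-Windows-DotNETRuntime")
            EnableEvents(source, EventLevel.Informational, (EventKeywords)0x10000);
    }

    protected override void OnEventWritten(EventWrittenEventArgs e)
    {
        if (e.EventName is null || !e.EventName.StartsWith("ThreadPoolWorkerThreadAdjustment"))
            return;

        Console.Write(e.EventName);
        for (int i = 0; e.Payload != null && i < e.Payload.Count; i++)
            Console.Write($" {e.PayloadNames?[i]}={e.Payload[i]}");
        Console.WriteLine();
    }
}
```

Keeping one instance alive for the duration of the run (e.g. `var listener = new ThreadPoolAdjustmentListener();` at startup) is enough; a PerfView or dotnet-trace session is the more usual route.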

If on the other hand thread scheduling is affecting the dedicated thread from getting scheduled and you're looking to determine how many thread pool threads are active at a given time, there isn't an event for that, but it can be determined in code periodically by subtracting the values gotten from ThreadPool.GetAvailableThreads, from the corresponding values gotten from ThreadPool.GetMaxThreads. That would tell how many worker or IOCP threads were processing work at that time. A ThreadTime trace may also help to figure out why threads are not getting scheduled to run quickly enough by seeing the context switch timings.
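
A sketch of that subtraction, to be sampled periodically (e.g. from a timer) while the benchmark runs:

```csharp
using System;
using System.Threading;

// Estimate how many thread pool threads are currently processing work:
// busy = max - available, for worker and IOCP threads separately.
ThreadPool.GetMaxThreads(out int maxWorker, out int maxIocp);
ThreadPool.GetAvailableThreads(out int availWorker, out int availIocp);

Console.WriteLine($"Busy worker threads: {maxWorker - availWorker}");
Console.WriteLine($"Busy IOCP threads:   {maxIocp - availIocp}");
```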

@Aaronontheweb
Author

I've made more data available here: akkadotnet/akka.net#5385 (comment)

Using the values from Process.GetCurrentProcess().Threads.Count, it appears as though there is a consistent drop in throughput under .NET 6 when the ThreadPool begins to pare down the number of available worker threads. We take a before and after snapshot of this value during each iteration of the benchmark.

Both .NET Core 3.1 and .NET 6 start reducing thread counts around the same intervals and the thread counts are roughly the same, but there's no loss of performance when .NET Core 3.1 does this.
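
For reference, a minimal sketch of that before/after snapshot (not the actual benchmark code); note that Process.Threads.Count counts every OS thread in the process, not just thread pool threads:

```csharp
using System;
using System.Diagnostics;

// Snapshot the process-wide thread count around one benchmark iteration.
// This counts all OS threads (including Akka's dedicated threads),
// not only thread pool threads.
int startThreads = Process.GetCurrentProcess().Threads.Count;

// ... run one RemotePingPong iteration here ...

int endThreads = Process.GetCurrentProcess().Threads.Count;
Console.WriteLine($"Start Threads: {startThreads}, End Threads: {endThreads}");
```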

@kouvel
Member

kouvel commented May 10, 2022

Thread pool worker threads only exit after about 20 seconds of not being used, and thread pool IO completion threads exit after about 15 seconds of not being used. Hill climbing oscillates the worker thread count initially and slowly seems to stabilize near the proc count. So the worker threads that are exiting are just the ones not being used anymore. It's the same behavior on .NET Core 3.1 and .NET 6, so I don't think that would be the cause.

I tried running the benchmark on my 12-proc x64 Windows machine. I'm seeing something similar, though not exactly the same, and sometimes I'm seeing a similar throughput drop on .NET Core 3.1 too. I'm also seeing a few different multimodal kinds of results on both .NET Core 3.1 and .NET 6. The following results are with hill climbing disabled; I'll explain that below.

.NET Core 3.1

Run 1:

OSVersion:                         Microsoft Windows NT 6.2.9200.0
ProcessorCount:                    12
ClockSpeed:                        0 MHZ
Actor Count:                       24
Messages sent/received per client: 200000  (2e5)
Is Server GC:                      True
Thread count:                      92

Num clients, Total [msg], Msgs/sec, Total [ms], Start Threads, End Threads
         1,  200000,    174979,    1143.33,            92,             110
         5, 1000000,    353357,    2830.20,           118,             118
        10, 2000000,    374954,    5334.19,           126,             126
        15, 3000000,    377359,    7950.59,           134,             125
        20, 4000000,    354705,   11277.24,           134,             118
        25, 5000000,    318533,   15697.97,           126,             107
        30, 6000000,    318810,   18820.77,           117,             107

Run 2:

OSVersion:                         Microsoft Windows NT 6.2.9200.0
ProcessorCount:                    12
ClockSpeed:                        0 MHZ
Actor Count:                       24
Messages sent/received per client: 200000  (2e5)
Is Server GC:                      True
Thread count:                      92

Num clients, Total [msg], Msgs/sec, Total [ms], Start Threads, End Threads
         1,  200000,    173311,    1154.66,            92,             110
         5, 1000000,    316757,    3157.88,           118,             118
        10, 2000000,    329816,    6064.42,           126,             126
        15, 3000000,    332374,    9026.74,           134,             117
        20, 4000000,    338124,   11830.38,           126,             117
        25, 5000000,    333178,   15007.70,           125,             126
        30, 6000000,    328606,   18259.78,           126,             116

.NET 6

Run 1:

OSVersion:                         Microsoft Windows NT 10.0.22000.0
ProcessorCount:                    12
ClockSpeed:                        0 MHZ
Actor Count:                       24
Messages sent/received per client: 200000  (2e5)
Is Server GC:                      True
Thread count:                      92

Num clients, Total [msg], Msgs/sec, Total [ms], Start Threads, End Threads
         1,  200000,    179534,    1114.80,            92,             110
         5, 1000000,    375799,    2661.12,           118,             118
        10, 2000000,    396276,    5047.43,           126,             126
        15, 3000000,    405954,    7390.86,           134,             125
        20, 4000000,    380772,   10505.97,           134,             118
        25, 5000000,    352212,   14196.84,           126,             116
        30, 6000000,    345205,   17381.82,           124,             115

Run 2:

OSVersion:                         Microsoft Windows NT 10.0.22000.0
ProcessorCount:                    12
ClockSpeed:                        0 MHZ
Actor Count:                       24
Messages sent/received per client: 200000  (2e5)
Is Server GC:                      True
Thread count:                      92

Num clients, Total [msg], Msgs/sec, Total [ms], Start Threads, End Threads
         1,  200000,    166806,    1199.58,            92,             110
         5, 1000000,    316958,    3155.22,           118,             118
        10, 2000000,    332724,    6011.42,           126,             126
        15, 3000000,    346021,    8670.62,           134,             125
        20, 4000000,    339070,   11797.64,           133,             115
        25, 5000000,    328796,   15207.72,           124,             108
        30, 6000000,    334617,   17931.24,           116,             116

After seeing some results like that, just to be sure, I also disabled hill climbing (COMPlus_HillClimbing_Disable=1). With that, I didn't see any worker threads exit during the whole run, and only saw about 3 IO completion threads exiting. The thread counts above appear to be decreasing more than that, so that must be caused by something else.

Can you try to disable hill climbing and try with .NET 6 on your machine again to see if the same issue shows up?

It may also be useful to profile a run with PerfView on your machine and see what's happening differently during the time range where throughput is lower in .NET 6 compared with .NET Core 3.1.

@kouvel
Member

kouvel commented May 10, 2022

Also tried running the benchmark on a 32-proc Zen 3 Ryzen processor. There I'm seeing relatively less variance in the numbers between runs, and much more stability during a run across the different client counts, on both .NET Core 3.1 and .NET 6. Throughput seems to be slightly better on .NET 6 on average.

.NET Core 3.1

OSVersion:                         Microsoft Windows NT 6.2.9200.0
ProcessorCount:                    32
ClockSpeed:                        0 MHZ
Actor Count:                       64
Messages sent/received per client: 200000  (2e5)
Is Server GC:                      True
Thread count:                      160

Num clients, Total [msg], Msgs/sec, Total [ms], Start Threads, End Threads
         1,  200000,    281294,     711.44,           160,             196
         5, 1000000,    539666,    1853.69,           210,             226
        10, 2000000,    583601,    3427.86,           237,             244
        15, 3000000,    591134,    5075.64,           252,             252
        20, 4000000,    594001,    6734.27,           260,             248
        25, 5000000,    586717,    8522.88,           257,             241
        30, 6000000,    586224,   10235.76,           249,             234

.NET 6

OSVersion:                         Microsoft Windows NT 10.0.22000.0
ProcessorCount:                    32
ClockSpeed:                        0 MHZ
Actor Count:                       64
Messages sent/received per client: 200000  (2e5)
Is Server GC:                      True
Thread count:                      160

Num clients, Total [msg], Msgs/sec, Total [ms], Start Threads, End Threads
         1,  200000,    309598,     646.31,           160,             195
         5, 1000000,    593472,    1685.15,           212,             227
        10, 2000000,    648299,    3085.03,           235,             246
        15, 3000000,    635863,    4718.67,           254,             259
        20, 4000000,    644227,    6209.82,           267,             258
        25, 5000000,    643087,    7775.79,           267,             251
        30, 6000000,    643156,    9329.07,           259,             240

@Aaronontheweb
Author

Thanks for running this!

Can you try to disable hill climbing and try with .NET 6 on your machine again to see if the same issue shows up?

Are the Ryzen numbers with hill-climbing disabled?

@kouvel
Member

kouvel commented May 11, 2022

Are the Ryzen numbers with hill-climbing disabled?

No, I just ran them with default settings.

@kouvel changed the milestone from 6.0.x to Future on Sep 11, 2022
@andreminelli

I don't know if the observations below are really related to this issue, but I think they could be helpful anyway: after updating to .NET 6 (without any other code changes), the "method hotspots" reported by our APM changed quite dramatically.

Now we are always seeing something like this:
[APM method hotspots screenshot]

Previously, with .NET Core 3.1 (or with .NET 6 and COMPlus_ThreadPool_UsePortableThreadPool=0 set), it showed:
[APM method hotspots screenshot]

I would like to take the opportunity to ask, just to make sure: there aren't any shortcomings to setting COMPlus_ThreadPool_UsePortableThreadPool=0, right?

@aygalinc

aygalinc commented Nov 23, 2022

I don't have a comparison with another runtime to provide, but I also observe a lot of contention due to managed ThreadPool code on macOS, .NET runtime 6.0.11:
[profiler contention screenshot]

@aygalinc

Any news or insight on this subject?

@LeszekKalibrate

Is this fixed yet?

@kouvel
Member

kouvel commented Mar 15, 2023

@Aaronontheweb it seems this was resolved by #68881 according to akkadotnet/akka.net#5385 (comment). Can this issue be closed, or is it tracking something else?

@Aaronontheweb
Author

It can be closed! Thank you!

@ghost locked as resolved and limited conversation to collaborators Apr 14, 2023