AsyncTargetWrapper with OverflowAction=Block halts on ThreadPool starvation #5478

A6blpka opened this issue Feb 28, 2024 · 7 comments

A6blpka commented Feb 28, 2024

NLog version: 5.1.2 (and 5.2.8 only in a test environment)
Platform: .NET 6 - 6.0.25 (and 6.0.27 only in a test environment)
Current NLog config

"default-wrapper": { 
  "type": "AsyncWrapper",
  "overflowAction": "Block",
}

and a few file targets (about 10) and one target for console output.
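
In appsettings.json terms, the overall shape is roughly this (an illustrative sketch, not our exact config; the real setup has about 10 file targets, and the target and file names here are made up):

"NLog": {
  "default-wrapper": {
    "type": "AsyncWrapper",
    "overflowAction": "Block"
  },
  "targets": {
    "appFile": {
      "type": "File",
      "fileName": "logs/app.log"
    },
    "console": {
      "type": "Console"
    }
  },
  "rules": [
    { "logger": "*", "minLevel": "Trace", "writeTo": "appFile,console" }
  ]
}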

Our web application froze some time after the Trace logging level was enabled.
Trace logging had been turned on to diagnose application issues; the logging level itself is not directly related to this bug, but it did significantly increase the amount of log output the application was generating.

After investigating, we came to the following conclusions:
Disk I/O throughput dropped for a short period, which was enough for the AsyncWrapper queue to reach its limit.
Since this is an ASP.NET application, as soon as a request arrives the framework logs that the request has started, and that logging call blocks on the full queue.
The thread pool keeps creating new threads, which are handed to new requests and block in the same place, while the Timer used by AsyncTargetWrapper never gets a thread to run on.
The application downtime ended with a memory dump in which 600+ thread-pool threads were waiting here:

OS Thread Id: 0x5db4
        Child SP               IP Call Site
000000F00707ECB8 00007fff56fb1274 [HelperMethodFrame_1OBJ: 000000f00707ecb8] System.Threading.Monitor.ObjWait(Int32, System.Object)
000000F00707EDE0 00007FFED99074D7 NLog.Targets.Wrappers.ConcurrentRequestQueue.WaitForBelowRequestLimit() [/_/src/NLog/Targets/Wrappers/ConcurrentRequestQueue.cs @ 147]
000000F00707EE50 00007FFED22A3140 NLog.Targets.Wrappers.ConcurrentRequestQueue.Enqueue(NLog.Common.AsyncLogEventInfo) [/_/src/NLog/Targets/Wrappers/ConcurrentRequestQueue.cs @ 94]
000000F00707EEB0 00007FFED22A1E36 NLog.Targets.Wrappers.AsyncTargetWrapper.Write(NLog.Common.AsyncLogEventInfo) [/_/src/NLog/Targets/Wrappers/AsyncTargetWrapper.cs @ 433]
000000F00707EF00 00007FFED22A1D61 NLog.Targets.Wrappers.AsyncTargetWrapper.WriteAsyncThreadSafe(NLog.Common.AsyncLogEventInfo) [/_/src/NLog/Targets/Wrappers/AsyncTargetWrapper.cs @ 451]

Afterwards, the application was restarted.
There are probably even more such threads, but the memory dump is corrupted and we cannot analyze it fully.
The situation itself seems a bit absurd: after a minor disk I/O hiccup we end up with the following:

  • I/O free
  • CPU free
  • RAM - unchanged (very slow growth on creation of new threads)

I have created a similar situation in this example:

using System.Collections.Concurrent;
using Microsoft.Extensions.Logging;

// GeneralLoggerFactory is our own helper that builds an ILoggerFactory backed by NLog.
var logFactory = GeneralLoggerFactory.GetLoggerFactory();

var logger = logFactory.CreateLogger("test");
var tasks = new List<Task>(101);
var tmp = new ConcurrentDictionary<int, int>();

// Dedicated (non-pool) monitoring thread: reports how many distinct
// thread-pool threads managed to do any work during the last second.
new Thread(() =>
{
    while (true)
    {
        Console.WriteLine(DateTime.Now.ToString("T") + " Threads worked: " + tmp.Count + " Thread pool thread count: " + ThreadPool.ThreadCount);
        tmp.Clear();
        Thread.Sleep(1000);
    }
}).Start();

// Cap the thread pool so starvation is reached quickly.
ThreadPool.SetMaxThreads(30, 30);

// 100 endlessly logging tasks competing for at most 30 pool threads.
for (var i = 0; i < 100; i++)
    tasks.Add(Task.Run(() =>
    {
        while (true)
        {
            logger.LogInformation("Test");
            tmp.AddOrUpdate(Thread.CurrentThread.ManagedThreadId, 1, (_, value) => ++value);
        }
    }));

await Task.WhenAll(tasks);

In this example the thread-pool threads stop making any progress at all once the pool's limit is exhausted.
I think the main problem is that the timer relies on thread-pool resources. So in any situation where the queue has reached its limit and there are no free thread-pool threads left (even though the pool's maximum thread count has not been reached yet), draining the queue via the timer becomes very unproductive.

In my situation, instead of a timer, I would prefer to see dedicated threads outside the thread pool, to guarantee fast draining of the queue. Ten dedicated threads are cheaper than application downtime :)
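
As a rough illustration of what I have in mind (only a sketch, not a proposal for the actual NLog internals; the class and method names are made up and log events are reduced to strings): a bounded queue drained by one dedicated, non-pool thread keeps writing even when the thread pool is completely starved, because it never waits for a timer callback to be scheduled on the pool.

using System;
using System.Collections.Concurrent;
using System.Threading;

// Sketch only: a bounded queue drained by one dedicated (non-pool) thread.
sealed class DedicatedAsyncWriter : IDisposable
{
    private readonly BlockingCollection<string> _queue;
    private readonly Thread _writerThread;

    public DedicatedAsyncWriter(int queueLimit)
    {
        // Add() blocks when the limit is reached, i.e. overflowAction = Block.
        _queue = new BlockingCollection<string>(boundedCapacity: queueLimit);
        _writerThread = new Thread(DrainLoop) { IsBackground = true, Name = "LogWriter" };
        _writerThread.Start();
    }

    public void Enqueue(string logEvent) => _queue.Add(logEvent);

    private void DrainLoop()
    {
        // Runs on its own thread, so it keeps draining even when the
        // thread pool has no free threads left for timer callbacks.
        foreach (var logEvent in _queue.GetConsumingEnumerable())
            WriteToTarget(logEvent);
    }

    // Placeholder for the wrapped target (file/console/...).
    private static void WriteToTarget(string logEvent) => Console.WriteLine(logEvent);

    public void Dispose()
    {
        _queue.CompleteAdding();
        _writerThread.Join();
    }
}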


snakefoot (Contributor) commented Feb 28, 2024

If you start more threads than you have CPU-cores, and all threads are logging to the same NLog AsyncTargetWrapper with overflowAction = Block, then you will experience thread-starvation and lock-congestion when the wrapped target cannot keep up. This leads to a situation where too many threads are fighting for a limited number of CPU-cores, and all CPU-time is spent on lock-congestion and nothing else.

The normal fix is to not start more threads than you have CPU-cores: set up an inbound-request-throttle and limit the maximum number of active requests to Environment.ProcessorCount * 2 (perhaps adjusted based on whether your requests are CPU-bound, e.g. using in-memory caches, or IO-bound, e.g. using database queries).
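
A minimal sketch of such an inbound-request-throttle as ASP.NET Core middleware (the middleware itself and the Environment.ProcessorCount * 2 limit are only an illustration; tune the limit for your workload):

using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

// Sketch: caps the number of concurrently processed requests so application
// threads cannot outnumber CPU cores by a large factor. Waiting is async,
// so queued requests do not block thread-pool threads.
public sealed class RequestThrottleMiddleware
{
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(Environment.ProcessorCount * 2);

    private readonly RequestDelegate _next;

    public RequestThrottleMiddleware(RequestDelegate next) => _next = next;

    public async Task InvokeAsync(HttpContext context)
    {
        await Gate.WaitAsync(context.RequestAborted);
        try
        {
            await _next(context);
        }
        finally
        {
            Gate.Release();
        }
    }
}

// Registration: app.UseMiddleware<RequestThrottleMiddleware>();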

When you are producing more output than the NLog target can handle, then you have to make a choice for overflowAction (example configuration below):

  • Discard - The application-thread will discard the logevent to avoid becoming blocked or causing out-of-memory issues. Consider increasing queueLimit to 100,000 if memory is not an issue. (Default)
  • Block - The application-thread will block until the background-writer-thread has taken the next batch. Avoids losing important logevents, but can block all application-threads. Consider throttling the active thread count by limiting the number of concurrent inbound requests.
  • Grow - The application-thread will ignore the queueLimit and just allocate more memory. Can cause the entire application to experience out-of-memory issues.
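
For example, switching the wrapper shown above to Discard with a bigger queue would look roughly like this (a sketch in the same appsettings.json style):

"default-wrapper": {
  "type": "AsyncWrapper",
  "overflowAction": "Discard",
  "queueLimit": 100000
}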

But if you have ideas for how to improve the performance / behavior of the NLog AsyncWrapper-target, then pull-requests are most welcome.

Notice that NLog v5.2.5 tries to improve the lock-congestion situation by reducing the spin-count when overflowAction = Block: #5344

See also: #4338 + #2628

A6blpka (Author) commented Feb 28, 2024

I wrote about the idea above. Are you open to considering an alternative to the thread-pool-bound timer in AsyncWrapper?
Or perhaps the implementation of a new target?

snakefoot (Contributor) commented Feb 28, 2024

The timer-events are scheduled on the dotnet-threadpool. So far you have only presented a single callstack:

000000F00707ECB8 00007fff56fb1274 [HelperMethodFrame_1OBJ: 000000f00707ecb8] System.Threading.Monitor.ObjWait(Int32, System.Object)
000000F00707EDE0 00007FFED99074D7 NLog.Targets.Wrappers.ConcurrentRequestQueue.WaitForBelowRequestLimit() [/_/src/NLog/Targets/Wrappers/ConcurrentRequestQueue.cs @ 147]
000000F00707EE50 00007FFED22A3140 NLog.Targets.Wrappers.ConcurrentRequestQueue.Enqueue(NLog.Common.AsyncLogEventInfo) [/_/src/NLog/Targets/Wrappers/ConcurrentRequestQueue.cs @ 94]
000000F00707EEB0 00007FFED22A1E36 NLog.Targets.Wrappers.AsyncTargetWrapper.Write(NLog.Common.AsyncLogEventInfo) [/_/src/NLog/Targets/Wrappers/AsyncTargetWrapper.cs @ 433]
000000F00707EF00 00007FFED22A1D61 NLog.Targets.Wrappers.AsyncTargetWrapper.WriteAsyncThreadSafe(NLog.Common.AsyncLogEventInfo) [/_/src/NLog/Targets/Wrappers/AsyncTargetWrapper.cs @ 451]

Those are application-threads waiting for the async-writer-timer to keep up. The async-writer-timer will execute as often as possible, but when using overflowAction = Block it has to fight with the blocked threads in order to wake them.
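
Conceptually the contention looks like this (a simplified model, not the actual ConcurrentRequestQueue source): the blocked producers sit in Monitor.Wait, and the timer callback must first get a free thread-pool thread and then win the same lock before it can pulse them awake.

using System.Collections.Generic;
using System.Threading;

// Simplified model of a bounded queue with overflowAction = Block.
sealed class BlockingBoundedQueue<T>
{
    private readonly Queue<T> _items = new Queue<T>();
    private readonly object _lock = new object();
    private readonly int _limit;

    public BlockingBoundedQueue(int limit) => _limit = limit;

    // Called by application threads (the blocked callstacks in the dump).
    public void Enqueue(T item)
    {
        lock (_lock)
        {
            while (_items.Count >= _limit)
                Monitor.Wait(_lock);   // producer blocks until the writer drains
            _items.Enqueue(item);
        }
    }

    // Called from the timer callback, which itself needs a free pool thread.
    public List<T> DequeueBatch(int maxBatch)
    {
        var batch = new List<T>();
        lock (_lock)
        {
            while (_items.Count > 0 && batch.Count < maxBatch)
                batch.Add(_items.Dequeue());
            Monitor.PulseAll(_lock);   // wake all blocked producers at once
        }
        return batch;
    }
}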

Again, if you have ideas for how to improve the performance / behavior of the NLog AsyncWrapper-target, then pull-requests are most welcome.

A6blpka (Author) commented Feb 28, 2024

The timer-events are scheduled on the dotnet-threadpool. So far you have only presented a single callstack

Because most of them are the same, and they look like this:

OS Thread Id: 0x6578
        Child SP               IP Call Site
000000F03CCFE788 00007fff56fb1274 [HelperMethodFrame_1OBJ: 000000f03ccfe788] System.Threading.Monitor.ObjWait(Int32, System.Object)
000000F03CCFE8B0 00007FFED99074D7 NLog.Targets.Wrappers.ConcurrentRequestQueue.WaitForBelowRequestLimit() [/_/src/NLog/Targets/Wrappers/ConcurrentRequestQueue.cs @ 147]
000000F03CCFE920 00007FFED22A3140 NLog.Targets.Wrappers.ConcurrentRequestQueue.Enqueue(NLog.Common.AsyncLogEventInfo) [/_/src/NLog/Targets/Wrappers/ConcurrentRequestQueue.cs @ 94]
000000F03CCFE980 00007FFED22A1E36 NLog.Targets.Wrappers.AsyncTargetWrapper.Write(NLog.Common.AsyncLogEventInfo) [/_/src/NLog/Targets/Wrappers/AsyncTargetWrapper.cs @ 433]
000000F03CCFE9D0 00007FFED22A1D61 NLog.Targets.Wrappers.AsyncTargetWrapper.WriteAsyncThreadSafe(NLog.Common.AsyncLogEventInfo) [/_/src/NLog/Targets/Wrappers/AsyncTargetWrapper.cs @ 451]
000000F03CCFEA20 00007FFED22A1AD3 NLog.Targets.Target.WriteAsyncLogEvent(NLog.Common.AsyncLogEventInfo) [/_/src/NLog/Targets/Target.cs @ 328]
000000F03CCFEAA0 00007FFED22A1CD9 NLog.LoggerImpl.WriteToTargetWithFilterChain(NLog.Targets.Target, NLog.Filters.FilterResult, NLog.LogEventInfo, NLog.Common.AsyncContinuation) [/_/src/NLog/LoggerImpl.cs @ 157]
000000F03CCFEB00 00007FFECA36AFD3 NLog.LoggerImpl.Write(System.Type, NLog.Internal.TargetWithFilterChain, NLog.LogEventInfo, NLog.LogFactory) [/_/src/NLog/LoggerImpl.cs @ 99]
000000F03CCFEB60 00007FFED22C5AFA NLog.Logger.WriteToTargets(System.Type, NLog.LogEventInfo, NLog.Internal.TargetWithFilterChain) [/_/src/NLog/Logger.cs @ 758]
000000F03CCFEBB0 00007FFED8837F5C ILStubClass.IL_STUB_InstantiatingStub(Microsoft.Extensions.Logging.LogLevel, Microsoft.Extensions.Logging.EventId, System.Object, System.Exception, System.Func`3<System.Object,System.Exception,System.String>)
000000F03CCFEBF0 00007FFED99E2615 Microsoft.Extensions.Logging.Logger.<Log>g__LoggerLog|12_0[[System.__Canon, System.Private.CoreLib]](Microsoft.Extensions.Logging.LogLevel, Microsoft.Extensions.Logging.EventId, Microsoft.Extensions.Logging.ILogger, System.Exception, System.Func`3<System.__Canon,System.Exception,System.String>, System.Collections.Generic.List`1<System.Exception> ByRef, System.__Canon ByRef) [/_/src/libraries/Microsoft.Extensions.Logging/src/Logger.cs @ 44]
000000F03CCFEC60 00007FFED99E253A Microsoft.Extensions.Logging.Logger.Log[[System.__Canon, System.Private.CoreLib]](Microsoft.Extensions.Logging.LogLevel, Microsoft.Extensions.Logging.EventId, System.__Canon, System.Exception, System.Func`3<System.__Canon,System.Exception,System.String>) [/_/src/libraries/Microsoft.Extensions.Logging/src/Logger.cs @ 26]
000000F03CCFED20 00007FFED8837EFC ILStubClass.IL_STUB_InstantiatingStub(Microsoft.Extensions.Logging.LogLevel, Microsoft.Extensions.Logging.EventId, System.Object, System.Exception, System.Func`3<System.Object,System.Exception,System.String>)
000000F03CCFED60 00007FFED99FB79E Microsoft.AspNetCore.Hosting.HostingApplicationDiagnostics.LogRequestStarting(Context) [/_/src/Hosting/Hosting/src/Internal/HostingApplicationDiagnostics.cs @ 207]
000000F03CCFEDD0 00007FFED99FA8D1 Microsoft.AspNetCore.Hosting.HostingApplicationDiagnostics.BeginRequest(Microsoft.AspNetCore.Http.HttpContext, Context) [/_/src/Hosting/Hosting/src/Internal/HostingApplicationDiagnostics.cs @ 109]
000000F03CCFEE60 00007FFED99FA165 Microsoft.AspNetCore.Hosting.HostingApplication.CreateContext(Microsoft.AspNetCore.Http.Features.IFeatureCollection) [/_/src/Hosting/Hosting/src/Internal/HostingApplication.cs @ 83]
000000F03CCFEEC0 00007FFED99F5D06 Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol+<ProcessRequests>d__226`1[[System.__Canon, System.Private.CoreLib]].MoveNext()
000000F03CCFF070 00007FFED99F53CD System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol+<ProcessRequests>d__226`1[[System.__Canon, System.Private.CoreLib]], Microsoft.AspNetCore.Server.Kestrel.Core]](<ProcessRequests>d__226`1<System.__Canon> ByRef) [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncMethodBuilderCore.cs @ 38]
000000F03CCFF0F0 00007FFED99F52B3 Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[[System.__Canon, System.Private.CoreLib]](Microsoft.AspNetCore.Hosting.Server.IHttpApplication`1<System.__Canon>)
000000F03CCFF1A0 00007FFED99F4B9A Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol+<ProcessRequestsAsync>d__225`1[[System.__Canon, System.Private.CoreLib]].MoveNext()
000000F03CCFF250 00007FFED99F49BD System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol+<ProcessRequestsAsync>d__225`1[[System.__Canon, System.Private.CoreLib]], Microsoft.AspNetCore.Server.Kestrel.Core]](<ProcessRequestsAsync>d__225`1<System.__Canon> ByRef) [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncMethodBuilderCore.cs @ 38]
000000F03CCFF2D0 00007FFED99F48A7 Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequestsAsync[[System.__Canon, System.Private.CoreLib]](Microsoft.AspNetCore.Hosting.Server.IHttpApplication`1<System.__Canon>)
000000F03CCFF340 00007FFED99ECE56 Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection+<ProcessRequestsAsync>d__12`1[[System.__Canon, System.Private.CoreLib]].MoveNext()
000000F03CCFF480 00007FFED99EB01D System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection+<ProcessRequestsAsync>d__12`1[[System.__Canon, System.Private.CoreLib]], Microsoft.AspNetCore.Server.Kestrel.Core]](<ProcessRequestsAsync>d__12`1<System.__Canon> ByRef) [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncMethodBuilderCore.cs @ 38]
000000F03CCFF500 00007FFED99EAEF9 Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[[System.__Canon, System.Private.CoreLib]](Microsoft.AspNetCore.Hosting.Server.IHttpApplication`1<System.__Canon>)
000000F03CCFF590 00007FFED99E9F7E Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.KestrelConnection`1+<ExecuteAsync>d__6[[System.__Canon, System.Private.CoreLib]].MoveNext() [/_/src/Servers/Kestrel/Core/src/Internal/Infrastructure/KestrelConnectionOfT.cs @ 55]
000000F03CCFF670 00007FFED99E990D System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.KestrelConnection`1+<ExecuteAsync>d__6[[System.__Canon, System.Private.CoreLib]], Microsoft.AspNetCore.Server.Kestrel.Core]](<ExecuteAsync>d__6<System.__Canon> ByRef) [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncMethodBuilderCore.cs @ 38]
000000F03CCFF6F0 00007FFED99E97FA Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.KestrelConnection`1[[System.__Canon, System.Private.CoreLib]].ExecuteAsync()
000000F03CCFF770 00007FFED99E9759 Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.KestrelConnection`1[[System.__Canon, System.Private.CoreLib]].System.Threading.IThreadPoolWorkItem.Execute() [/_/src/Servers/Kestrel/Core/src/Internal/Infrastructure/KestrelConnectionOfT.cs @ 38]
000000F03CCFF7A0 00007FFECCEEB562 System.Threading.ThreadPoolWorkQueue.Dispatch() [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs @ 729]
000000F03CCFF830 00007FFEDA67C791 System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart() [/_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs @ 107]
000000F03CCFFB80 00007fff29a2ba53 [DebuggerU2MCatchHandlerFrame: 000000f03ccffb80] 

snakefoot (Contributor) commented Feb 28, 2024

Yes, that is the callstack of an application-thread trying to write logs, but blocked because the async-timer-event cannot keep up. When too many application-threads are being throttled/blocked, suddenly all time is spent on lock-congestion.

The AsyncTargetWrapper avoids creating a dedicated-thread, but relies on the shared dotnet-thread-pool for execution of timer-events.
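
That dependency is easy to demonstrate in isolation (a standalone sketch with an artificially capped pool, unrelated to NLog code): once every pool thread is blocked, a System.Threading.Timer callback simply never runs, which matches the state in the dump.

using System;
using System.Threading;

// With the pool exhausted, the timer callback is queued but never executes.
class TimerStarvationDemo
{
    static void Main()
    {
        ThreadPool.SetMinThreads(4, 4);
        ThreadPool.SetMaxThreads(4, 4);          // artificially small pool

        using var blocker = new ManualResetEventSlim(false);

        // Occupy every pool thread with a blocking wait.
        for (var i = 0; i < 4; i++)
            ThreadPool.QueueUserWorkItem(_ => blocker.Wait());

        using var timer = new Timer(_ => Console.WriteLine("timer fired"), null, dueTime: 100, period: 100);

        Thread.Sleep(3000);                      // nothing is printed here
        blocker.Set();                           // release the pool threads
        Thread.Sleep(500);                       // now "timer fired" appears
    }
}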

Again, if you have ideas for how to improve the performance / behavior of the NLog AsyncWrapper-target, then pull-requests are most welcome.

@snakefoot snakefoot changed the title Problem with timer in AsyncTargetWrapper AsyncTargetWrapper stops on ThreadPool starvation Mar 9, 2024
@snakefoot snakefoot changed the title AsyncTargetWrapper stops on ThreadPool starvation AsyncTargetWrapper with OverflowAction=Block stops on ThreadPool starvation Mar 9, 2024
@snakefoot snakefoot changed the title AsyncTargetWrapper with OverflowAction=Block stops on ThreadPool starvation AsyncTargetWrapper with OverflowAction=Block halts on ThreadPool starvation Mar 9, 2024
snakefoot (Contributor) commented Mar 11, 2024

The goal is to reserve one (or more) threads upfront to ensure the AsyncTargetWrapper still functions after dotnet-threadpool-starvation, when using overflowaction=Block.

Maybe the AsyncTargetWrapper should get an ITaskScheduler-option, where one can override the TaskScheduler, since I believe the use of a Timer-object in AsyncTargetWrapper is a design bug. If throttling is needed, then use the BufferingTargetWrapper.

Then build a default TaskScheduler that wraps a standard Timer (just like now, handling all timer-throttle logic internally). People can then decide whether to assign a dedicated TaskScheduler (with 1 thread), a shared TaskScheduler (with 1 or more threads shared by all AsyncTargetWrapper-instances), or completely custom TaskScheduler logic (e.g. IO-Completion-Ports that work independently of dotnet-threadpool starvation).
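
A rough sketch of what a dedicated single-thread scheduler behind such an option could look like (the ITaskScheduler-option itself is hypothetical and the class/thread names are made up; this is just the standard run-everything-on-one-owned-thread pattern):

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Sketch of a TaskScheduler that owns a single dedicated thread,
// so scheduled work is independent of thread-pool starvation.
public sealed class DedicatedThreadTaskScheduler : TaskScheduler
{
    private readonly BlockingCollection<Task> _tasks = new BlockingCollection<Task>();

    public DedicatedThreadTaskScheduler()
    {
        var thread = new Thread(() =>
        {
            foreach (var task in _tasks.GetConsumingEnumerable())
                TryExecuteTask(task);
        })
        { IsBackground = true, Name = "AsyncWrapperWriter" };
        thread.Start();
    }

    public override int MaximumConcurrencyLevel => 1;

    protected override void QueueTask(Task task) => _tasks.Add(task);

    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
        => false; // never run inline; everything goes through the dedicated thread

    protected override IEnumerable<Task> GetScheduledTasks() => _tasks.ToArray();
}

The AsyncTargetWrapper could then schedule its flush work on such a scheduler instead of relying on a Timer callback, keeping the background writer alive through thread-pool starvation.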
