AsyncTargetWrapper with OverflowAction=Block halts on ThreadPool starvation #5478
If you start more threads than you have CPU-cores, and all threads are logging to the same NLog AsyncTargetWrapper with OverflowAction=Block, then they will all stall as soon as the queue is full. The normal fix is not starting more threads than you have CPU-cores: set up an inbound-request-throttle and limit the max number of active requests to Environment.ProcessorCount * 2 (maybe adjusted depending on whether your requests are cpu-bound, e.g. served from in-memory-caches, or io-bound, e.g. running database-queries). When you are producing more output than the NLog target can handle, then you have to make a choice for overflowAction:

- Discard (the default): discard log events while the queue is full.
- Grow: let the queue grow beyond the configured limit, trading memory for no blocking.
- Block: block the application thread until there is room in the queue.
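For illustration, a minimal sketch of such an inbound throttle in ASP.NET Core might look like this (the middleware name and the use of a plain SemaphoreSlim are my own choices, not something from NLog or this issue):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

// Hypothetical middleware: caps concurrently executing requests so the
// number of active application threads stays close to the CPU-core count.
public sealed class RequestThrottleMiddleware
{
    private static readonly SemaphoreSlim Slots =
        new SemaphoreSlim(Environment.ProcessorCount * 2);

    private readonly RequestDelegate _next;

    public RequestThrottleMiddleware(RequestDelegate next) => _next = next;

    public async Task InvokeAsync(HttpContext context)
    {
        // Wait asynchronously (no thread is blocked) until a slot frees up.
        await Slots.WaitAsync(context.RequestAborted);
        try
        {
            await _next(context);
        }
        finally
        {
            Slots.Release();
        }
    }
}
```

It would be registered with `app.UseMiddleware<RequestThrottleMiddleware>()`, with the limit tuned as described above.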
But if you have ideas for how to improve the performance / behavior of the NLog AsyncWrapper-target, then pull-requests are most welcome. Notice NLog v5.2.5 tries to improve the lock-congestion-situation by reducing the spin-count when using OverflowAction=Block.
I wrote about the idea above. Are you open to looking at an alternative to a timer linked to a thread pool in AsyncWrapper?
The timer-events are scheduled on the dotnet-threadpool. So far you have only presented a single callstack, which is the application-threads waiting for the async-writer-timer to keep up. The async-writer-timer will execute as often as possible, but when using OverflowAction=Block it cannot drain the queue faster than the wrapped target can write. Again, if you have ideas for how to improve the performance / behavior of the NLog AsyncWrapper-target, then pull-requests are most welcome.
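To make the first point concrete, this small standalone snippet (not from the issue) shows that System.Threading.Timer callbacks are dispatched to the shared .NET thread pool, the same pool that starves in the scenario above:

```csharp
using System;
using System.Threading;

class TimerDemo
{
    static void Main()
    {
        // The callback is dispatched to the shared .NET thread pool:
        // if every pool thread is blocked, the callback never runs.
        using var timer = new Timer(_ =>
        {
            Console.WriteLine(
                $"Timer callback on pool thread: {Thread.CurrentThread.IsThreadPoolThread}");
        }, null, dueTime: 0, period: 500);

        Thread.Sleep(2000); // lets a few callbacks fire; each prints "True"
    }
}
```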
Because most of them are the same, all showing an application-thread blocked in the same place while trying to write a log event.
Yes, that is the callstack of an application-thread trying to write logs, but blocked because the async-timer-event cannot keep up. When too many application-threads are being throttled/blocked, suddenly all time is spent on lock-congestion. The AsyncTargetWrapper avoids creating a dedicated thread, but relies on the shared dotnet-thread-pool for execution of timer-events. Again, if you have ideas for how to improve the performance / behavior of the NLog AsyncWrapper-target, then pull-requests are most welcome.
The goal is to reserve one (or more) threads upfront to ensure the AsyncTargetWrapper still functions after dotnet-threadpool-starvation when using overflowAction=Block. Maybe the AsyncTargetWrapper should get an ITaskScheduler-option, where one can override the TaskScheduler, since I believe the use of a Timer-object in AsyncTargetWrapper is a design bug (if you need to throttle, use the BufferingTargetWrapper instead). Then build a default TaskScheduler that wraps a standard Timer (just like now, handling all timer-throttle logic internally). People can then decide whether to assign a dedicated TaskScheduler (with 1 thread), a shared TaskScheduler (with 1 or more threads shared by all AsyncTargetWrapper-instances), or completely custom TaskScheduler-logic (e.g. IO-Completion-Ports that work independently of DotNet-ThreadPool starvation). A sketch of such a dedicated-thread scheduler follows below.
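This is not part of NLog; as a rough illustration of the proposal, here is a minimal sketch of a TaskScheduler that runs all queued work on a single dedicated thread, so it keeps functioning even when the shared thread pool is starved (all names here are hypothetical):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Sketch: every Task queued to this scheduler executes on one
// dedicated thread, independent of the shared .NET thread pool.
public sealed class DedicatedThreadTaskScheduler : TaskScheduler, IDisposable
{
    private readonly BlockingCollection<Task> _tasks = new();
    private readonly Thread _thread;

    public DedicatedThreadTaskScheduler()
    {
        _thread = new Thread(() =>
        {
            foreach (var task in _tasks.GetConsumingEnumerable())
                TryExecuteTask(task); // run the task on our own thread
        })
        { IsBackground = true, Name = "NLogAsyncWriter" };
        _thread.Start();
    }

    protected override void QueueTask(Task task) => _tasks.Add(task);

    // Never inline onto arbitrary threads; writes must stay on our thread.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
        => false;

    protected override IEnumerable<Task> GetScheduledTasks() => _tasks.ToArray();

    public void Dispose() => _tasks.CompleteAdding();
}
```

The wrapper could then schedule its flush work via `Task.Factory.StartNew(..., CancellationToken.None, TaskCreationOptions.None, scheduler)` instead of a Timer, keeping the write loop alive regardless of pool starvation.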
NLog version: 5.1.2 (and 5.2.8, only in a test environment)
Platform: .NET 6 - 6.0.25 (and 6.0.27, only in a test environment)
Current NLog config: an AsyncWrapper with overflowAction=Block wrapping a few file targets (about 10) and one target for console output.
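The original config markup was not preserved here; a rough reconstruction of an equivalent setup in C# (target names, paths, and the queue limit are my own assumptions) would be:

```csharp
using NLog;
using NLog.Config;
using NLog.Targets;
using NLog.Targets.Wrappers;

var config = new LoggingConfiguration();

// One of roughly ten file targets plus a console target; each target
// is wrapped in an AsyncWrapper whose queue blocks the caller when full.
var fileTarget = new FileTarget("appFile") { FileName = "app.log" };
var asyncFile = new AsyncTargetWrapper(fileTarget)
{
    OverflowAction = AsyncTargetWrapperOverflowAction.Block,
    QueueLimit = 10000, // assumed; the real limit was not shown
};

config.AddRuleForAllLevels(asyncFile);
LogManager.Configuration = config;
```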
Our web application hung after the Trace logging level was enabled and some time had passed. Trace-level logging was enabled to troubleshoot application issues; the logging level is not directly related to this problem, but I note it because it increased the amount of logs the application was generating.
After investigation, we have come to the following conclusions:
1. There was a decrease in disk I/O throughput for a short period of time, which caused the AsyncWrapper queue to reach its limit.
2. Since this is an ASP.NET application, as soon as a request is received the framework logs the fact that it started, and that write blocks on the full queue.
3. The thread pool starts creating new threads, which are handed to new requests and also block on the same lock, so the Timer from AsyncTargetWrapper never gets a thread to run on.
The downtime ended with a memory dump in which 600+ thread-pool threads were waiting on the callstack discussed above; afterwards, the application was restarted. There are probably even more such threads, but the memory dump is corrupted and we can't analyze it fully. The situation itself seems a bit absurd: a minor hiccup in disk I/O leaves the whole application permanently hung.
I have reproduced a similar situation in a small example (see the sketch below), which causes the thread pool to stop making progress completely once its thread limit is exhausted.
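The original example was not preserved in this thread; a minimal sketch of how such a repro might look (the queue size, sleep times, and slow target are my own assumptions) is:

```csharp
using System;
using System.Threading;
using NLog;
using NLog.Config;
using NLog.Targets;
using NLog.Targets.Wrappers;

class StarvationRepro
{
    static void Main()
    {
        // A deliberately slow target simulates a disk I/O stall.
        var slowTarget = new MethodCallTarget("slow",
            (logEvent, args) => Thread.Sleep(100));

        var asyncWrapper = new AsyncTargetWrapper(slowTarget)
        {
            OverflowAction = AsyncTargetWrapperOverflowAction.Block,
            QueueLimit = 100, // small queue so it fills up quickly
        };

        var config = new LoggingConfiguration();
        config.AddRuleForAllLevels(asyncWrapper);
        LogManager.Configuration = config;

        var logger = LogManager.GetCurrentClassLogger();

        // Flood the thread pool: every work item blocks inside Info()
        // once the queue is full, so no pool thread remains to run the
        // AsyncTargetWrapper's flush timer - the process grinds to a halt.
        for (int i = 0; i < 1000; i++)
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                while (true) logger.Info("flooding the async queue");
            });
        }

        Thread.Sleep(Timeout.Infinite);
    }
}
```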
I think the main problem is that the timer uses thread-pool resources. In any situation where the queue has reached its limit and there are no free threads left in the thread pool (even though we have not reached the thread-pool limit yet), draining the queue via the timer is very unproductive.
In this situation, instead of a timer, I would prefer to see dedicated threads, outside the thread pool, to ensure the queue is drained quickly. Ten dedicated threads are cheaper than application downtime :) A rough sketch of that approach follows below.