Jetty version
9.4.28.v20200408
dependencies:
Java version
11.0.7
OS type/version
Linux 4.14.181-142.260.amzn2.x86_64 (EKS pod container)
Description
Jetty appears to be using a large amount of CPU (150% of a 1-CPU Kubernetes allocation, with a 2-CPU limit; normal CPU utilization is around 40-50%) when handling requests received over a Unix Domain Socket. It looks as if a single thread is stuck in a tight loop. The number of active connections also increases very slowly (from 2 up to 20 over the course of an hour), possibly suggesting that an existing connection gets into a bad state once in a while. The server is still able to handle requests just fine and otherwise appears to be operating normally.
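For context, the setup in question is Jetty 9.4's experimental Unix Domain Socket support from the jetty-unixsocket module, which is backed by JNR. The sketch below shows roughly what such a connector looks like; the socket path and the bare-bones wiring are illustrative assumptions, not the affected service's actual configuration.

```java
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.unixsocket.UnixSocketConnector;

public class UnixSocketServerSketch {
    public static void main(String[] args) throws Exception {
        Server server = new Server();

        // Experimental connector from the jetty-unixsocket module. It is driven by
        // jnr-unixsocket/jnr-enxio, which is where the PollSelector discussed below
        // comes from.
        UnixSocketConnector connector = new UnixSocketConnector(server);
        connector.setUnixSocket("/tmp/jetty.sock"); // illustrative path, not the real one
        server.addConnector(connector);

        // A real deployment would also set a Handler before starting.
        server.start();
        server.join();
    }
}
```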
We've traced this down to the jnr-enxio PollSelector, where the libc poll call appears to be returning immediately every time. The theory is that the EatWhatYouKill producer/consumer strategy is not using the PollSelector in the way the PollSelector expects: a race condition seems to be occurring in which a file descriptor with data ready is not handled by Jetty, and FDs that always have data pending could explain why calls to poll never block.
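As an aside, the always-ready-FD theory matches how level-triggered readiness works in general: if a selector reports a channel as readable but the pending bytes are never drained, every subsequent select/poll returns immediately and the selecting thread spins. The snippet below is a self-contained illustration of that mechanism using plain java.nio rather than jnr-enxio or Jetty's actual selector loop, so it only demonstrates the failure shape, not the Jetty code path.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Illustration only: with level-triggered readiness, a selector keeps reporting a
// file descriptor as ready until the pending data is actually read. If the selected
// key is never serviced, every select() returns immediately and the selecting thread
// spins, which is the same shape as the poll() behavior described above.
public class BusySelectDemo {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);
        pipe.source().register(selector, SelectionKey.OP_READ);

        // Make the source end readable and never drain it.
        pipe.sink().write(ByteBuffer.wrap(new byte[] { 42 }));

        for (int i = 0; i < 5; i++) {
            long start = System.nanoTime();
            int ready = selector.select();   // returns immediately every time
            selector.selectedKeys().clear(); // key cleared, but the data is never read
            System.out.printf("select() -> %d ready key(s) in %d us%n",
                    ready, (System.nanoTime() - start) / 1_000);
        }

        selector.close();
        pipe.sink().close();
        pipe.source().close();
    }
}
```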
The theory comes from data observed via Java Flight Recorder (a 2-minute profiling run on an affected container). The recording contains a ton of samples like this (the numbers on each line refer to the number of samples in which the trace appears):

There were also a bunch of samples that hit these stack traces:
The stack traces above appear in the largest share of the samples taken (excluding calls to void jdk.internal.misc.Unsafe.park(boolean, long), which accounted for 21996 samples). LMK if you need more information.

Unix Domain Sockets support is experimental and not really recommended - JNR may not be that stable.
The fact that Jetty NIO continuously selects may mean that the JNR implementation incorrectly reports that there is an event for a socket when in reality there is none.
If you can reproduce it, DEBUG logs would help us understand.
Unix Domain Sockets are ok if you want to experiment - be prepared to bleed.
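For reference, a minimal way to get those DEBUG logs with Jetty 9.4's default StdErrLog backend (assuming no SLF4J binding is on the classpath) is to raise the level via system properties, e.g. -Dorg.eclipse.jetty.LEVEL=DEBUG on the JVM command line, or setting the properties before any Jetty classes load as sketched below. The package name used for the narrower level is an assumption about where the relevant logging lives.

```java
public final class EnableJettyDebugLogging {
    public static void main(String[] args) {
        // With Jetty 9.4's default StdErrLog backend, log levels are read from
        // system properties named <logger>.LEVEL. They must be set before any
        // Jetty Log/connector classes are initialized.
        System.setProperty("org.eclipse.jetty.LEVEL", "DEBUG");

        // Narrowing to a package keeps the volume manageable; choosing the
        // unixsocket package here is an assumption about where the relevant
        // connector/selector logging happens.
        System.setProperty("org.eclipse.jetty.unixsocket.LEVEL", "DEBUG");

        // ... build and start the Server with its UnixSocketConnector here ...
    }
}
```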