Jetty 12.0.8 seems to leak connection when it encounters earlyEOF #11679
Comments
Just to add to what I found: from a heap dump (analyzed with heap-hero, which seems to be built on Eclipse MAT), I can see 2,971 instances of "org.eclipse.jetty.io.SocketChannelEndPoint", loaded by "jdk.internal.loader.ClassLoaders$AppClassLoader @ 0x6ac006a40", occupying 361,225,312 (75.72%) bytes. These instances are referenced from one instance of "java.util.concurrent.ConcurrentHashMap$Node[]". At the point of the heap dump there are certainly not that many real active requests.
@LarsKrogJensen can you share the heap dump?
I don't think so, as it contains various sensitive data.
If I could figure out a way to reproduce earlyEOF in a separate standalone app, it would of course be fine. How do you verify this scenario? Any tooling?
Not sure sharing screenshots helps.
If you can write a reproducer, that would be great. Otherwise, please detail exactly how you configured your application. What Handler chain do you have?
I will for sure continue trying to create a reproducer; I have already spent many hours without success. I raised this issue because I was hoping you would be able to spot something in the code that explains it. @sbordet The application uses Jetty and Jersey to provide a REST API; otherwise it is framework-free™. Handler chain obtained from Server.dump(): GZipHandler
@sbordet I sanitized the heap dump with https://github.com/paypal/heap-dump-tool and got clearance to share it with you privately. The dump is 500M, and zipped 50M.
@LarsKrogJensen email it to sbordet@webtide.com, let's see if it's not too big.
@LarsKrogJensen I wrote a test case with Jersey that tries to reproduce your issue. I get a stack trace that is almost identical to yours, but everything works fine: the connection is closed on the server, and the server endPoint is removed. I analyzed the heap dump, and there are a few things I noted:
We suspect your problems have to do with virtual threads. If you want to get to the bottom of this issue, you should take a thread dump when your system is again in a state similar to the one captured with the heap dump, and let's see what it shows. Then it's going to be a chase to understand what the virtual threads are doing. Otherwise, I suggest that you disable virtual threads, try your system again, and see if you have the same problems. Let us know.
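For reference, on JDK 21 a thread dump that includes virtual threads can be taken with jcmd's Thread.dump_to_file command (the pid and output path below are placeholders):

jcmd <pid> Thread.dump_to_file -format=json /tmp/threads.json

The JSON format preserves the stack traces of virtual threads, which the classic Thread.print command does not show.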
Thanks for the thorough explanations; I will continue to try to create a standalone reproducer on my side. Virtual thread pinning is indeed a pain; I have been monitoring it with JFR recordings to examine any pinnings longer than 20 ms and found nothing, so I am a bit surprised by this finding and need to dig deeper. I did a jcmd thread dump under a similar scenario, but not at the same time as the heap dump. Will send it to you. Will keep this issue open for a few more days so that I can do a bit more investigation.
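For anyone following along: the >20 ms pinning check mentioned above corresponds to the jdk.VirtualThreadPinned JFR event, whose default recording threshold is 20 ms. A recording can be inspected for it like this (the recording file name is a placeholder):

jfr print --events jdk.VirtualThreadPinned recording.jfr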
Forgot to mention: we have been running with Jetty 12 and virtual threads for a couple of months now, and it has been performing really, really well, even under really heavy load.
Do you have a comparison with platform threads? Can you detail what you mean by "well"? Less latency, more throughput, less memory used, less CPU, etc.?
I don't have any direct comparison with platform threads; the feedback here is that Jetty works really, really well with virtual threads.
Thanks for the feedback. If you have more in the future, we are very interested in hearing Jetty users' experience with virtual threads.

I analyzed the thread dump, and unfortunately it does not have much information. There is no locking information, so it is not possible to understand whether a virtual thread is pinned by a call to a library. Surely the virtual threads that call Kafka are pinned.

The thread dump shows 23 virtual threads parked. Under what conditions did you take this thread dump? It could be that this virtual thread analysis is a red herring and not the actual problem. However, the heap dump evidence seems to indicate that many responses are not even written, which hints at blocked threads, of which there is evidence in the thread dump (although only 23).

Would you be able to write a "timeout" Handler? If the timeout triggers, then it's a blocked-thread problem. I would still also try without virtual threads and see how things go.
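A minimal sketch of such a "timeout" Handler, assuming Jetty 12's core Handler.Wrapper and Callback APIs (the class name, scheduler, and timeout value are illustrative, not part of Jetty):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

import org.eclipse.jetty.server.Handler;
import org.eclipse.jetty.server.Request;
import org.eclipse.jetty.server.Response;
import org.eclipse.jetty.util.Callback;

public class TimeoutHandler extends Handler.Wrapper
{
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final long timeoutMillis;

    public TimeoutHandler(long timeoutMillis)
    {
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public boolean handle(Request request, Response response, Callback callback) throws Exception
    {
        AtomicBoolean completed = new AtomicBoolean();
        // Fail the request if it has not been completed within the timeout.
        ScheduledFuture<?> timeout = scheduler.schedule(() ->
        {
            if (completed.compareAndSet(false, true))
                callback.failed(new TimeoutException("request not completed within " + timeoutMillis + " ms"));
        }, timeoutMillis, TimeUnit.MILLISECONDS);
        // Wrap the callback so normal completion cancels the timeout.
        Callback wrapped = new Callback()
        {
            @Override
            public void succeeded()
            {
                if (completed.compareAndSet(false, true))
                {
                    timeout.cancel(false);
                    callback.succeeded();
                }
            }

            @Override
            public void failed(Throwable x)
            {
                if (completed.compareAndSet(false, true))
                {
                    timeout.cancel(false);
                    callback.failed(x);
                }
            }
        };
        return super.handle(request, response, wrapped);
    }
}

If the TimeoutException fires for requests that never complete, that points at a thread blocked somewhere in the processing, which is exactly the evidence being chased here.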
No, it was not at the time of any of the dumps. A timer is an interesting idea; I was looking for any out-of-the-box handler that could terminate requests not responded to within a certain time, and the closest match was a DoSFilter that does a lot of other things as well. I will explore replacing virtual threads when running locally on my development laptop (12 cores) and see if it makes any difference. So far no luck with a standalone reproducer, though. Sharing that reproducer in case anyone reads this in the future:
public void testEarlyEOF() throws Exception {
    final String body = """
        {
          "a":"b",
        }
        """;
    try (Socket socket = new Socket("localhost", 15030)) {
        var eoln = System.lineSeparator();
        var header = BufferUtil.toBuffer(
            "POST /hello HTTP/1.1" + eoln +
            "Content-Type: application/json" + eoln +
            "Content-Length: 170" + eoln +
            "Host: localhost" + eoln);
        var bdy = BufferUtil.toBuffer(eoln + body);
        socket.getOutputStream().write(header.array());
        // Write only the first byte of the body, then shut down the output:
        // this simulates a client that disappears mid-request and is what
        // triggers earlyEOF on the server.
        for (byte b : bdy.array()) {
            socket.getOutputStream().write(b);
            socket.shutdownOutput();
            break;
        }
        HttpTester.Input input = HttpTester.from(socket.getInputStream());
        HttpTester.Response response = HttpTester.parseResponse(input);
        System.err.printf("%s %s %s%n", response.getVersion(), response.getStatus(), response.getReason());
        for (HttpField field : response) {
            System.err.printf("%s: %s%n", field.getName(), field.getValue());
        }
        System.err.printf("%n%s%n", response.getContent());
    }
}
@LarsKrogJensen OK, the ball is in your court then. Let us know if you can reproduce the issue, if you have more information, or if you have more evidence after writing the "timeout" Handler.
Yep, I have the ball. Tried with platform threads, and that did not help (in the local dev setup). To mitigate the issues in production I added AWS WAF to our ALB, and that seems to handle misbehaving clients well, so now I no longer see an increase of active requests (and the log errors are gone). Now I can get some sleep at night again. Still working on a standalone reproducer, but not able to fully replicate yet. I will reach out when I find more; thanks for the outstanding help.
What does that mean exactly? If so, I think your best bet (pun intended!) is now the "timeout" Handler. Note that even if there is a shorter idle timeout for connections, it does not necessarily cover HTTP requests that are processed in async mode. So if your processing is to call some external service asynchronously, a request whose processing never completes may not be timed out by the connection's idle timeout.
In the local dev setup I start the app and send a single 'bad' request, and it triggers the issue every time. The stack trace is logged but the request never completes, i.e. the client simply waits. In the standalone reproducer I can get an almost identical error stack trace, but then Jetty properly returns an error response, so I am trying to figure out the differences. The request never hits the resource method.
@sbordet The change that made the difference is that I added a Jersey exception mapper handling any exception:
public static class CatchAllExceptionMapper implements ExceptionMapper<Exception> {
    @Override
    public Response toResponse(Exception ex) {
        return Response.status(SERVICE_UNAVAILABLE)
            .entity("service unavailable")
            .type(MediaType.TEXT_PLAIN)
            .build();
    }
}
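Note that such a mapper only takes effect if Jersey knows about it. A registration sketch, assuming a programmatic org.glassfish.jersey.server.ResourceConfig setup (the variable name is illustrative):

// Either annotate the mapper with @Provider and rely on classpath scanning,
// or register it explicitly:
ResourceConfig config = new ResourceConfig();
config.register(CatchAllExceptionMapper.class);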
@sbordet Start JettyApp, then run Client, and you should see that Client hangs while a stack trace is logged in JettyApp.
@LarsKrogJensen I went ahead and cleaned up your test project a bit. https://github.com/joakime/jetty-issue-11679 See https://github.com/joakime/jetty-issue-11679/blob/main/src/test/java/BehaviorTests.java The setup of your raw request isn't 100% correct: the Content-Length and body setup are not technically right (Content-Length is the number of bytes, not the length of the string; the end-of-line in HTTP/1 is always "\r\n"; and you can use the new Java text blocks to accomplish that). I also parameterized the different ways to send the request in the unit test.
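To illustrate those points, a corrected version of the raw request setup might look like this (a sketch; it reuses the path and headers of the reproducer above and assumes java.nio.charset.StandardCharsets is imported):

String body = """
    {
      "a":"b",
    }
    """;
// Content-Length must be the byte count of the body, not the string length,
// and HTTP/1 lines are always terminated by CRLF.
byte[] bodyBytes = body.getBytes(StandardCharsets.UTF_8);
String request =
    "POST /hello HTTP/1.1\r\n" +
    "Content-Type: application/json\r\n" +
    "Content-Length: " + bodyBytes.length + "\r\n" +
    "Host: localhost\r\n" +
    "\r\n";
socket.getOutputStream().write(request.getBytes(StandardCharsets.US_ASCII));
socket.getOutputStream().write(bodyBytes);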
OK, looks like a nice test improvement, and partial_write seems to still fail.
From the server's point of view, that "early EOF" is 100% correct: the input was closed by the client. This is then seen by Jersey's implementation of a Reader that provides content to the Jackson/JSON layer. Now, as to the reported connection leak, that is still unresearched.
I pushed a few more updates into that test project fork. I can confirm that the PARTIAL_WRITE_CLOSE_OUTPUT case does show a leaked connection on the server side.
Interestingly, the connection seems to no longer be updating its idle timeout, as if it is stuck.
Aha, so I might also face a Jersey issue :(
Yeah, I'm aware it's not fully correct, but that's kind of the point of the scenario: to simulate a misbehaving client, which might be malicious or simply suffering from an unreliable network.
@LarsKrogJensen we analyzed the issue. At its core, it is a Jersey issue.
Because we have received an early EOF, trying to write the response to Jetty's output fails, and the failure exception escapes up to Jersey. The escaped exception is then logged, but nothing more is done by Jersey; instead, it should have completed the request processing. Failing to complete the processing means that, as far as Jetty is concerned, the request is still active. Since the processing is not complete, the connection is never released, which is the leak you observed. To be fair, on the Jetty side we could produce a more specific exception for this case (see the fix below).
@LarsKrogJensen would you like to open a Jersey issue and point it to this discussion? |
Fixes #11679 - Jetty 12.0.8 seems to leak connection when it encounters earlyEOF. Changed HttpConnection.RequestHandler.earlyEOF() to produce EofException instead of BadMessageException, as it is more appropriate. Changed handling of HttpChannelState.onFailure() to not fail the write side unless there is a pending write callback. Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
I can open a ticket in the Jersey project.
Fixes #11679 - Jetty 12.0.8 seems to leak connection when it encounters earlyEOF. (#11719)
* Fixes #11679 - Jetty 12.0.8 seems to leak connection when it encounters earlyEOF.
* Changed HttpConnection.RequestHandler.earlyEOF() to produce EofException instead of BadMessageException, as it is more appropriate.
* Changed handling of HttpChannelState.onFailure() to not fail the write side unless there is a pending write callback.
* Early EOF events now produce a EofException that is also an HttpException.
* Now failures only impact pending writes, so that it would be possible to write an HTTP error response.
---------
Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
Co-authored-by: Joakim Erdfelt <joakim.erdfelt@gmail.com>
Jetty version(s)
12.0.8
Jetty Environment
ee10
Java version/vendor
java 21.0.2 2024-01-16 LTS
Java(TM) SE Runtime Environment Oracle GraalVM 21.0.2+13.1 (build 21.0.2+13-LTS-jvmci-23.1-b30)
Java HotSpot(TM) 64-Bit Server VM Oracle GraalVM 21.0.2+13.1 (build 21.0.2+13-LTS-jvmci-23.1-b30, mixed mode, sharing)
OS type/version
Ubuntu 22.04
Description
We have seen a clear increase of requests from mobile clients where Jetty fails to handle/parse the request body, causing HttpConnection$RequestHandler.earlyEOF to trigger.
That in itself is perhaps not alarming, but it seems that this causes Jetty to leak requests and connections: we can see a steady increase in our metrics of QoSHandler's active requests. Eventually QoSHandler hits the max active requests and starts rejecting requests.
We have a very strong correlation between these errors and the leaking connections/requests.
At other sites, where we do not see these errors, we do not see any leakage.
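For context, the QoSHandler limit described above is configured roughly like this (a sketch assuming Jetty 12's QoSHandler API; the port, limit value, and the applicationHandler variable are illustrative, not from the report):

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.handler.QoSHandler;

Server server = new Server(8080);
QoSHandler qosHandler = new QoSHandler();
// Beyond this many concurrent requests, QoSHandler stops admitting new ones.
qosHandler.setMaxRequestCount(1000);
qosHandler.setHandler(applicationHandler); // illustrative: the app's own handler chain
server.setHandler(qosHandler);
server.start();

With leaked requests never completing, the active-request count only grows, which is why the limit is eventually hit.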
How to reproduce?
I am afraid I have not been able to make Jetty trigger HttpParser.earlyEOF; I am not sure how to simulate a bad client :(
Stacktrace
message: "Error while closing the output stream in order to commit response"
stacktrace: