
Jetty 12.0.8 seems to leak connection when it encounters earlyEOF #11679

Closed
LarsKrogJensen opened this issue Apr 22, 2024 · 31 comments · Fixed by #11719
Labels
Bug (For general bugs on Jetty side)

Comments

@LarsKrogJensen

LarsKrogJensen commented Apr 22, 2024

Jetty version(s)
12.0.8

Jetty Environment
ee10

Java version/vendor (use: java -version)
java 21.0.2 2024-01-16 LTS
Java(TM) SE Runtime Environment Oracle GraalVM 21.0.2+13.1 (build 21.0.2+13-LTS-jvmci-23.1-b30)
Java HotSpot(TM) 64-Bit Server VM Oracle GraalVM 21.0.2+13.1 (build 21.0.2+13-LTS-jvmci-23.1-b30, mixed mode, sharing)

OS type/version
Ubuntu 22.04

Description
We have seen a clear increase in requests from mobile clients where Jetty fails to handle/parse the request body, causing HttpConnection$RequestHandler.earlyEOF to trigger.
That in itself is perhaps not alarming, but it seems to cause Jetty to leak requests and connections: we can see a steady increase in our metrics of QoSHandler's active requests. Eventually QoSHandler hits the max active requests and starts rejecting requests.

We see a very strong correlation between these errors and the leaked connections/requests.

At other sites where these errors do not occur, we do not see any leakage.

How to reproduce?
I am afraid I have not been able to write a reproducer that makes Jetty trigger HttpParser.earlyEOF; I am not sure how to simulate a bad client :(

Stacktrace
message: "Error while closing the output stream in order to commit response"
stacktrace:

org.glassfish.jersey.server.ServerRuntime$Responder: org.eclipse.jetty.http.BadMessageException: 400: Early EOF
	at o.e.j.s.internal.HttpConnection$RequestHandler.earlyEOF(HttpConnection.java:1051)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1729)
	at org.eclipse.jetty.server.internal.HttpConnection.parseRequestBuffer(HttpConnection.java:568)
	at org.eclipse.jetty.server.internal.HttpConnection.parseAndFillForContent(HttpConnection.java:493)
	at o.e.j.s.i.HttpConnection$HttpStreamOverHTTP1.read(HttpConnection.java:1346)
	at org.eclipse.jetty.server.HttpStream$Wrapper.read(HttpStream.java:161)
	at o.e.j.s.internal.HttpChannelState$ChannelRequest.read(HttpChannelState.java:888)
	at org.eclipse.jetty.server.Request$Wrapper.read(Request.java:791)
	at o.e.j.server.handler.EventsHandler$EventsRequest.read(EventsHandler.java:376)
	at org.eclipse.jetty.server.Request$Wrapper.read(Request.java:791)
	at o.eclipse.jetty.ee10.servlet.AsyncContentProducer.readChunk(AsyncContentProducer.java:319)
	at o.eclipse.jetty.ee10.servlet.AsyncContentProducer.produceChunk(AsyncContentProducer.java:305)
	at o.eclipse.jetty.ee10.servlet.AsyncContentProducer.isReady(AsyncContentProducer.java:243)
	at o.e.jetty.ee10.servlet.BlockingContentProducer.nextChunk(BlockingContentProducer.java:108)
	at org.eclipse.jetty.ee10.servlet.HttpInput.read(HttpInput.java:244)
	at org.eclipse.jetty.ee10.servlet.HttpInput.read(HttpInput.java:225)
	at o.g.jersey.message.internal.EntityInputStream.read(EntityInputStream.java:79)
	at o.g.j.m.i.ReaderInterceptorExecutor$UnCloseableInputStream.read(ReaderInterceptorExecutor.java:276)
	at c.f.jackson.core.json.ByteSourceJsonBootstrapper.ensureLoaded(ByteSourceJsonBootstrapper.java:547)
	at c.f.jackson.core.json.ByteSourceJsonBootstrapper.detectEncoding(ByteSourceJsonBootstrapper.java:137)
	at c.f.jackson.core.json.ByteSourceJsonBootstrapper.constructParser(ByteSourceJsonBootstrapper.java:266)
	at com.fasterxml.jackson.core.JsonFactory._createParser(JsonFactory.java:1863)
	at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:1262)
	at o.g.j.j.internal.jackson.jaxrs.base.ProviderBase._createParser(ProviderBase.java:843)
	at o.g.j.j.internal.jackson.jaxrs.base.ProviderBase.readFrom(ProviderBase.java:787)
	at o.g.j.m.i.ReaderInterceptorExecutor$TerminalReaderInterceptor.invokeReadFrom(ReaderInterceptorExecutor.java:235)
	at o.g.j.m.i.ReaderInterceptorExecutor$TerminalReaderInterceptor.aroundReadFrom(ReaderInterceptorExecutor.java:214)
	at o.g.j.message.internal.ReaderInterceptorExecutor.proceed(ReaderInterceptorExecutor.java:134)
	at o.g.j.s.i.MappableExceptionWrapperInterceptor.aroundReadFrom(MappableExceptionWrapperInterceptor.java:49)
	at o.g.j.message.internal.ReaderInterceptorExecutor.proceed(ReaderInterceptorExecutor.java:134)
	at o.g.jersey.message.internal.MessageBodyFactory.readFrom(MessageBodyFactory.java:1072)
	at o.g.jersey.message.internal.InboundMessageContext.readEntity(InboundMessageContext.java:657)
	at org.glassfish.jersey.server.ContainerRequest.readEntity(ContainerRequest.java:290)
	at o.g.j.s.i.i.EntityParamValueParamProvider$EntityValueSupplier.apply(EntityParamValueParamProvider.java:73)
	at o.g.j.s.i.i.EntityParamValueParamProvider$EntityValueSupplier.apply(EntityParamValueParamProvider.java:56)
	at o.g.j.s.spi.internal.ParamValueFactoryWithSource.apply(ParamValueFactoryWithSource.java:50)
	at o.g.j.server.spi.internal.ParameterValueHelper.getParameterValues(ParameterValueHelper.java:68)
	at o.g.j.s.m.i.JavaResourceMethodDispatcherProvider$AbstractMethodParamInvoker.getParamValues(JavaResourceMethodDispatcherProvider.java:109)
	at o.g.j.s.m.i.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:159)
	at o.g.j.s.m.i.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:93)
	at o.g.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:478)
	at o.g.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:400)
	at o.g.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
	at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:263)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
	at o.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:266)
	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:242)
	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:697)
	at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
	at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:358)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:312)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
	at org.eclipse.jetty.ee10.servlet.ServletHolder.handle(ServletHolder.java:736)
	at o.e.jetty.ee10.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1614)
	at c.k.p.f.rest.filter.RequestContextBindingFilter.doFilter(RequestContextBindingFilter.java:63)
	at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
	at o.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
	at o.e.j.ee10.servlet.ServletHandler$MappedServlet.handle(ServletHandler.java:1547)
	at org.eclipse.jetty.ee10.servlet.ServletChannel.dispatch(ServletChannel.java:819)
	at org.eclipse.jetty.ee10.servlet.ServletChannel.handle(ServletChannel.java:431)
	at org.eclipse.jetty.ee10.servlet.ServletHandler.handle(ServletHandler.java:464)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:575)
	at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:851)
	at org.eclipse.jetty.server.Handler$Wrapper.handle(Handler.java:740)
	at org.eclipse.jetty.server.handler.ResourceHandler.handle(ResourceHandler.java:164)
	at org.eclipse.jetty.server.Handler$Wrapper.handle(Handler.java:740)
	at o.eclipse.jetty.server.handler.CrossOriginHandler.handle(CrossOriginHandler.java:375)
	at org.eclipse.jetty.server.Handler$Wrapper.handle(Handler.java:740)
	at o.eclipse.jetty.server.handler.ConditionalHandler.nextHandler(ConditionalHandler.java:421)
	at org.eclipse.jetty.server.handler.QoSHandler.handleWithPermit(QoSHandler.java:266)
	at org.eclipse.jetty.server.handler.QoSHandler.onConditionsMet(QoSHandler.java:191)
	at o.eclipse.jetty.server.handler.ConditionalHandler.handle(ConditionalHandler.java:378)
	at org.eclipse.jetty.server.Handler$Wrapper.handle(Handler.java:740)
	at c.k.p.frontend.ratelimiter.PunterThrottlingHandler.handle(PunterThrottlingHandler.java:36)
	at org.eclipse.jetty.server.Handler$Wrapper.handle(Handler.java:740)
	at c.k.k.jetty.KelogsTracingHandler.handle(KelogsTracingHandler.java:53)
	at org.eclipse.jetty.server.Handler$Wrapper.handle(Handler.java:740)
	at org.eclipse.jetty.server.handler.EventsHandler.handle(EventsHandler.java:81)
	at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:597)
	at org.eclipse.jetty.server.Server.handle(Server.java:179)
	at o.e.j.s.internal.HttpChannelState$HandlerInvoker.run(HttpChannelState.java:619)
	at org.eclipse.jetty.server.internal.HttpConnection.onFillable(HttpConnection.java:411)
	at o.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:322)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:99)
	at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
	at j.util.concurrent.ThreadPerTaskExecutor$TaskRunner.run(ThreadPerTaskExecutor.java:314)
	at java.lang.VirtualThread.run(VirtualThread.java:309)
LarsKrogJensen added the Bug label Apr 22, 2024
LarsKrogJensen changed the title from "Jetty 12.08 seems to leak connection when it encounters earlyEOF" to "Jetty 12.0.8 seems to leak connection when it encounters earlyEOF" Apr 22, 2024
@LarsKrogJensen
Author

Just to add to what I found: from a heap dump, Eclipse MAT reports the following (this output is from heap-hero, which appears to be built on MAT):

2,971 instances of "org.eclipse.jetty.io.SocketChannelEndPoint", loaded by "jdk.internal.loader.ClassLoaders$AppClassLoader @ 0x6ac006a40" occupy 361,225,312 (75.72%) bytes. These instances are referenced from one instance of "java.util.concurrent.ConcurrentHashMap$Node[]", loaded by ""

At the time of the heap dump there were certainly not that many real active requests.

@sbordet
Contributor

sbordet commented Apr 22, 2024

@LarsKrogJensen can you share the heap dump?

@LarsKrogJensen
Author

I don't think so, as it contains various sensitive data.

@LarsKrogJensen
Author

If I could figure out a way to reproduce earlyEOF in a separate standalone app, it would of course be fine to share.

How do you verify this scenario? Any tooling?

@LarsKrogJensen
Author

Not sure sharing screenshots would help.

@sbordet
Contributor

sbordet commented Apr 22, 2024

@LarsKrogJensen

If you can write a reproducer, that would be great.

Otherwise, please detail exactly how you configured your application.

What Handlers, in what order, their configuration, etc., so that we can try to reproduce the issue as closely as possible to your configuration.

@LarsKrogJensen
Author

LarsKrogJensen commented Apr 23, 2024

I will for sure continue trying to create a reproducer; I have already spent many hours without success. I raised this issue because I was hoping you would be able to spot something in the code that explains it.

@sbordet
Do you have any unit/integration test that simulates the earlyEOF that I can learn from?

The application uses Jetty and Jersey to provide a REST API; otherwise it is framework-free™.
It uses virtual threads as described in https://webtide.com/jetty-12-virtual-threads-support/

Handler chain obtained from Server.dump():

GZipHandler
StatisticsHandler
CustomTracingHandler (simple Handler.Wrapper implementation)
CrossOriginHandler
CustomRateLimitHandler (simple Handler.Wrapper using Bucket4j to rate limit with redis as backend)
QoSHandler (max requests 4k, max suspend 10sec)
ServletContextHandler
ConstraintSecurityHandler (limits allowed methods)
ServletHandler (running Jersey ServletContainer)
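
For illustration only, here is a minimal sketch of how such a chain might be assembled with Jetty 12's core Handler API. The custom tracing/rate-limit wrappers and the security handler are omitted, and the port, context path, and servlet mapping are assumptions (the dump above suggests an /api prefix); only the handlers named in the list are shown.

import java.time.Duration;

import org.eclipse.jetty.ee10.servlet.ServletContextHandler;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;
import org.eclipse.jetty.server.handler.CrossOriginHandler;
import org.eclipse.jetty.server.handler.QoSHandler;
import org.eclipse.jetty.server.handler.StatisticsHandler;
import org.eclipse.jetty.server.handler.gzip.GzipHandler;
import org.glassfish.jersey.servlet.ServletContainer;

public class HandlerChainSketch
{
    public static void main(String[] args) throws Exception
    {
        Server server = new Server();
        ServerConnector connector = new ServerConnector(server);
        connector.setPort(8080); // placeholder port
        server.addConnector(connector);

        // Outermost to innermost, mirroring the Server.dump() order above
        // (the custom tracing and rate-limit Handler.Wrapper instances are omitted).
        GzipHandler gzip = new GzipHandler();
        StatisticsHandler statistics = new StatisticsHandler();
        CrossOriginHandler crossOrigin = new CrossOriginHandler();
        QoSHandler qos = new QoSHandler();
        qos.setMaxRequestCount(4000);
        qos.setMaxSuspend(Duration.ofSeconds(10));

        // The ee10 servlet context hosting the Jersey ServletContainer.
        ServletContextHandler context = new ServletContextHandler();
        context.setContextPath("/");
        context.addServlet(ServletContainer.class, "/api/*");

        gzip.setHandler(statistics);
        statistics.setHandler(crossOrigin);
        crossOrigin.setHandler(qos);
        qos.setHandler(context);

        server.setHandler(gzip);
        server.start();
    }
}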

@LarsKrogJensen
Author

@sbordet I sanitized the heap dump with https://github.com/paypal/heap-dump-tool and got clearance to share it with you privately. The dump is 500 MB, and 50 MB zipped.

@sbordet
Contributor

sbordet commented Apr 23, 2024

@LarsKrogJensen email it to sbordet@webtide.com, let's see if it's not too big.

@sbordet
Contributor

sbordet commented Apr 23, 2024

@LarsKrogJensen I wrote a test case with Jersey that tries to reproduce your issue.

I get a stack trace that is almost identical to yours, but everything works fine: the connection is closed on the server, and the server endPoint removed.

I analyzed the heap dump, and there are a few things I noted:

  • The idle timeout is 30 minutes. This is typically too long, and the reason those connections stay around for such a long time.
  • Many of the Jetty objects that manage responses are in a state where they have not written any response.
  • You are using virtual threads, and 3 of them are pinned into NIO select(). This is not good.
  • The other 4 virtual threads instances in the heap dump have no carrier, and they are blocked on some lock.

We suspect your problems have to do with virtual threads.
We apparently cannot reproduce them because our test is not your full system, and you may be making calls to code that is not virtual-thread friendly (and pins the carriers).

That would explain why the QoSHandler would not decrement the permits: the virtual thread entered to serve the request, got blocked and was never resumed (which also explains why the Jetty output objects have not written any response).
Considering that 3 carriers are pinned, and you seem to have configured a parallelism of 4, you have only 1 carrier left (if any at all -- maybe we just did not find where it is pinned).

If you want to get to the bottom of this issue, you should take a thread dump when your system is again in a state similar to the one captured with the heap dump, and let's see what it shows. Then it's going to be a chase to understand what the virtual threads are doing, as explained in the Observing virtual threads section of JEP 444: set jdk.trackAllThreads=true and use jcmd <pid> Thread.dump_to_file -format=json <file>.

Otherwise, I suggest that you disable virtual threads, and try your system again, and see if you have the same problems.

Let us know.
We are very interested in learning about the interactions of Jetty with virtual threads, which are sometimes very unexpected.

@LarsKrogJensen
Author

Thanks for the thorough explanations, I will continue trying to create a standalone reproducer on my side.

Virtual thread pinning is indeed a pain; I have been monitoring it with JFR recordings, examining any pinnings > 20 ms, and found nothing. So I am a bit surprised by this finding and need to dig deeper.

I did a jcmd thread dump under a similar scenario, but not at the same time as the heap dump. Will send it to you.

Will keep this issue open for a few more days so that I can do a bit more investigation.

@LarsKrogJensen
Author

Forgot to mention: we have been running with Jetty 12 and virtual threads for a couple of months now and it has been performing really, really well, even under really heavy load.

@sbordet
Contributor

sbordet commented Apr 24, 2024

we have been running with Jetty 12 and virtual threads for a couple of months now and it has been performing really, really well, even under really heavy load

Do you have a comparison with platform threads?
Do platform threads also run really, really well?

Can you detail what you mean by "well"? Less latency, more throughput, less memory used, less CPU, etc.?
What exactly is better with virtual threads?

@LarsKrogJensen
Author

I don't have any direct comparison with platform threads; the feedback here is that Jetty works really, really well with virtual threads.
One observation is that the total number of JVM threads decreased a lot, i.e. much less context switching for the CPU. I know from your blogs that you have your doubts about the benefits of virtual threads.

@sbordet
Contributor

sbordet commented Apr 24, 2024

Thanks for the feedback. If you have more in the future, we are very interested in hearing Jetty users' experience with virtual threads.

I analyzed the thread dump, and unfortunately it does not have much information.

For example, VirtualThread.toString() prints useful information about thread state and carrier, but it's not used in the dump 😠

Also, there is no locking information, so it is not possible to understand if a virtual thread is pinned by a call to a library.

Surely the virtual threads that call Kafka are pinned in EPoll.wait() -- we know that for sure.
You must definitely avoid doing that.

The thread dump shows 23 virtual threads parked, mostly in CompletableFuture.get(), often due to HttpClient calls you are waiting on.
This would be completely normal, unless something in the stack wraps the next call into a synchronized block that would pin the carrier -- unfortunately we do not know.

Under what conditions did you take this thread dump?
Was the QoSHandler maxed out at 4k already?

It could be that this virtual thread analysis is a red herring and not the actual problem.

However, the heap dump evidence seems to indicate that many responses are not even written, which hints at blocked threads, of which there is evidence in the thread dump (although only 23).

Would you be able to write a "timeout" Handler that starts a timer when a thread enters, and cancels it when the thread exits? With a short timeout (say 5 seconds) that fires if the request is not handled in a timely fashion?

If the timeout triggers, then it's a thread blocked problem.
Otherwise, it's something else and we need to dig deeper.

I would still also try without virtual threads and see how things go.
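
For what it's worth, a minimal sketch of such a diagnostic "timeout" Handler could look like the following. It is only an illustration of the idea above (a Handler.Wrapper that logs when the handling thread does not return within the timeout); the class name and the logging are made up, and it deliberately does not track asynchronous completion.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

import org.eclipse.jetty.server.Handler;
import org.eclipse.jetty.server.Request;
import org.eclipse.jetty.server.Response;
import org.eclipse.jetty.util.Callback;

// Diagnostic-only sketch: logs requests whose handling thread has not returned
// from handle() within the timeout.
public class HandleTimeoutHandler extends Handler.Wrapper
{
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final long timeoutMillis;

    public HandleTimeoutHandler(long timeoutMillis)
    {
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public boolean handle(Request request, Response response, Callback callback) throws Exception
    {
        String uri = request.getHttpURI().toString();
        Thread handling = Thread.currentThread();
        // Arm an alarm when the thread enters handle() ...
        ScheduledFuture<?> alarm = scheduler.schedule(
            () -> System.err.printf("Request %s still handling after %d ms on %s%n", uri, timeoutMillis, handling),
            timeoutMillis, TimeUnit.MILLISECONDS);
        try
        {
            return super.handle(request, response, callback);
        }
        finally
        {
            // ... and cancel it when the thread exits.
            alarm.cancel(false);
        }
    }
}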

@LarsKrogJensen
Author

LarsKrogJensen commented Apr 24, 2024

Was the QoSHandler maxed out at 4k already?

No, it was not, in either of the dumps.

A timer is an interesting idea. I was looking for an out-of-the-box handler that could terminate a request not responded to within a certain time; the closest match was DoSFilter, which does a lot of other things as well.

I will explore replacing virtual threads when running locally on my development laptop (12 cores) and see if it makes any difference.
I can now reproduce it locally every time, obviously with no load at all.

So far no luck with a standalone reproducer though.

Sharing that reproducer in case anyone reads this in the future:

    public void testEarlyEOF() throws Exception {
        final String body = """
            {
                "a":"b",
            }
            """;
        try (Socket socket = new Socket("localhost", 15030)) {
            var eoln = System.lineSeparator();
            // Note: the declared Content-Length deliberately does not match the bytes
            // actually sent, and the line endings use the platform EOL rather than HTTP's "\r\n".
            var header = BufferUtil.toBuffer(
                "POST /hello HTTP/1.1" + eoln +
                "Content-Type: application/json" + eoln +
                "Content-Length: 170" + eoln +
                "Host: localhost" + eoln);
            var bdy = BufferUtil.toBuffer(eoln + body);

            socket.getOutputStream().write(header.array());
            // Write only the first body byte, then half-close the output
            // to provoke an early EOF on the server.
            for (byte b : bdy.array()) {
                socket.getOutputStream().write(b);
                socket.shutdownOutput();
                break;
            }

            HttpTester.Input input = HttpTester.from(socket.getInputStream());

            HttpTester.Response response = HttpTester.parseResponse(input);
            System.err.printf("%s %s %s%n", response.getVersion(), response.getStatus(), response.getReason());
            for (HttpField field : response) {
                System.err.printf("%s: %s%n", field.getName(), field.getValue());
            }
            System.err.printf("%n%s%n", response.getContent());
        }
    }

@sbordet
Contributor

sbordet commented Apr 24, 2024

@LarsKrogJensen ok the ball is in your court then.

Let us know if you can reproduce the issue, or if you have more information, or if you have more evidence after writing the "timeout" Handler, or if you switched back to platform threads and everything works fine, and we'll happily discuss.

@LarsKrogJensen
Author

Yep, I have the ball.

Tried with platform threads and that did not help (in local dev setup).

To mitigate the issue in production I added AWS WAF to our ALB, and that seems to handle misbehaving clients well, so I no longer see an increase in active requests (and the log errors are gone). Now I can get some sleep at night again.

Still working on a standalone reproducer, but not able to fully replicate.

I will reach out when I find more; thanks for the outstanding help.

@sbordet
Contributor

sbordet commented Apr 24, 2024

Tried with platform threads and that did not help (in local dev setup).

What does that mean exactly?
You still have the leak and the QoSHandler maxes out?

If so, I think your best bet (pun intended!) is now the "timeout" Handler.

I saw many CompletableFuture.get() calls in your thread dump, which may wait forever.

Note that even if there is a shorter idle timeout for connections, for HTTP requests that are processed in async mode (via request.startAsync(), or equivalently using the JAX-RS @Suspended annotation) the idle timeout is ignored, as per the Servlet specification.

So if your processing calls some external service via HttpClient and waits for the response with an infinite CompletableFuture.get(), and that service does not respond, or there is a bug and the CompletableFuture is never completed, then you are going to wait forever.
But this case should be easily detected by the "timeout" Handler.
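
To make the point about the infinite get() concrete, here is a hedged sketch of bounding the wait on an external call; the helper class and the 5-second timeout are made up for illustration.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: bounds the wait on an external call instead of using the
// infinite CompletableFuture.get(), so an unresponsive service (or a bug that
// leaves the future incomplete) cannot block the request, and its QoSHandler
// permit, forever.
public class BoundedWait
{
    public static <T> T awaitOrFail(CompletableFuture<T> externalCall) throws Exception
    {
        try
        {
            return externalCall.get(5, TimeUnit.SECONDS);
        }
        catch (TimeoutException x)
        {
            externalCall.cancel(true);
            // Map this to an error response (e.g. 503) instead of hanging the request.
            throw x;
        }
    }
}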

@LarsKrogJensen
Author

What does that mean exactly?
You still have the leak and the QoSHandler maxes out?

In my local dev setup I start the app and send a single 'bad' request, and it triggers the issue every time. The stack trace is logged but the request never completes, i.e. the client simply waits.
So the QoSHandler is not maxed out, and I have fixed the thread pinning and set the connector idle timeout to 30 seconds.

In the standalone reproducer I can get an almost identical error stack trace, but then Jetty properly returns an error response, so I am trying to figure out the differences.

The request never hits the resource method.

@LarsKrogJensen
Author

@sbordet
Now I have a reproducer and am preparing to share it in a GitHub repo.

The change that made the difference is that I added a Jersey exception mapper handling any exception:

// Imports: jakarta.ws.rs.core.MediaType, jakarta.ws.rs.core.Response, jakarta.ws.rs.ext.ExceptionMapper,
// plus a static import of jakarta.ws.rs.core.Response.Status.SERVICE_UNAVAILABLE.
public static class CatchAllExceptionMapper implements ExceptionMapper<Exception> {
    @Override
    public Response toResponse(Exception ex) {
        return Response.status(SERVICE_UNAVAILABLE)
            .entity("service unavailable")
            .type(MediaType.TEXT_PLAIN)
            .build();
    }
}
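
For context, such a mapper is typically registered explicitly with the Jersey application, e.g. via ResourceConfig. The following wiring sketch is an assumption based on the description above; the enclosing class and the commented-out resource are hypothetical.

import org.glassfish.jersey.server.ResourceConfig;
import org.glassfish.jersey.servlet.ServletContainer;

// Hypothetical wiring: builds the Jersey ServletContainer with the catch-all mapper registered.
public class AppConfig
{
    public static ServletContainer jerseyContainer()
    {
        ResourceConfig config = new ResourceConfig();
        config.register(CatchAllExceptionMapper.class); // the mapper shown above
        // config.register(HelloResource.class);        // hypothetical application resource
        return new ServletContainer(config);
    }
}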

@LarsKrogJensen
Author

@sbordet
Here is a link to GitHub repo https://github.com/LarsKrogJensen/jetty-issue-11679

Start JettyApp, then run Client; you should see that Client hangs while a stack trace is logged in JettyApp.

@LarsKrogJensen
Author

@sbordet
Let me know if you have had a chance to reproduce with the provided repo.

@joakime
Contributor

joakime commented Apr 26, 2024

@LarsKrogJensen I went ahead and cleaned up your test project a bit.

https://github.com/joakime/jetty-issue-11679

See https://github.com/joakime/jetty-issue-11679/blob/main/src/test/java/BehaviorTests.java

The setup of your raw request isn't 100% correct: the Content-Length and body setup are not technically correct (the length is the number of bytes, not the length of the string; the EOL in HTTP/1 is always "\r\n"; and you can use the new Java text blocks to accomplish that).
Also, if you look around commit joakime/jetty-issue-11679@6eb9426 you'll see other small improvements (like how to dump the HttpTester.Response details) that you might benefit from in the future.

And I parameterized the different ways to send the request in the unit test.
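
As an illustration of those points (not the exact code from that fork), a raw request with CRLF line endings and a Content-Length computed from the body bytes might look like the sketch below; the port and path are taken from the original reproducer, and the truncation point is arbitrary.

import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class PartialWriteClient
{
    public static void main(String[] args) throws Exception
    {
        String body = """
            {
                "a":"b"
            }
            """;
        byte[] bodyBytes = body.getBytes(StandardCharsets.UTF_8);
        String header =
            "POST /hello HTTP/1.1\r\n" +
            "Host: localhost\r\n" +
            "Content-Type: application/json\r\n" +
            "Content-Length: " + bodyBytes.length + "\r\n" +
            "\r\n";
        try (Socket socket = new Socket("localhost", 15030))
        {
            OutputStream out = socket.getOutputStream();
            out.write(header.getBytes(StandardCharsets.US_ASCII));
            // Send only half of the declared body, then half-close the output,
            // so the server sees an incomplete body and triggers early EOF.
            out.write(bodyBytes, 0, bodyBytes.length / 2);
            socket.shutdownOutput();
        }
    }
}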

@LarsKrogJensen
Author

OK, that looks like a nice test improvement, and the partial_write case still seems to fail.

@joakime
Contributor

joakime commented Apr 26, 2024

From the server's point of view, that "early EOF" is 100% correct: the input was closed (see shutdownInput on the SocketChannelEndPoint in the logging output) and the request body was incomplete (per the specified Content-Length), so the server has to throw an "early EOF".

This is then seen by Jersey's implementation of a Reader that provides content to the Jackson/JSON layer.
The Jersey ObjectReader sees this and reports it as an IOException to Jackson's parse attempt.
Jersey then attempts to process the error in order to produce an error response.
However, that error response cannot be generated/committed, so org.glassfish.jersey.server.ServerRuntime$Responder logs "Error while closing the output stream in order to commit response".

Now, as to the reported connection leak, that's still unresearched.

@joakime
Contributor

joakime commented Apr 26, 2024

I pushed a few more updates into that test project fork.
Mostly adding idle timeouts.

I can confirm that the PARTIAL_WRITE_CLOSE_OUTPUT case does show a leaked connection on the server side.

|  += ServerConnectorManager@1e530163[keys=1] - STARTED
|  |  += oeji.ManagedSelector@4088741b{STARTED}[id=0 keys=1 selected=0 updates=0 selection:tot=1/avg=1.00/max=1] - STARTED
|  |     += AdaptiveExecutionStrategy@54bca971/SelectorProducer@23706db8/PRODUCING/p=0/QueuedThreadPool[qtp502848122]@1df8da7a{STARTED,8<=8<=200,i=6,r=-1,t=54558ms,q=0}[NO_TRY][pc=0,pic=0,pec=1,epc=0]@2024-04-26T08:53:48.936745618-05:00 - STARTED
|  |     |  +- SelectorProducer@23706db8
|  |     |  +~ QueuedThreadPool[qtp502848122]@1df8da7a{STARTED,8<=8<=200,i=6,r=-1,t=54557ms,q=0}[NO_TRY] - STARTED
|  |     |  +- java.util.concurrent.ThreadPerTaskExecutor@205bed61
|  |     +> updates @ 2024-04-26T08:53:48.93434834-05:00 size=0
|  |     +> keys @ 2024-04-26T08:53:48.935257164-05:00 size=1
|  |        +> SelectionKey@4f994ef{i=0}->SocketChannelEndPoint@5fd6dce1[{l=/192.168.1.215:41503,r=/192.168.1.215:48294,ISHUT,fill=-,flush=-,to=1966/3000}{io=0/0,kio=0,kro=1}]->[HttpConnection@3fd39741[p=HttpParser{s=CLOSED,85 of 170},g=HttpGenerator@a40d6c5{s=START}]=>HttpChannelState@77741508{handling=null, handled=true, send=SENDING, completed=false, request=POST@a989d992 http://192.168.1.215:41503/api/hello HTTP/1.1}]
|  +- sun.nio.ch.ServerSocketChannelImpl[/[0:0:0:0:0:0:0:0]:41503]

Interestingly, it seems to no longer be updating its idle timeout; it's stuck at to=1966/3000.

@LarsKrogJensen
Author

Aha, so I might also face a Jersey issue :(

The setup of your raw request isn't 100% correct: the Content-Length and body setup are not technically correct

Yeah, I'm aware it's not fully correct, but that's kind of part of the scenario: simulating a misbehaving client, which might be malicious or simply on an unreliable network.

@sbordet
Contributor

sbordet commented Apr 29, 2024

@LarsKrogJensen we analyzed the issue.

At the core, it is a Jersey issue.
What happens is that Jersey tries to write the response in ServerRuntime.Responder.writeResponse(), which calls response.close() (about line 750), which calls ContainerResponse.close(), which is:

    public void close() {
        if (!closed) {
            closed = true;
            messageContext.close();
            requestContext.getResponseWriter().commit();
            requestContext.setWorkers(null);
        }
    }

messageContext.close() is OutboundMessageContext.close(), which delegates the close to the entity stream, which eventually wraps Jetty's HttpOutput.

Because we have received an early EOF, trying to write to Jetty's HttpOutput fails and rethrows the BadMessageException produced by the early EOF.
The Jersey code catches IOException, but BadMessageException is a RuntimeException that therefore escapes the close() method, so (see above) the subsequent two calls are not performed; in particular, requestContext.getResponseWriter().commit() is never called.

The escaped exception is then logged, but nothing more is done by Jersey; instead, it should have completed the AsyncContext to complete the response.

Failing to complete the AsyncContext causes the response processing to never complete, and idle timeouts are ignored, as per the Servlet specification.

Since the processing is not complete, the QoSHandler does not see that request as completed, so it does not release the permit, eventually causing the build-up and leak of requests.

To be fair, we could produce a subclass of IOException instead of BadMessageException in case of early EOF, and that would fix the issue.
We are working on that, but the Jersey bug is there anyway: a RuntimeException can be thrown by close(), so Jersey should catch (Throwable x) and ensure that the AsyncContext is completed.
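
To make that last point concrete, here is a hedged sketch of what a more defensive close() could look like, based on the ContainerResponse.close() quoted above. It is illustrative only, not the actual Jersey fix.

    // Illustrative only: a RuntimeException (such as the BadMessageException from an
    // early EOF) thrown while closing the entity stream no longer prevents the
    // response writer from being committed, so the AsyncContext can still complete.
    public void close() {
        if (!closed) {
            closed = true;
            try {
                messageContext.close();
            } catch (Throwable x) {
                // Log and continue; the commit below must still happen.
            } finally {
                requestContext.getResponseWriter().commit();
                requestContext.setWorkers(null);
            }
        }
    }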

@sbordet
Contributor

sbordet commented Apr 29, 2024

@LarsKrogJensen would you like to open a Jersey issue and point it to this discussion?
Otherwise we'll do it.

sbordet added a commit that referenced this issue Apr 29, 2024
…rs earlyEOF.

Changed HttpConnection.RequestHandler.earlyEOF() to produce EofException instead of BadMessageException, as it is more appropriate.

Changed handling of HttpChannelState.onFailure() to not fail the write side unless there is a pending write callback.

Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
@LarsKrogJensen
Author

I can open a ticket in the Jersey project.

joakime added a commit that referenced this issue May 3, 2024
…rs earlyEOF. (#11719)

* Fixes #11679 - Jetty 12.0.8 seems to leak connection when it encounters earlyEOF.
* Changed HttpConnection.RequestHandler.earlyEOF() to produce EofException instead of BadMessageException, as it is more appropriate.
* Changed handling of HttpChannelState.onFailure() to not fail the write side unless there is a pending write callback.
* Early EOF events now produce an EofException that is also an HttpException.
* Now failures only impact pending writes, so that it would be possible to write an HTTP error response.

---------

Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
Co-authored-by: Joakim Erdfelt <joakim.erdfelt@gmail.com>