
OkHttp responses not closed properly #4666

Closed
scholzj opened this issue Dec 12, 2022 · 5 comments · Fixed by #4665

@scholzj
Contributor

scholzj commented Dec 12, 2022

Describe the bug

With the 6.3.0 release, it seems that some OkHttp client responses are not closed properly, and every 5 minutes we get a bunch of warnings about them in our logs in the Strimzi project. For example:

  • From leader election:
    2022-12-12 20:10:17 WARN  OkHttpClient:134 - A connection to https://10.96.0.1/ was leaked. Did you forget to close a response body?
    java.lang.Throwable: response.body().close()
        at okhttp3.internal.platform.Platform.getStackTraceForCloseable(Platform.java:148) ~[com.squareup.okhttp3.okhttp-3.12.12.jar:?]
        at okhttp3.RealCall.captureCallStackTrace(RealCall.java:116) ~[com.squareup.okhttp3.okhttp-3.12.12.jar:?]
        at okhttp3.RealCall.enqueue(RealCall.java:125) ~[com.squareup.okhttp3.okhttp-3.12.12.jar:?]
        at io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl.sendAsync(OkHttpClientImpl.java:266) ~[io.fabric8.kubernetes-httpclient-okhttp-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl.consumeBytesDirect(OkHttpClientImpl.java:322) ~[io.fabric8.kubernetes-httpclient-okhttp-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.http.StandardHttpClient.consumeBytes(StandardHttpClient.java:57) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.http.SendAsyncUtils.bytes(SendAsyncUtils.java:51) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.http.HttpResponse$SupportedResponses.sendAsync(HttpResponse.java:105) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.http.HttpClient.sendAsync(HttpClient.java:178) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.retryWithExponentialBackoff(OperationSupport.java:618) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:595) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:570) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleUpdate(OperationSupport.java:366) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleUpdate(BaseOperation.java:709) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.lambda$replace$0(HasMetadataOperation.java:167) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.replace(HasMetadataOperation.java:172) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.replace(HasMetadataOperation.java:113) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.replace(HasMetadataOperation.java:41) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.replace(BaseOperation.java:1077) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.replace(BaseOperation.java:93) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LeaseLock.update(LeaseLock.java:100) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.tryAcquireOrRenew(LeaderElector.java:214) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$renewWithTimeout$6(LeaderElector.java:175) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$loop$8(LeaderElector.java:258) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
        at java.lang.Thread.run(Thread.java:833) ~[?:?]
    
  • When creating a custom resource:
    2022-12-12 20:35:07,81611 WARN  [OkHttp ConnectionPool] OkHttpClient:134 - A connection to https://10.96.0.1/ was leaked. Did you forget to close a response body?
    java.lang.Throwable: response.body().close()
        at okhttp3.internal.platform.Platform.getStackTraceForCloseable(Platform.java:148) ~[com.squareup.okhttp3.okhttp-3.12.12.jar:?]
        at okhttp3.RealCall.captureCallStackTrace(RealCall.java:116) ~[com.squareup.okhttp3.okhttp-3.12.12.jar:?]
        at okhttp3.RealCall.enqueue(RealCall.java:125) ~[com.squareup.okhttp3.okhttp-3.12.12.jar:?]
        at io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl.sendAsync(OkHttpClientImpl.java:266) ~[io.fabric8.kubernetes-httpclient-okhttp-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl.consumeBytesDirect(OkHttpClientImpl.java:322) ~[io.fabric8.kubernetes-httpclient-okhttp-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.http.StandardHttpClient.consumeBytes(StandardHttpClient.java:57) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.http.SendAsyncUtils.bytes(SendAsyncUtils.java:51) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.http.HttpResponse$SupportedResponses.sendAsync(HttpResponse.java:105) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.http.HttpClient.sendAsync(HttpClient.java:178) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.retryWithExponentialBackoff(OperationSupport.java:618) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:595) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:570) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:554) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleCreate(OperationSupport.java:347) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:704) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:93) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:42) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1107) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.strimzi.operator.topic.K8sImpl.lambda$createResource$0(K8sImpl.java:53) ~[io.strimzi.topic-operator-0.33.0-SNAPSHOT.jar:0.33.0-SNAPSHOT]
        at io.vertx.core.impl.ContextBase.lambda$null$0(ContextBase.java:137) ~[io.vertx.vertx-core-4.3.5.jar:4.3.5]
        at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:264) ~[io.vertx.vertx-core-4.3.5.jar:4.3.5]
        at io.vertx.core.impl.ContextBase.lambda$executeBlocking$1(ContextBase.java:135) ~[io.vertx.vertx-core-4.3.5.jar:4.3.5]
        at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76) ~[io.vertx.vertx-core-4.3.5.jar:4.3.5]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty.netty-common-4.1.77.Final.jar:4.1.77.Final]
        at java.lang.Thread.run(Thread.java:833) ~[?:?]
    
  • When updating CR status subresource:
    2022-12-12 20:35:07,81724 WARN  [OkHttp ConnectionPool] OkHttpClient:134 - A connection to https://10.96.0.1/ was leaked. Did you forget to close a response body?
    java.lang.Throwable: response.body().close()
        at okhttp3.internal.platform.Platform.getStackTraceForCloseable(Platform.java:148) ~[com.squareup.okhttp3.okhttp-3.12.12.jar:?]
        at okhttp3.RealCall.captureCallStackTrace(RealCall.java:116) ~[com.squareup.okhttp3.okhttp-3.12.12.jar:?]
        at okhttp3.RealCall.enqueue(RealCall.java:125) ~[com.squareup.okhttp3.okhttp-3.12.12.jar:?]
        at io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl.sendAsync(OkHttpClientImpl.java:266) ~[io.fabric8.kubernetes-httpclient-okhttp-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl.consumeBytesDirect(OkHttpClientImpl.java:322) ~[io.fabric8.kubernetes-httpclient-okhttp-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.http.StandardHttpClient.consumeBytes(StandardHttpClient.java:57) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.http.SendAsyncUtils.bytes(SendAsyncUtils.java:51) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.http.HttpResponse$SupportedResponses.sendAsync(HttpResponse.java:105) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.http.HttpClient.sendAsync(HttpClient.java:178) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.retryWithExponentialBackoff(OperationSupport.java:618) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:595) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:570) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleUpdate(OperationSupport.java:366) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleUpdate(BaseOperation.java:709) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.lambda$replace$0(HasMetadataOperation.java:167) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.replace(HasMetadataOperation.java:172) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.replaceStatus(HasMetadataOperation.java:118) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.replaceStatus(HasMetadataOperation.java:41) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
        at io.strimzi.operator.common.operator.resource.CrdOperator.lambda$updateStatusAsync$2(CrdOperator.java:132) ~[io.strimzi.operator-common-0.33.0-SNAPSHOT.jar:0.33.0-SNAPSHOT]
        at io.vertx.core.impl.ContextBase.lambda$null$0(ContextBase.java:137) ~[io.vertx.vertx-core-4.3.5.jar:4.3.5]
        at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:264) ~[io.vertx.vertx-core-4.3.5.jar:4.3.5]
        at io.vertx.core.impl.ContextBase.lambda$executeBlocking$1(ContextBase.java:135) ~[io.vertx.vertx-core-4.3.5.jar:4.3.5]
        at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76) ~[io.vertx.vertx-core-4.3.5.jar:4.3.5]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty.netty-common-4.1.77.Final.jar:4.1.77.Final]
        at java.lang.Thread.run(Thread.java:833) ~[?:?]
    
  • There seem to be more around list and get operations as well.

This was also discussed here

Fabric8 Kubernetes Client version

6.3.0

Steps to reproduce

Just using the Fabric8 Kubernetes client. If needed, I can try to create some reproducer code snippets, but it seems to come from all kinds of places. The stack traces are in the description.
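
A minimal sketch of what such a reproducer might look like (hypothetical class and loop bounds, not Strimzi code; assumes the OkHttp HTTP client module is on the classpath and the API server negotiates HTTP/2):

```java
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public class LeakReproducer {
  public static void main(String[] args) throws InterruptedException {
    // Single long-lived client, the way an operator would use it.
    try (KubernetesClient client = new KubernetesClientBuilder().build()) {
      // Repeated reads go through OperationSupport.handleResponse -> sendAsync,
      // the same path visible in the stack traces above.
      for (int i = 0; i < 100; i++) {
        client.pods().inNamespace("default").list();
        Thread.sleep(1_000L);
      }
      // Keep the JVM alive so OkHttp's ConnectionPool pruning (roughly every
      // 5 minutes) has a chance to log the "connection was leaked" warning.
      Thread.sleep(6L * 60L * 1_000L);
    }
  }
}
```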

Expected behavior

No warnings about unclosed responses appear in the logs.

Runtime

Kubernetes (vanilla)

Kubernetes API Server version

1.25.3@latest

Environment

Linux

Fabric8 Kubernetes Client Logs

See the description. I can provide more logs if needed.

Additional context

n/a

shawkins added a commit to shawkins/kubernetes-client that referenced this issue Dec 12, 2022
@shawkins shawkins self-assigned this Dec 12, 2022
shawkins added a commit to shawkins/kubernetes-client that referenced this issue Dec 12, 2022
@manusa
Member

manusa commented Dec 13, 2022

:( it seems we might have introduced a regression when refactoring our OkHttp HttpClient implementation. We'll need to release a patch with a fix.

@manusa manusa added this to the 6.3.1 milestone Dec 13, 2022
@manusa manusa added the bug label Dec 13, 2022
@manusa
Member

manusa commented Dec 13, 2022

Continuing the discussion from #4660 (reply in thread):

I'm not sure that the handler causing the leakage is the one for the AsyncBody (sendAsync) consumption. It is reported there as a result of OkHttp's ConnectionPool pruning, but that doesn't mean that the connection was leaked there.

The stack shown above was captured for simply reading the bytes of the response. The handler there will always read to completion, which apparently is not enough for OkHttp (it is for the JDK client).

Unfortunately, it doesn't appear that the integration tests run long enough with a single KubernetesClient instance using OkHttp to surface this issue, so it was missed.

It does seem that reading the BufferedSource to completion deallocates the connection (RealBufferedSource.exhausted() -> StreamAllocation.streamFinished() -> StreamAllocation.deallocate() -> ConnectionPool.connectionBecameIdle()). So this would be evidence that the leakage is happening elsewhere.

I'm trying to see where else we could be leaking the connection by not closing the response.
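
For reference, a rough sketch (plain OkHttp 3.x, not the fabric8 wrapper; class and method names are illustrative) of the two release paths being compared here - draining the source to exhaustion versus closing the response:

```java
import java.io.IOException;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import okio.Buffer;
import okio.BufferedSource;

public class ConnectionReleasePaths {
  static final OkHttpClient CLIENT = new OkHttpClient();

  // (1) Drain the BufferedSource to exhaustion, roughly what consumeBytes does.
  // On HTTP/1 this reaches StreamAllocation.streamFinished() and the
  // connection becomes idle in the pool.
  static void drain(String url) throws IOException {
    Response response = CLIENT.newCall(new Request.Builder().url(url).build()).execute();
    BufferedSource source = response.body().source();
    Buffer sink = new Buffer();
    while (source.read(sink, 8192L) != -1L) {
      sink.clear(); // discard; we only care about reaching end of stream
    }
  }

  // (2) Close the Response explicitly; this always releases the connection,
  // no matter how much of the body was consumed.
  static int statusOnly(String url) throws IOException {
    try (Response response = CLIENT.newCall(new Request.Builder().url(url).build()).execute()) {
      return response.code();
    }
  }
}
```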

@shawkins
Contributor

> I'm not sure that the handler causing the leakage is the one for the AsyncBody (sendAsync) consumption.

There are only two call paths now: either one using AsyncBody, or a WebSocket. For there to be an issue with WebSockets, sendClose would have to not be called - and as far as I'm aware, none of that logic has changed. The change with respect to AsyncBody is that previously the sendAsync results were directly materialized by OkHttp; now they are not.

> It is reported there as a result of OkHttp's ConnectionPool pruning, but that doesn't mean that the connection was leaked there.

Isn't the reporting based upon the stack that was captured for a connection that was leaked?

> It does seem that reading the BufferedSource to completion deallocates the connection (RealBufferedSource.exhausted() -> StreamAllocation.streamFinished() -> StreamAllocation.deallocate() -> ConnectionPool.connectionBecameIdle()). So this would be evidence that the leakage is happening elsewhere.

I've confirmed the fix works - that is, you can easily reproduce this scenario by doing repeated reads, at least with an integration test / real server (if you run against the mock server, HTTP/1 will be used; I haven't confirmed whether it has the same problem). What I see is that when Http2Codec.read is called under exhausted(), endOfInput is only called if an IOException is thrown from the FramingSource. If just -1 is returned, endOfInput is not called, so the auto-close is not triggered. Some of the HTTP/1 logic may be affected as well, because I see the same read method in Http1Codec$AbstractSource.read.
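
Put differently, an illustrative sketch (not the actual patch in #4665) of the defensive pattern this implies: close the Response explicitly once the body has been consumed, instead of relying on the -1 return to trigger endOfInput and the auto-close:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.function.Consumer;
import okhttp3.Response;
import okio.Buffer;
import okio.BufferedSource;

public final class ResponseDraining {
  // Drain the body, then close the Response. Under HTTP/2 the -1 returned by
  // the codec does not guarantee endOfInput was invoked, so the close()
  // performed by try-with-resources is what actually returns the connection
  // to the pool.
  static void consumeAndClose(Response response, Consumer<ByteBuffer> consumer) throws IOException {
    try (Response r = response) { // Response implements Closeable in OkHttp 3.x
      BufferedSource source = r.body().source();
      Buffer buffer = new Buffer();
      while (source.read(buffer, 4096L) != -1L) {
        consumer.accept(ByteBuffer.wrap(buffer.readByteArray()));
      }
    }
  }
}
```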

@manusa
Member

manusa commented Dec 13, 2022

> Isn't the reporting based upon the stack that was captured for a connection that was leaked?

The reporting just says that a connection was leaked, not which one (at least that is what I get from reading the code in ConnectionPool).

> HTTP/2

I haven't checked with HTTP/2, but I've validated that the connection is actually deallocated when the body is read to exhaustion against the mock server (HTTP/1).

I'll check with HTTP/2 and see if I can provide a more realistic test to verify the fix.

@manusa
Member

manusa commented Dec 13, 2022

So it's confirmed: the issue is only present when using the HTTP/2 protocol. The PR (#4665) includes a test that verifies this against a fine-tuned OkHttp mock server.
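
A rough sketch of what such a fine-tuned setup could look like (assumption: OkHttp's MockWebServer forced to prior-knowledge HTTP/2 over cleartext; this is illustrative, not the actual test added in #4665):

```java
import java.util.Collections;
import okhttp3.OkHttpClient;
import okhttp3.Protocol;
import okhttp3.Request;
import okhttp3.Response;
import okhttp3.mockwebserver.MockResponse;
import okhttp3.mockwebserver.MockWebServer;

public class Http2MockServerSketch {
  public static void main(String[] args) throws Exception {
    MockWebServer server = new MockWebServer();
    // Force HTTP/2 without TLS (prior knowledge) so the HTTP/2-only leak can be exercised.
    server.setProtocols(Collections.singletonList(Protocol.H2_PRIOR_KNOWLEDGE));
    server.enqueue(new MockResponse().setBody("{}"));
    server.start();
    try {
      OkHttpClient client = new OkHttpClient.Builder()
          .protocols(Collections.singletonList(Protocol.H2_PRIOR_KNOWLEDGE))
          .build();
      try (Response response = client.newCall(
          new Request.Builder().url(server.url("/api")).build()).execute()) {
        // Should print something like "h2_prior_knowledge 200", confirming HTTP/2 is in use.
        System.out.println(response.protocol() + " " + response.code());
      }
    } finally {
      server.shutdown();
    }
  }
}
```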

dongjoon-hyun added a commit to apache/spark that referenced this issue Dec 16, 2022
### What changes were proposed in this pull request?

This PR aims to upgrade K8s client to 6.3.1. I tested manually that the regression is fixed.

- https://github.com/fabric8io/kubernetes-client/releases/tag/v6.3.1
- https://github.com/fabric8io/kubernetes-client/releases/tag/v6.3.0

### Why are the changes needed?

SPARK-41521 tried to use 6.3.0, but it was reverted due to fabric8io/kubernetes-client#4666.
v6.3.1 fixed it.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #39094 from dongjoon-hyun/SPARK-41552.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>