javax.net.ssl.SSLException: SSLEngine closed already for gateway requests #782
@venkatnpedada Try to use the latest versions: Reactor Netty 0.8.9.RELEASE, Spring Framework 5.1.8.RELEASE, Spring Boot 2.1.6.RELEASE.
@violetagg This is happening with Reactor Netty 0.8.9 (BOM: Californium-SR9). I just saw this in our production system.
@nnanda2016 Is it possible to provide a reproducible scenario, logs, or a TCP dump?
I am also observing the same issue on Reactor Netty 0.8.10. It is on a production server, so I am not able to take debug logs or TCP dumps.

```
08/08/2019 05:45:05.456 +0530 | 1565223305456 [webflux-http-nio-2] WARN reactor.netty.http.client.HttpClientConnect - [id: 0xb2c5cd59, L:/:53826 - R::443] The connection observed an error [javax.net.ssl.SSLException: SSLEngine closed already
    at io.netty.handler.ssl.SslHandler.wrap(...)(Unknown Source)
08/08/2019 05:21:33.896 +0530 | 1565221893896 [webflux-http-nio-2] WARN reactor.netty.http.client.HttpClientConnect - [id: 0xb2c5cd59, L:/:53826 - R::443] The connection observed an error [javax.net.ssl.SSLException: SSLEngine closed already
    at io.netty.handler.ssl.SslHandler.wrap(...)(Unknown Source)
08/08/2019 04:47:38.915 +0530 | 1565219858915 [webflux-http-nio-2] WARN reactor.netty.http.client.HttpClientConnect - [id: 0xb2c5cd59, L:/:53826 - R::443] The connection observed an error [javax.net.ssl.SSLException: SSLEngine closed already
    at io.netty.handler.ssl.SslHandler.wrap(...)(Unknown Source)
```
For now, I have applied a retry for this exception and it is helping. At least since 07/25 (when I restarted my process) my app has not failed with this exception.

```java
.retryWhen(Retry.anyOf(MyCustomException.class, ReadTimeoutException.class, WriteTimeoutException.class, SSLException.class)
        .retryMax(maxRetryAttempt)
        .backoff(Backoff.exponential(Duration.ofMillis(1000L), Duration.ofSeconds(10), 3, true))
        .jitter(Jitter.random(0.9))
        .doOnRetry(retryContext -> logger.error("[Service Invocation failed][Retry context: {}]", retryContext))
)
```

What I feel is that reactor-netty is not refreshing the connection pool once such exceptions happen. Till now, I found
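These Retry/Backoff/Jitter builders look like the ones from the reactor-extra `reactor.retry` package. Below is a self-contained sketch of how such a retry might be wired onto a request, assuming Reactor Core 3.2/3.3 with reactor-extra on the classpath; `someCall`, `maxRetryAttempt`, and the logger are placeholders, not names from this thread:

```java
import java.time.Duration;

import javax.net.ssl.SSLException;

import io.netty.handler.timeout.ReadTimeoutException;
import io.netty.handler.timeout.WriteTimeoutException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import reactor.core.publisher.Mono;
import reactor.retry.Backoff;
import reactor.retry.Jitter;
import reactor.retry.Retry;

class RetryingCall {

    private static final Logger logger = LoggerFactory.getLogger(RetryingCall.class);
    private final int maxRetryAttempt = 3; // placeholder value

    // someCall stands in for the actual WebClient / Reactor Netty request
    Mono<String> resilient(Mono<String> someCall) {
        return someCall.retryWhen(
                Retry.anyOf(ReadTimeoutException.class, WriteTimeoutException.class, SSLException.class)
                        .retryMax(maxRetryAttempt)
                        .backoff(Backoff.exponential(Duration.ofMillis(1000), Duration.ofSeconds(10), 3, true))
                        .jitter(Jitter.random(0.9))
                        .doOnRetry(ctx -> logger.error("[Service invocation failed][Retry context: {}]", ctx)));
    }
}
```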
@iamankur82 Can you provide a reproducible example or describe the scenario?
@iamankur82 @nnanda2016 Can you switch from
@violetagg I will give it a try; the only thing is, with retry, it has not happened again in my app. I can give it a try in one of our lower environments, but I don't know when it will break. I will report in this issue if I see the app breaking with SSLException.
Hi,
@violetagg We have tried with
@venkatnpedada @sms0070 Try to isolate the scenario and provide a reproducible example. Also, if you can, try 0.8.11.BUILD-SNAPSHOT; we have some fixes there that might be relevant.
With some load testing, I am seeing the same issue on 0.9.0.M3. Here is the stack trace:
@kushagraThapar Are you able to extract a reproducible example? Can you test the current 0.9.0.BUILD-SNAPSHOT, as we have fixes that might be related?
@violetagg thanks so much for all your help on this so far. I have been trying to create a minimal repo to reproduce this issue and have struggled. For us, we only see the issue under fairly heavy and sustained load. I will keep working to see if I can get something working.

As you noted above, it appears that a channel is entering a state where the channel itself is open and active, but the

I'm trying to build a toy example wherein a channel enters this state "naturally". Assuming I'm ever able to figure out how to do this, it seems that the inevitable conclusion is that using

I will continue to work on the example, but I was curious about your thoughts on that analysis.
We are facing a similar issue with Reactor Netty 0.8.9.RELEASE, Spring Framework 5.1.8.RELEASE, Spring Boot 2.1.6.RELEASE.
We also have the same issue; we tried switching to a fixed pool without success. We are thinking of testing 0.9.0.RC1 if that could make sense.
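For readers who want to try the same thing, switching from the default elastic pool to a fixed one in Reactor Netty 0.8/0.9 looks roughly like the sketch below; the pool name and size are made-up values, not recommendations from this thread:

```java
import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

class FixedPoolClient {

    // A fixed provider caps the number of pooled connections instead of growing elastically
    static HttpClient create() {
        ConnectionProvider provider = ConnectionProvider.fixed("gateway-pool", 500);
        return HttpClient.create(provider);
    }
}
```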
Debug logging and a tcpdump should help to understand the actual cause of this error message and, in turn, can help in finding a resolution.
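On the client side, Reactor Netty's built-in wiretap can provide that kind of debug logging without an external capture; a minimal sketch, assuming the 0.8/0.9 API:

```java
import reactor.netty.http.client.HttpClient;

class WiretapClient {

    // wiretap(true) logs this client's inbound/outbound traffic at DEBUG level,
    // which often shows whether the peer had already closed a pooled connection
    static HttpClient create() {
        return HttpClient.create().wiretap(true);
    }
}
```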
When is a pull request scheduled to be created out of this commit? I'm working on a project where we are very eager to have a fix for
Any update on this issue? I can see one branch with a commit that tries to fix it, but it has not been released since 11-7-2019. Or should we try
Same issue here; it takes production servers down completely since they seem to have no way to recover. Similar behavior to the others who have commented:
Additional info:
```java
private Mono<byte[]> downloadImage(String uri) {
    return HttpClient.create()
            .secure()
            .get()
            .uri(uri)
            .responseSingle((httpClientResponse, byteBufMono) -> byteBufMono.asByteArray().subscribeOn(scheduler))
            .publishOn(scheduler)
            .subscribeOn(scheduler);
}
```
I am having the same issue and can reproduce it very easily. Here is what I am using:
My scenario is the following:
When I run tests with smaller POST requests I do not see the error, but with large requests the errors start to happen very quickly. This is how I am starting the POST requests:
And:
And this is the exception I am getting:
@CamielCop Is it possible to provide a more complete example?
@violetagg A double gateway test might help reproduce this. I've definitely seen this one. I would try with a POST of reasonable size. Maybe a regression?
Reproduced with a ReactorClientHttpConnector configured like this:

```kotlin
val connector = ReactorClientHttpConnector(
    from(
        create().runOn(LoopResources.create("reactor-webclient"))
            .option(CONNECT_TIMEOUT_MILLIS, REQUEST_TIMEOUT.toInt())
            .doOnConnected { connection ->
                connection.addHandlerLast(
                    ReadTimeoutHandler(REQUEST_TIMEOUT, TimeUnit.MILLISECONDS)
                )
                connection.addHandlerLast(
                    WriteTimeoutHandler(REQUEST_TIMEOUT, TimeUnit.MILLISECONDS)
                )
            }
    )
        .followRedirect(true)
)
```

The stack trace looks the same:
All, if you are able to test this PR #1065 it would be great. Thanks.
The cases that may lead to this exception are:
The fix will be available in 0.9.7.RELEASE.
When will 0.9.7 be released?
@martinritz 27.04 |
@violetagg - Unfortunately, I am still seeing this issue with the latest reactor-netty release, 0.9.7.RELEASE. It happens randomly, but not necessarily under load testing. We have different benchmarks for our SDK (Azure Cosmos DB), including load read testing, load write testing, load read-my-writes testing, etc. This issue happens only in some cases.

Anyway, I plan to debug this in more detail, but I was wondering if you can provide any pointers or log messages I should add somewhere to be able to debug this quickly. Thanks!

FYI, this is our reactor netty http client class: https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/http/ReactorNettyClient.java

Here is the complete stack trace:
@kushagraThapar Here the exception is a bit different: the inbound is not closed but canceled.
Do you use operators such as
@violetagg - Yes indeed, we do use these operators in a lot of places in our code (

What can we do differently to avoid these two issues (
So I tried getting rid of some of these operators, but I am still seeing the issue. I am wondering if there is a way to find out exactly which operator in the chain called the cancel operation on the HTTP client or the underlying channel?
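One low-tech way to at least see whether and when a cancel reaches the Reactor Netty call is to instrument the publisher right above it; a sketch with placeholder names (`callService`, the logger), not something prescribed in this thread:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import reactor.core.publisher.Mono;

class CancelTracing {

    private static final Logger log = LoggerFactory.getLogger(CancelTracing.class);

    // callService stands in for the Mono returned by the reactor-netty / WebClient request
    Mono<String> traced(Mono<String> callService) {
        return callService
                .doOnCancel(() -> log.warn("cancel() reached the HTTP call"))
                .log("http.call"); // logs every signal on this Mono, including cancel()
    }
}
```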
@violetagg - sure will do, thanks. |
@violetagg - I did some digging, and to start with, the problem is occurring because of these two operators we have:

But the documentation for these operators doesn't explicitly mention how they cancel the subscription or dispose of elements that are not required. That being said, I am still not sure how the cancellation happens. My logic is related to some sort of pagination using queries, and I have something like the following for the pagination logic:
Do you think the problem could be related to this pagination logic, which in turn might be causing an issue with the above two operators? I am also thinking about what will happen when an end user cancels the operation and there are still remaining elements in the buffer (or still to process): wouldn't the end user see this exception in that case?
@simonbasle @bsideup Can you tell us how cancellation happens in the case of
for
for
@violetagg So, based on these operators which cancel the upstream operations, I want to know what can be done in the reactor-netty client to handle these cancellations? This issue will be faced by all end users who cancel the operation manually while there are still remaining elements in the buffer (or still to process): wouldn't the end user see this exception in that case? Because in cases like
@violetagg @simonbasle - I think the issue might be related to
But I saw that
There is no real alternative to cancelling a subscription when you're only interested in a subset of the elements. Even if you could request the exact desired amount (which is not the case with
Thanks @simonbasle for the explanation. @violetagg - is there a way to handle this scenario in the reactor-netty HTTP/TCP client?
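To make the point above concrete, here is a minimal, thread-independent illustration that `take(n)` cancels its source once it has received enough elements; that cancel is what eventually propagates down to the Reactor Netty connection:

```java
import reactor.core.publisher.Flux;

class TakeCancelDemo {

    public static void main(String[] args) {
        Flux.range(1, 100)
                .doOnCancel(() -> System.out.println("upstream cancelled by take(3)"))
                .take(3)
                .subscribe(System.out::println);
        // prints 1, 2, 3 and then the cancel message
    }
}
```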
@kushagraThapar Create a new issue with a reproducible example so that we can see why a cancellation causes OOM. Let's also discuss there the SSLException that occurs when you cancel (close the connection) and what your expectation is for this use case.
@violetagg - I have opened a new issue with repro code: #1165
Spring Cloud Gateway version: 2.1.3.RELEASE
Reactor Netty version: 0.8.5.RELEASE
In our production system, we are observing "javax.net.ssl.SSLException: SSLEngine closed already" exceptions. The exception rate depends on the route count and traffic.
I'm using the elastic pool (the default), and below is the stack trace from the logs:
```
reactor-http-nio-150 HttpClientConnect - [id: 0x21e26b17, L:/127.0.0.1:29974 - R:localhost/127.0.0.1:8872] The connection observed an error
javax.net.ssl.SSLException: SSLEngine closed already
    at io.netty.handler.ssl.SslHandler.wrap(...)(Unknown Source)
```
I suspect the connection is already closed on the other end, but the elastic connection pool is still trying to use the same connection, which is causing this issue. It seems the Netty connection pool is not doing health checks to determine whether the connection is already broken.