javax.net.ssl.SSLException: handshake timed out #907

gurudatta11 · 2019-11-21T17:05:38Z

System Architecture

As a client, we communicate with different devices (servers), each with a different IP address.
All the servers share a common root certificate for TLS, but each device has its own key/keystore.

Expected Behavior

No handshake timeout should occur.

When we call a few devices concurrently (say, fewer than a threshold of about 4), the error does not occur. When the number of concurrent calls exceeds this threshold, we see handshake timeouts.

We debugged with the option -Djavax.net.debug=ssl but still could not figure out why this happens.

There is no problem with the servers: when we call a single server at a time, we never encounter a handshake timeout. When we call multiple servers concurrently, some handshakes fail and some succeed.

I think there is a concurrency issue in Reactor Netty, but I am unable to figure out where.
Please share an architecture diagram for Reactor Netty if one exists.

Any pointers to resolve this issue would be helpful.

Actual Behavior

We get the error below:

2019-11-20 13:07:22,108 361283 [reactor-http-epoll-8] WARN  r.n.http.client.HttpClientConnect - [id: 0xab3f418a, L:/<ip1>:40554 - R:<ip2>/<ip2>:5000] The connection observed an error
javax.net.ssl.SSLException: handshake timed out
        at io.netty.handler.ssl.SslHandler$5.run(SslHandler.java:2011)
        at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:150)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:510)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:413)
        at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1050)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)
reactor-http-epoll-8, called closeOutbound()
reactor-http-epoll-8, closeOutboundInternal()
reactor-http-epoll-8, SEND TLSv1.2 ALERT:  warning, description = close_notify
reactor-http-epoll-8, WRITE: TLSv1.2 Alert, length = 26
reactor-http-epoll-8, called closeInbound()
reactor-http-epoll-8, fatal error: 80: Inbound closed before receiving peer's close_notify: possible truncation attack?
javax.net.ssl.SSLException: Inbound closed before receiving peer's close_notify: possible truncation attack?
%% Invalidated:  [Session-1010, TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256]
reactor-http-epoll-8, SEND TLSv1.2 ALERT:  fatal, description = internal_error
reactor-http-epoll-8, Exception sending alert: java.io.IOException: writer side was already closed.
2019-11-20 13:07:22,110 361285 [org.springframework.kafka.KafkaListenerEndpointContainer#1-1-C-1] ERROR com.bmg.service.HttpService - javax.net.ssl.SSLException: handshake timed out, {}
reactor.core.Exceptions$ReactiveException: javax.net.ssl.SSLException: handshake timed out
        at reactor.core.Exceptions.propagate(Exceptions.java:326)
        at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:91)
        at reactor.core.publisher.Mono.block(Mono.java:1494)

The client makes a call to the server and gets a handshake timeout after around 1.15 to 1.40 minutes.

Since the default handshake timeout is 10 seconds, we also tried setting the system property below, but the handshake timeout still occurs.
-Dreactor.netty.tcp.sshHandshakeTimeout=120000
The handshake timeout did not occur after 2 minutes as the above config would suggest; it occurred in the same range of 1.15 to 1.40 minutes.

Even though the error says handshake timeout, it feels like the call is being queued internally, attempted after some time, and only then does the handshake timeout occur.
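
One general way a timeout scheduled on an event loop can appear to fire much later than its configured value is that the loop thread is busy with other work when the timer becomes due. The sketch below illustrates only that effect, using a plain JDK single-threaded scheduler. It does not use Netty or Reactor Netty, the names DelayedTimeoutSketch and observedDelayMillis are hypothetical, and it is not a confirmed diagnosis of this issue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration: a timer on a busy single-threaded loop fires late.
final class DelayedTimeoutSketch {

    // Sleeps on the calling thread, restoring the interrupt flag if interrupted.
    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /**
     * Schedules a "timeout" task with the given nominal delay on a single-threaded
     * scheduler whose only thread is kept busy for busyMillis, and returns how many
     * milliseconds actually passed before the timeout task ran.
     */
    static long observedDelayMillis(long timeoutMillis, long busyMillis) {
        ScheduledExecutorService loop = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch busyStarted = new CountDownLatch(1);
        CountDownLatch fired = new CountDownLatch(1);
        AtomicLong firedAtNanos = new AtomicLong();
        try {
            // Occupy the only thread, standing in for an event loop doing other work.
            loop.execute(() -> {
                busyStarted.countDown();
                sleepQuietly(busyMillis);
            });
            busyStarted.await();

            long scheduledAtNanos = System.nanoTime();
            // The timer becomes due after timeoutMillis but cannot run until the thread is free.
            loop.schedule(() -> {
                firedAtNanos.set(System.nanoTime());
                fired.countDown();
            }, timeoutMillis, TimeUnit.MILLISECONDS);

            fired.await();
            return TimeUnit.NANOSECONDS.toMillis(firedAtNanos.get() - scheduledAtNanos);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            loop.shutdownNow();
        }
    }

    public static void main(String[] args) {
        long observed = observedDelayMillis(50, 400);
        System.out.println("nominal timeout 50 ms, fired after " + observed + " ms");
    }
}
```

With a nominal 50 ms timeout and a loop kept busy for 400 ms, the timeout task runs only after the busy task finishes, so the observed delay is far larger than the configured one.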

Steps to Reproduce

Reactor Netty is using the JDK SSL context.
We obtain the HttpClient as shown below.
Note: we use newConnection() instead of HttpClient.create(). Otherwise there is a weird concurrency problem where a request hits a different server than the intended one, and we also used to get reactor.netty.http.client.PrematureCloseException (Reference: https://projectreactor.io/docs/netty/release/reference/index.html#_connect).

public HttpClient getHttpClient(SslContext sslContext, int connectTimeOutInMilliSeconds,
    int readTimeOutInMilliSeconds) {

  HttpClient httpClient = HttpClient.newConnection().tcpConfiguration(tcpClient ->
      tcpClient
          .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, connectTimeOutInMilliSeconds)
          .doOnConnected(connection -> connection
              .addHandlerLast(new ReadTimeoutHandler(readTimeOutInMilliSeconds,
                  TimeUnit.MILLISECONDS))
              .addHandlerLast((new WriteTimeoutHandler(readTimeOutInMilliSeconds,
                  TimeUnit.MILLISECONDS)))));
  if (sslContext != null) {
    httpClient = httpClient.secure(sslContextSpec -> sslContextSpec.sslContext(sslContext));
  }
  return httpClient;
}

We obtain the sslContext as shown below:

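  private SslContext getTrustAllSslWebClient() {
    try {
      // Trusts every server certificate: insecure, suitable for testing only.
      return SslContextBuilder
          .forClient()
          .trustManager(InsecureTrustManagerFactory.INSTANCE)
          .build();
    } catch (SSLException e) {
      // Fail fast instead of swallowing the error; the method must return or throw.
      throw new IllegalStateException("Failed to build SslContext", e);
    }
  }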

Minimal yet complete reproducer code (or URL to code)

This is difficult to reproduce without complete production setup.

Possible Solution

Your Environment

Reactor version(s) used:
reactor-netty: 0.8.13.RELEASE
Other relevant libraries versions (eg. netty, ...):
netty: 4.1.43.Final
(derived from the Spring Boot parent version given below)

  <parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.1.10.RELEASE</version>
    <relativePath/> <!-- lookup parent from repository -->
  </parent>

JVM version (e.g. `java -version`)

openjdk version "1.8.0_222"
Also tried on Java 11.

OS version (e.g. `uname -a`)

4.15.0-66-generic #75-Ubuntu


gurudatta11 · 2019-11-21T17:06:11Z

Is it related to the worker count? Somehow the handshake timeout error seems misleading.
-Dreactor.netty.ioWorkerCount=16

I created an issue at the netty project and it was deferred to here. FYI: netty/netty#9792

violetagg · 2019-11-21T22:42:20Z

@gurudatta11 Is it possible that you open too many connections?

Look here #796 (comment)

Try to switch to a fixed connection pool and tell us whether you see the issue with this configuration.

gurudatta11 · 2019-11-21T23:33:24Z

The SSLException with handshake timeout is gone with either of the fixes below:

1. Setting the worker count to a higher number:
reactor.netty.ioWorkerCount=128
2. Using a fixed connection pool with a higher maximum number of connections.

gurudatta11 · 2019-11-21T23:34:50Z

The weird thing is that I don't think it's a handshake timeout exception at all; it's a connection pool or worker count problem. The way the error is propagated is wrong.
@violetagg

violetagg · 2019-11-22T05:36:44Z

@gurudatta11 Did you check the link to the issue above? Note that with version 0.9 the connection pool is fixed by default, but with version 0.8 it is elastic by default.

gurudatta11 · 2019-11-22T20:00:13Z

@violetagg
Sorry, I had not seen the link you provided.
After checking #796 (comment): yes, that will fix the issue if the number of connections is less than the fixed limit (500) or a user-defined one.

But I am not sure that is the actual fix. If connections are queued internally in the thread pool after the max limit, what is the expected behavior?
If the two points below are not valid, this issue can be closed.

1. Is it supposed to time out if the connection pool is full, instead of being queued to get a connection from the pool?
2. Even if it times out, why is the error thrown as a different error (handshake timeout)?
(Maybe log the actual error as a warning, or output the actual error.)

Also, logging a warning when the connection pool reaches its maximum limit might be helpful.

For my current setup:
I checked that the number of processors is 4, so the worker count is set to 4 as per the code, and the default acquireTimeout in Reactor Netty is 45 seconds:
https://github.com/reactor/reactor-netty/blob/master/src/main/java/reactor/netty/resources/LoopResources.java#L47

https://github.com/reactor/reactor-netty/blob/master/src/main/java/reactor/netty/resources/ConnectionProvider.java#L47

In my prod code I used a new connection:
https://github.com/reactor/reactor-netty/blob/master/src/main/java/reactor/netty/resources/NewConnectionProvider.java#L49

Also, is it possible to update the existing documentation with a high-level architecture diagram for reactor-netty?

violetagg · 2019-11-27T06:31:14Z

> @violetagg
> Sorry, I had not seen the link you provided.
> After checking #796 (comment): yes, that will fix the issue if the number of connections is less than the fixed limit (500) or a user-defined one.
>
> But I am not sure that is the actual fix. If connections are queued internally in the thread pool after the max limit, what is the expected behavior?

If you use the fixed ConnectionProvider, you cannot go beyond the max connections, which is not the case when you use elastic. elastic is the default for 0.8.x, while fixed is the default for 0.9.x.

> If the two points below are not valid, this issue can be closed.
>
> Is it supposed to time out if the connection pool is full, instead of being queued to get a connection from the pool?

If you use the fixed ConnectionProvider, there is an acquire timeout, which is not the case when you use elastic.

> Even if it times out, why is the error thrown as a different error (handshake timeout)?
> (Maybe log the actual error as a warning, or output the actual error.)

Once a connection is acquired/created, the TLS handshake starts. If there are too many connections, then 10s (the default TLS handshake timeout) might not be enough.

> Also, logging a warning when the connection pool reaches its maximum limit might be helpful.

Please create an enhancement issue for this.

> For my current setup:
> I checked that the number of processors is 4, so the worker count is set to 4 as per the code, and the default acquireTimeout in Reactor Netty is 45 seconds:
> https://github.com/reactor/reactor-netty/blob/master/src/main/java/reactor/netty/resources/LoopResources.java#L47
>
> https://github.com/reactor/reactor-netty/blob/master/src/main/java/reactor/netty/resources/ConnectionProvider.java#L47
>
> In my prod code I used a new connection:
> https://github.com/reactor/reactor-netty/blob/master/src/main/java/reactor/netty/resources/NewConnectionProvider.java#L49

This provider does not use pooling, which means we create a connection every time and the connection is thrown away after it finishes. There is no acquire timeout.

> Also, is it possible to update the existing documentation with a high-level architecture diagram for reactor-netty?

Please create an enhancement issue for this.
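
The difference between a bounded pool with an acquire timeout and creating connections without a limit can be sketched with JDK primitives. This is an illustration of the idea only, assuming a semaphore stands in for the pool's connection slots. FixedPoolSketch and its methods are hypothetical names and are not the Reactor Netty ConnectionProvider API.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a fixed-size pool: at most maxConnections slots are handed out,
// and callers that cannot get a slot within the acquire timeout are rejected.
final class FixedPoolSketch {

    private final Semaphore slots;

    FixedPoolSketch(int maxConnections) {
        // Fair semaphore so that waiting callers are served in arrival order.
        this.slots = new Semaphore(maxConnections, true);
    }

    /** Returns true if a slot was obtained within acquireTimeoutMillis, false otherwise. */
    boolean tryAcquire(long acquireTimeoutMillis) {
        try {
            return slots.tryAcquire(acquireTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns a slot to the pool so that a waiting caller can proceed. */
    void release() {
        slots.release();
    }

    public static void main(String[] args) {
        FixedPoolSketch pool = new FixedPoolSketch(2);
        System.out.println(pool.tryAcquire(10)); // true: first slot
        System.out.println(pool.tryAcquire(10)); // true: second slot
        System.out.println(pool.tryAcquire(10)); // false: pool exhausted, acquire timed out
        pool.release();
        System.out.println(pool.tryAcquire(10)); // true: a slot was freed
    }
}
```

In this sketch the failure surfaces at acquire time, before any handshake would start. A provider without pooling has no such gate, so every caller proceeds straight to creating a connection.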

gurudatta11 · 2019-11-27T18:47:36Z

Feature requests:
#916
and
#917
are created

@violetagg
As mentioned before, I don't see any issue with SSL timeout in general (it's not taking 10 secs for ssl handshake, after debugging the network and server).
in production without changing code, by setting environment variable, we observed zero issues with SSL handshake timeout.

Setting worker count to a higher number
reactor.netty.ioWorkerCount=128

But when the required connections are more than the configured max connection, it's reporting as ssl handshake timeout.

Nikhilkoneru · 2020-07-01T15:26:42Z

I had the same issue too. Try this; it fixed my issue. The problem is not with the connection timeout, it's with the read timeout.

@Bean
    public WebClient webClient(){
        HttpClient httpClient = HttpClient.create()
                .tcpConfiguration(tcpClient -> {
                    tcpClient = tcpClient.option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 60000);
                    tcpClient = tcpClient.doOnConnected(conn -> conn
                            .addHandlerLast(new ReadTimeoutHandler(60000, TimeUnit.MILLISECONDS)));
                    return tcpClient;
                });
        ClientHttpConnector connector = new ReactorClientHttpConnector(httpClient);

        return WebClient.builder().baseUrl(endpoint).clientConnector(connector).build();
    }

uvarajk · 2021-04-29T08:00:24Z

@violetagg - We're receiving the SSL handshake timeout exception in our production environment.

We captured a few details for reference, listed below.

In the debug log, the default connection pool configuration is:
r.n.resources.PooledConnectionProvider : Creating a new [http] client pool [PoolFactory{evictionInterval=PT0S, leasingStrategy=fifo, maxConnections=500, maxIdleTime=-1, maxLifeTime=-1, metricsEnabled=false, pendingAcquireMaxCount=1000, pendingAcquireTimeout=45000}] for [/172.25.0.12:443]
The application works fine for some time (4 to 5 hours), and then responses slowly start arriving late:
Taken <<24442>> milliseconds to retrieve data from endpoint
Taken <<32875>> milliseconds to retrieve data from endpoint
Taken <<2532046>> milliseconds to retrieve data from endpoint
During the delayed responses, we see the following errors/exceptions:
io.netty.handler.timeout.ReadTimeoutException: null
io.netty.handler.ssl.SslHandshakeTimeoutException: handshake timed out after 10000ms
We have configured the following timeouts for the HttpClient:
web-client.connect-timeout=5
web-client.read-timeout=5
web-client.use-connection-pooling=true
reactor.netty.ioWorkerCount=128
web-client.response-timeout=5

It would be a great help if we could know what's going wrong.

violetagg · 2021-04-29T08:29:12Z

@uvarajk Please open a new issue, with Reactor Netty version, Java, OS, the scenario.

What component provides those properties?

web-client.connect-timeout=5
web-client.read-timeout=5
web-client.use-connection-pooling=true
reactor.netty.ioWorkerCount=128
web-client.response-timeout=5

You have response-timeout; why do you also specify read-timeout?

uvarajk · 2021-04-29T11:35:08Z

@violetagg I opened a new issue: #1617 (comment)

santitigaga · 2021-08-11T17:36:07Z

Is there a solution for this issue yet?

arun-a-nayagam · 2021-08-13T14:31:09Z

Hi,
We are facing the same issue. Is the solution to set reactor.netty.ioWorkerCount to a higher value like 128?
Also, how do we set this value (reactor.netty.ioWorkerCount=128)?
We are using Java Spring Boot WebFlux code with WebClient.
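
On how to set the value: reactor.netty.ioWorkerCount appears earlier in this thread as a JVM system property, so it can be passed on the command line as -Dreactor.netty.ioWorkerCount=128. The sketch below shows, with JDK calls only, how such a property might be resolved with a fallback of max(available processors, 4), which matches the 4 workers reported earlier for a 4-core machine. WorkerCountSketch and resolve are hypothetical names, and the fallback rule is an assumption based on the LoopResources source linked above rather than a copy of it.

```java
// Hypothetical illustration of resolving a worker-count system property with a default.
final class WorkerCountSketch {

    // Property name as used in this thread.
    static final String IO_WORKER_COUNT = "reactor.netty.ioWorkerCount";

    /**
     * Returns the configured worker count, or max(availableProcessors, 4)
     * when the property is absent or blank (assumed fallback rule).
     */
    static int resolve(String propertyValue, int availableProcessors) {
        if (propertyValue == null || propertyValue.trim().isEmpty()) {
            return Math.max(availableProcessors, 4);
        }
        return Integer.parseInt(propertyValue.trim());
    }

    public static void main(String[] args) {
        // Has the same effect on this JVM as starting it with -Dreactor.netty.ioWorkerCount=128.
        System.setProperty(IO_WORKER_COUNT, "128");
        int workers = resolve(System.getProperty(IO_WORKER_COUNT),
                Runtime.getRuntime().availableProcessors());
        System.out.println("io workers = " + workers); // prints "io workers = 128"
    }
}
```

Setting the property programmatically is assumed to take effect only if it happens before the library first reads it, so the command-line flag is the safer choice.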

violetagg · 2021-08-14T16:11:33Z

@santitigaga @arun-a-nayagam Please open a new issue with a reproducible example.

violetagg · 2021-11-01T09:48:06Z

Closing this issue as there is not enough information to proceed with the investigation.

DInTheName · 2022-07-07T12:25:39Z

Why not fix the problem?

forewei · 2022-12-04T14:57:02Z

I run the same code against different servers; some servers get a handshake timeout and some do not.

DInTheName · 2022-12-04T15:07:10Z

Cache the interaction with the configuration center; don't keep fetching from it every time, that thing is problematic.


DInTheName · 2023-11-28T11:01:10Z

What we finally addressed was that requests must not take too long to process; we optimized the code.

dyrone · 2024-03-21T15:53:46Z

upgrade jdk to 1.8.251

DInTheName · 2024-03-26T00:35:04Z

Our earlier problem was that, around the time of the handshake, we were also doing other work: synchronizing with the registry using CSE. The CSE registry responded too slowly, so we eventually added an asynchronous cache there. It was an operational issue. Upgrading the JDK should not solve it, because we are using JDK 8.


gurudatta11 added status/need-triage A new issue that still need to be evaluated as a whole type/bug A general bug labels Nov 21, 2019

violetagg added for/user-attention This issue needs user attention (feedback, rework, etc...) and removed status/need-triage A new issue that still need to be evaluated as a whole labels Nov 21, 2019

gurudatta11 mentioned this issue Nov 27, 2019

logging details for connection pool reactor-netty #917

Closed

violetagg added this to the 0.9.x Maintenance Backlog milestone Mar 20, 2020

uvarajk mentioned this issue Apr 29, 2021

io.netty.handler.ssl.SslHandshakeTimeoutException: handshake timed out after 10000ms #1617

Closed

violetagg modified the milestones: 0.9.x Backlog, 1.0.x Backlog Jun 16, 2021

violetagg closed this as completed Nov 1, 2021

violetagg added status/cannot-reproduce We cannot reproduce this issue and removed type/bug A general bug for/user-attention This issue needs user attention (feedback, rework, etc...) labels Nov 1, 2021

violetagg removed this from the 1.0.x Backlog milestone Nov 1, 2021

joshfree mentioned this issue Apr 27, 2023

[BUG] io.netty.handler.ssl.SslHandshakeTimeoutException: handshake timed out after 10000ms Azure/azure-sdk-for-java#34684

Closed

novoj mentioned this issue Sep 18, 2023

gRPC gets stuck in SSL handshake exception when running documentation tests from Linux developer systems FgForrest/evitaDB#257

Closed
