Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TLS Connection Timeout Intermittently #4444

Closed
Amarendraar23 opened this issue Dec 24, 2019 · 16 comments
Closed

TLS Connection Timeout Intermittently #4444

Amarendraar23 opened this issue Dec 24, 2019 · 16 comments
Labels
Bug For general bugs on Jetty side

Comments

@Amarendraar23
Copy link

Amarendraar23 commented Dec 24, 2019

Jetty

IBM JDK 1.8.5.40

Red Hat Enterprise Linux Server release 7.7

I upgraded from 9.4.11 to 9.4.25 in product. I am noticing connection getting stuck & timing out without loading the webpage. I deployed a helloworld war on the jetty server. The curl command seems stuck during/after TLS handshake. Once the timeout is hit the connection is closed down.

image

Attaching the Jetty & SSL logs

Connection-TimeOutSSl.zip

I have tried to use 9.4.12,9.4.20 and 9.4.24 versions, but without success.

This seems to happen intermittently but with larger number of request failing.

@sbordet
Copy link
Contributor

sbordet commented Dec 24, 2019

The logs report this:

ALPN, server did not agree to a protocol

This means that you have a problem in the configuration of your server.
Either you have not configured HTTP/2, or you have not configured ALPN properly.

@Amarendraar23
Copy link
Author

Amarendraar23 commented Dec 24, 2019

I am using HTTP1.1 in the sslcontextfactory. So do i need to configure HTTP/2 or ALPN ?

The same log appears in working case as well. So not sure thats the issue.

image

@sbordet
Copy link
Contributor

sbordet commented Dec 24, 2019

@Amarendraar23 sorry I wrongly assumed you wanted to have the server support HTTP/2.

Can you enable DEBUG logging for org.eclipse.jetty? What you posted only contains the SSL DEBUG logs.

@Amarendraar23
Copy link
Author

there is a jetty.log file as well. Anyways attaching it here.
jetty.log.D20191224.zip

@sreedharj
Copy link

@sbordet any update on this issue? I did some analysis. it looks like data is not getting flushed
even through produced data is available.

19:14:02.755:DBUG:oejis.SslConnection:qtp_httpsamar_HttpServerAdapter_node1-4246: flush NEED_WRAP
[2019-12-24 19:14:02.755] ALL 000000000000 GLOBAL_SCOPE 2019-12-24 19:14:02.755:DBUG:oejis.SslConnection:qtp_httpsamar_HttpServerAdapter_node1-4246: wrap Status = OK HandshakeStatus = FINISHED bytesConsumed = 0 bytesProduced = 101 [p=0,l=101,c=17408,r=101] ioDone=false/false
[2019-12-24 19:14:02.755] ALL 000000000000 GLOBAL_SCOPE 2019-12-24 19:14:02.755:DBUG:oejis.SslConnection:qtp_httpsamar_HttpServerAdapter_node1-4246: net flushed=false, ac=true
[2019-12-24 19:14:02.755] ALL 000000000000 GLOBAL_SCOPE 2019-12-24 19:14:02.755:DBUG:oejis.SslConnection:qtp_httpsamar_HttpServerAdapter_node1-4246: handshake succeeded SslConnection@19d06602::SocketChannelEndPoint@f90e03d2{/10.15.109.12:49479<->/10.2.116.100:22222,OPEN,fill=-,flush=-,to=0/30000}{io=0/0,kio=0,kro=0}->SslConnection@19d06602{NOT_HANDSHAKING,eio=0/101,di=-1,fill=IDLE,flush=IDLE}~>DecryptedEndPoint@c74616a3{/10.15.109.12:49479<->/10.2.116.100:22222,OPEN,fill=-,flush=-,to=323/30000}=>HttpConnection@d81a6f0a[p=HttpParser{s=START,0 of -1},g=HttpGenerator@3614740a{s=START}]=>HttpChannelOverHttp@2f6cbdaa{s=HttpChannelState@ebb2241f{s=IDLE rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=0},r=0,c=false/false,a=IDLE,uri=null,age=0} resumed server TLSv1.2/SSL_ECDHE_RSA_WITH_AES_256_CBC_SHA384
[2019-12-24 19:14:02.755] ALL 000000000000 GLOBAL_SCOPE 2019-12-24

@sbordet
Copy link
Contributor

sbordet commented Dec 25, 2019

@Amarendraar23 the logs only contain DEBUG information for SslConnection.
We need DEBUG logs for org.eclipse.jetty.

What I can see from the logs is that the TLS handshake terminates successfully, but then the server does not receive any more bytes and idle times out, which is expected behavior.
Are you sure the logs are related to your curl request?

@Amarendraar23
Copy link
Author

Amarendraar23 commented Dec 26, 2019

@sbordet

I have shared new logs with org.eclipse.jetty DEBUG enabled and ssl logs attached in the same zip.

jetty_logs-26-12-19.zip

I have 3 curl requests in the logs shared. First 2 curl requests have completed successfully with the page getting retrieved. But the third request didn't load and timed out.
Please a take a look at the timestamp(16:36:08.677 - jetty.log.D20191226.T163417) where handshake finished and produced 101 bytes, but the flush method says flushed 0. Is my understanding correct?

Jetty Logs
image

SSL Logs
image

Let me know if you need any more logs or info.

@sbordet
Copy link
Contributor

sbordet commented Dec 26, 2019

The logs show that we are towards the end of the TLS handshake, in fill(), and we are told to wrap; we call flush() generate 6 bytes, flush them ok, then wrap again, generate 101 bytes, but this time we cannot flush them (TCP congested), and return false from flush(). At this point, the handshake status is NOT_HANDSHAKING because the SSLEngine has generated the bytes, although they could not be flushed.

We then return 0 from fill(), and we end up in needsFillInterest() where we have some logic to write() the bytes left in the encrypted output buffer, but only when the handshake status is NEED_WRAP, while in this case is NOT_HANDSHAKING.
So we never write() the 101 bytes, and while we are fill interested, the client is waiting for the 101 bytes to complete the handshake and won't send more bytes.

I've never seen this, so probably it's a different handling of TLS in the IBM JVM that Jetty does not handle well.

@sbordet sbordet added Bug For general bugs on Jetty side and removed More Info Required labels Dec 26, 2019
sbordet added a commit that referenced this issue Dec 26, 2019
Fixed handling of encrypted bytes to write in
needsFillInterest() when the TLS handshake status
is NOT_HANDSHAKING.

Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
@sbordet
Copy link
Contributor

sbordet commented Dec 26, 2019

@Amarendraar23 are you able to try the fixed code in PR #4446 and report back if it works for you?

To build the fixed code:

$ git clone https://github.com/eclipse/jetty.project.git
$ cd jetty.project
$ git checkout jetty-9.4.x-4444-tls_idle_timeout
$ mvn clean install -DskipTests

Then use 9.4.25-SNAPSHOT as a dependency (we have not updated the POMs to 9.4.26-SNAPSHOT yet). Thanks!

@Amarendraar23
Copy link
Author

@sbordet I built the projects and did some testing around it. All requests are completing successfully. Ran the curl command in a loop and sleep of 10 secs for 5 mins.
I have asked my QA team to do some more regression.
Do i need to cover any specific edge cases for this? I can do that and along with my QA results report back.

@sbordet
Copy link
Contributor

sbordet commented Dec 27, 2019

@Amarendraar23 if you could confirm with your QA that the fix solves the issue for you, will be great.

The fix introduced a few test failures in our test suite, so we need to investigate why and try to find a fix that works for both Oracle and IBM JVMs - we will ask you to do more testing when we have figured out the issue. Thanks!

@Amarendraar23
Copy link
Author

@sbordet We performed several tests and didn't see any failure. So i can confirm that the fix works fine in the scenarios we could test.

What could be the possible ETA for this fix, as its a high priority for us to upgrade?

sbordet added a commit that referenced this issue Dec 30, 2019
Fixed write in doShutdownOutput() by updating the state before the write,
so that needsFillInterest() can check whether to also do a write or not.

Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
@Amarendraar23
Copy link
Author

Seeing the below TimeoutException which is being ignored.

[2020-01-02 13:44:31.927] ALL 000000000000 GLOBAL_SCOPE 2020-01-02 13:44:31.927:DBUG:oeji.WriteFlusher:Connector-Scheduler-2ed207f3-1: ignored: WriteFlusher@fdca9b1f{IDLE}->null
java.util.concurrent.TimeoutException: Idle timeout expired: 30000/30000 ms
        at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:171)
        at org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:113)
        at org.eclipse.jetty.io.IdleTimeout$$Lambda$54.00000000500080C0.run(Unknown Source)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
        at java.util.concurrent.FutureTask.run(FutureTask.java:277)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:191)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.lang.Thread.run(Thread.java:785)

Is this expected?

jetty.log.D20200102.T133455.zip

@sbordet
Copy link
Contributor

sbordet commented Jan 3, 2020

@Amarendraar23 the logs are incomplete (we need to see the log lines before and after to tell what's going on), but in general idle timeouts are normal, as they close idle connections to save server resources.

Idle timeouts may happen during the processing of requests, and if so they are ignored, as per the Servlet Specification.

@brucem-ping
Copy link

I believe I may also be seeing this issue to this in Jetty 9.4.18 with Java 1.8.0_221-b11 on RHEL. When a message exceeds the maximum fragment length of 16384 bytes the second part of the fragment is not fully sent until the connection times out.

It seems as though Jetty only fills up the last available fragment, and never continues on to the next message.

Edited log snippet:

DEBUG  [org.eclipse.jetty.io.ssl.SslConnection] wrap Status = OK HandshakeStatus = NEED_UNWRAP bytesConsumed = 0 bytesProduced = 427 [p=0,l=427,c=17408,r=427] ioDone=false/false
DEBUG  [org.eclipse.jetty.io.ChannelEndPoint] flushed 27 SocketChannelEndPoint@8b431ea{/x.x.x.x:xxxx<->/x.x.x.x:xxxx,OPEN,fill=-,flush=-,to=3/30000}{io=0/0,kio=0,kro=1}->SslConnection@16d99da2{NEED_UNWRAP,eio=0/400,di=-1,fill=IDLE,flush=IDLE}~>DecryptedEndPoint@3fcbc0d0{/x.x.x.x:xxxx<->/x.x.x.x:xxxx,OPEN,fill=-,flush=-,to=92/30000}=>HttpConnection@3561cbf7[p=HttpParser{s=START,0 of -1},g=HttpGenerator@1fb75fdc{s=START}]=>HttpChannelOverHttp@24b201ae{r=0,c=false,c=false/false,a=IDLE,uri=null,age=0}
DEBUG  [org.eclipse.jetty.io.ChannelEndPoint] Buffer util is not empty: java.nio.HeapByteBuffer[pos=27 lim=427 cap=17408]
DEBUG  [org.eclipse.jetty.io.ssl.SslConnection] net flushed=false, ac=true
DEBUG  [org.eclipse.jetty.io.ssl.SslConnection] SSL connection wrap result: OK
DEBUG  [org.eclipse.jetty.io.ssl.SslConnection] <flush false SslConnection@16d99da2::SocketChannelEndPoint@8b431ea{/x.x.x.x:xxxx<->/x.x.x.x:xxxx,OPEN,fill=-,flush=-,to=1/30000}{io=0/0,kio=0,kro=1}->SslConnection@16d99da2{NEED_UNWRAP,eio=0/400,di=-1,fill=IDLE,flush=IDLE}~>DecryptedEndPoint@3fcbc0d0{/x.x.x.x:xxxx<->/x.x.x.x:xxxx,OPEN,fill=-,flush=-,to=93/30000}=>HttpConnection@3561cbf7[p=HttpParser{s=START,0 of -1},g=HttpGenerator@1fb75fdc{s=START}]=>HttpChannelOverHttp@24b201ae{r=0,c=false,c=false/false,a=IDLE,uri=null,age=0}
DEBUG  [org.eclipse.jetty.io.ssl.SslConnection] Encrypted input consumed, closing buffer pool

This appears to happen as there are still 27 bytes available of the 16384 in the current message so they are filled, but the next part of the message never gets sent.

Packet trace:
Server Hello, Certificate, Server Key Exchange:

[Frame: 503, payload: 0-2735 (2736 bytes)]
[Frame: 504, payload: 2736-5471 (2736 bytes)]
[Frame: 505, payload: 5472-8207 (2736 bytes)]
[Frame: 506, payload: 8208-10943 (2736 bytes)]
[Frame: 507, payload: 10944-13679 (2736 bytes)]
[Frame: 509, payload: 13680-16388 (2709 bytes)]

Certificate Request, Server Hello Done:

[Frame: 509, payload: 0-26 (27 bytes)]
[Frame: 643, payload: 27-742 (716 bytes)] <== doesn't get sent until the connection is closed

If this does not seem to be the same problem I can open a new ticket with more details.

@Amarendraar23
Copy link
Author

Amarendraar23 commented Jan 16, 2020

Hi @sbordet Any update on the fix? Seems like its failing in tests again

@sbordet sbordet added this to To do in Jetty 9.4.26 via automation Jan 17, 2020
sbordet added a commit that referenced this issue Jan 17, 2020
Fixes #4444 -  Connection timeout intermittently when using jetty 9.4…
sbordet added a commit that referenced this issue Jan 17, 2020
Fixed write in doShutdownOutput() by updating the state before the write,
so that needsFillInterest() can check whether to also do a write or not.

Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
Jetty 9.4.26 automation moved this from To do to Done Jan 17, 2020
@joakime joakime changed the title Connection timeout intermittently when using jetty 9.4.25 TLS Connection Timeout Intermittently Jan 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For general bugs on Jetty side
Projects
No open projects
Jetty 9.4.26
  
Done
Development

No branches or pull requests

4 participants