TLSv1.3 can fail with HTTP/2 and Session Tickets Enabled #10041
Comments
@carl-mastrangelo thanks a lot for the detailed description... I will look into it.
@carl-mastrangelo is this netty-tcnative-boringssl-static?
@normanmaurer yes. I haven't tried with the dynamic artifacts.
@carl-mastrangelo ok cool... Thanks for verifying... @davidben we use
Ah yeah, we should probably be incorporating any of the pending handshake bytes into that. I didn't get to looking at this today, but I'll poke at it tomorrow. (Mostly I need to check whether anything else uses Note this will mean that
(This probably changed for you because we used to send NewSessionTicket during the server handshake, but now we defer it to the first
@davidben yes this would be fine :) please ping me once you have a fix and I will see how we can adopt it. Unfortunately we are tracking the chromium-stable branch, so it may take some time before we can use it :(
@davidben It looks like Put another way, it seems like the code is designed to handle this case, but doesn't.
Netty does a number of odd things with assumptions about write overheads in order to implement the SSLEngine API, which is probably where the dropping of data comes from. (Really the API Netty wants isn't SSL_write or BIOs in the first place, but alas we've yet to have the time to build the BIO-less API.)
(We quite extensively test things assuming a single-byte write buffer, so it would be quite surprising if data was getting dropped in BoringSSL.)
If that's the case, this issue is 2 bugs:
Fixing the BoringSSL issue will mask the first one.
cc: @kyagna
I know my question is off topic, but I couldn't find an answer to it anywhere. How do I enable Session Tickets?
@davidben is there a "maximum" number of bytes per SessionTicket that we can reserve here?
I think reserving a maximum number of bytes here is unwise. The problem that has to be addressed is that Netty is assuming a behaviour from BoringSSL that doesn't match what it actually does. Netty seems to be assuming that I think Netty has to work to break the assumption it's making here: it's not reflective of what BoringSSL actually does.
@carl-mastrangelo ok, good news is that I can reproduce it... Now the fun begins. Will keep you posted.
@davidben @Lukasa @carl-mastrangelo ok, update here... it is a netty bug and I have a fix ready, just working on a unit test now. So no surprise here, BoringSSL works as expected :)
…renceCountedOpenSslEngine.wrap(...)
Motivation: We did not correctly account for produced bytes when SSL_write(...) returns -1 in all cases. This could lead to lost data and so a corrupt SSL connection.
Modifications: Always ensure we calculate the produced bytes correctly
Result: Fixes #10041
Fixed by #10063 ...
@davidben we fixed a bug in netty that solves the problem for us but I would still be interested to hear if you have any plans to adjust
@normanmaurer thanks!
netty#10063)
Motivation: We did not correctly account for produced bytes when SSL_write(...) returns -1 in all cases. This could lead to lost data and so a corrupt SSL connection.
Modifications:
- Always ensure we calculate the produced bytes correctly
- Add unit tests
Result: Fixes netty#10041
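The accounting problem named in the commit message can be illustrated with a toy model. This is only a sketch, not the actual patch: ToyEngine, produced, and producedOnlyOnSuccess are hypothetical names, and the 200-byte pending size is an arbitrary stand-in for a session ticket. It assumes a write call that copies buffered output into the destination before it fails, so the caller has to measure the destination buffer rather than trust the return code.

```java
import java.nio.ByteBuffer;

// Toy model only: ToyEngine is a hypothetical stand-in, not Netty or BoringSSL code.
final class ProducedBytesSketch {

    /** Pretends to be an SSL object that still owes the peer some buffered output. */
    static final class ToyEngine {
        private final byte[] pending; // e.g. a deferred session ticket record
        private int flushed;          // bytes of pending already emitted

        ToyEngine(int pendingLen) {
            this.pending = new byte[pendingLen];
        }

        /**
         * Emits pending bytes first, then the wrapped plaintext.
         * Returns the plaintext length, or -1 if dst ran out of room.
         * dst may already hold pending bytes when -1 is returned.
         */
        int write(ByteBuffer src, ByteBuffer dst, int overhead) {
            int n = Math.min(pending.length - flushed, dst.remaining());
            dst.put(pending, flushed, n);
            flushed += n;
            int len = src.remaining();
            if (flushed < pending.length || dst.remaining() < len + overhead) {
                return -1;
            }
            dst.put(new byte[overhead]); // placeholder for record framing
            dst.put(src);
            return len;
        }
    }

    /** Counts produced bytes from dst's position, whatever write() returned. */
    static int produced(ToyEngine engine, ByteBuffer src, ByteBuffer dst, int overhead) {
        int before = dst.position();
        engine.write(src, dst, overhead);
        return dst.position() - before;
    }

    /** The flawed variant: reports nothing produced whenever write() fails. */
    static int producedOnlyOnSuccess(ToyEngine engine, ByteBuffer src, ByteBuffer dst, int overhead) {
        int before = dst.position();
        int rc = engine.write(src, dst, overhead);
        return rc < 0 ? 0 : dst.position() - before;
    }

    public static void main(String[] args) {
        ByteBuffer src = ByteBuffer.allocate(46); // 46 bytes of application data
        ByteBuffer dst = ByteBuffer.allocate(68); // 46 bytes plus 22 bytes of headroom
        ToyEngine engine = new ToyEngine(200);    // 200 is an arbitrary pending size
        System.out.println(produced(engine, src, dst, 22));
    }
}
```

In this model the failed write still leaves 68 bytes in the destination. A caller that reports zero produced bytes and then reuses or replaces the buffer would drop those bytes, which resembles the lost leading bytes described in the issue.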
Netty version
4.1.45 + TCNative 2.0.28
JVM version (e.g. java -version)
Java 1.8
OS version (e.g. uname -a)
Linux / Mac
Repro
The steps to reproduce this are fairly difficult, and I don't know enough of the OpenSSL API, but I can give the manual steps I took to get here. Netflix is trying to enable TLSv1.3 on some of its servers, but it results in some corrupted SSL connections. This appears most commonly during connection startup, but I think it can happen at any point.
Typical errors look like SSL_ERROR_RX_RECORD_TOO_LONG in Firefox, or ERR_SSL_PROTOCOL_ERROR in Chrome, but this is in fact due to data corruption in Netty. The core issue is a bad interaction between the two overloads of SslHandler.wrap:
netty/handler/src/main/java/io/netty/handler/ssl/SslHandler.java
Lines 1003 to 1058 in 136db86
At the bottom, on line 1043, if the engine.wrap call results in a BUFFER_OVERFLOW, it resizes the buffer and tries again. The bug here is that when this happens, the original data that was attempted to be written gets lost. In my case, the first 68 or so bytes are discarded, leaving a partial response to be written out. This is the first half of the bug.

The second part is the setup to this bug, and is the other wrap overload:
https://github.com/netty/netty/blob/netty-4.1.45.Final/handler/src/main/java/io/netty/handler/ssl/SslHandler.java#L801-L882
On line 821, the call to allocateOutNetBuf attempts to create a buffer large enough to hold the write. In my case, the readable data is 46 bytes. This results in a call to ReferenceCountedOpenSslEngine.calculateMaxLengthForWrap(), which adds 22 bytes of extra headroom, resulting in a 68 byte buffer. This is normally the correct amount if session tickets are not enabled. When they are, several hundred additional bytes are needed. These session ticket bytes are included by the OpenSSL library, and don't appear to be accounted for by Netty.

This series of events is most common when turning on TLSv1.3, HTTP/2, and Session Tickets. In my experimentation, the SSL handshake actually succeeds, but then crashes shortly after. The sequence of events looks like:
- openssl s_client -tls1_3 -connect 127.0.0.1:7006 -alpn h2 -debug -msg
- SslHandler.setHandshakeSuccess is invoked, and fires the event up the pipeline.
- ApplicationProtocolNegotiationHandler captures the handshake event, sees h2 has been picked, installs Http2FrameCodec, and invokes handlerAdded.
- [...] wrap() function as mentioned above, trying to wrap the 46 bytes of application data.
- [...] 17 03 03, which correctly indicates this is application data.
- [...] engine.wrap to return -1. When attempted again, the session tickets are written out, but they are missing the 5 byte header that identifies them.

I was able to force this to succeed by manually growing the out buffer in the wrap call to a very large size. This allows the initial engine.wrap to succeed and send the session ticket through, followed by the application data. When this happens, openssl prints out Post-Handshake New Session Ticket arrived.

@normanmaurer I'm really not sure how to fix this. I have packet captures of most of this, with the failure case, the success case, and the non-session ticket case. I don't know enough of the OpenSSL API to make a call on what should happen.
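For readers checking their own packet captures, the 5 byte header and the 17 03 03 prefix mentioned in the steps above can be decoded with a small helper. This is a sketch under stated assumptions: the class and method names are hypothetical, and the length limit is the TLS 1.3 ciphertext bound of 2^14 + 256 bytes from RFC 8446. It only inspects the record header and does no decryption.

```java
// Sketch only: the class and method names are hypothetical helpers.
final class TlsRecordHeaderSketch {
    static final int HEADER_LENGTH = 5;        // type (1) + legacy version (2) + length (2)
    static final int APPLICATION_DATA = 0x17;  // outer content type of encrypted TLS 1.3 records
    static final int LEGACY_VERSION = 0x0303;  // fixed legacy version field in TLS 1.3 records
    static final int MAX_CIPHERTEXT_LENGTH = (1 << 14) + 256; // RFC 8446 record size limit

    /** First header byte: the record content type. */
    static int contentType(byte[] b) {
        return b[0] & 0xFF;
    }

    /** Second and third header bytes, big-endian. */
    static int legacyVersion(byte[] b) {
        return ((b[1] & 0xFF) << 8) | (b[2] & 0xFF);
    }

    /** Fourth and fifth header bytes, big-endian: the length of the record body. */
    static int length(byte[] b) {
        return ((b[3] & 0xFF) << 8) | (b[4] & 0xFF);
    }

    /** True when the first five bytes form a plausible encrypted TLS 1.3 record header. */
    static boolean looksLikeEncryptedRecord(byte[] b) {
        return b.length >= HEADER_LENGTH
            && contentType(b) == APPLICATION_DATA
            && legacyVersion(b) == LEGACY_VERSION
            && length(b) <= MAX_CIPHERTEXT_LENGTH;
    }
}
```

If the header is missing, a peer ends up reading ciphertext bytes as the type and length fields, which may be why the browsers report errors such as SSL_ERROR_RX_RECORD_TOO_LONG.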