Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for TLS session resumption in HTTPSConnection #1886

Closed
wants to merge 4 commits into from

Conversation

PleasantMachine9
Copy link
Contributor

Related to issue #590

If using the ssl module from the reference Python3 binding, the SSLSocket.session attribute references a TLS session which may have a TLS session ticket/ID.

See: https://docs.python.org/3.8/library/ssl.html#ssl-session (New in version 3.6.)

If the python version is below 3.6, then getattr(sock, 'session', None) should return None (in HTTPSConnection), in which case we will not invoke context.wrap_socket with the extra new kwargs session.

Unfortunately the tests do not seem to work locally in my WSL env, and IPv6 does not work either.

Verified locally like this (for manual testing purposes):

import time
import logging
from urllib3.connectionpool import connection_from_url

logging.basicConfig(level=logging.DEBUG)
conn = connection_from_url('https://video-weaver.vie01.hls.ttvnw.net/')
# 1st time, new TCP connection & session ticket
conn.request('GET', '/')
"""
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): video-weaver.vie01.hls.ttvnw.net:443
DEBUG:urllib3.connection:For <urllib3.connection.HTTPSConnection object at 0x0000022D46CA3E20>: session_reused=False, new ssl_session=<_ssl.Session object at 0x0000022D46CA3FD0>
"""

# eventually server sends FIN,ACK in the background, even with `keep-alive`
time.sleep(10)
# 2nd time, new TCP connection, but sending session ticket, resulting in session_reused=True
conn.request('GET', '/')
"""
DEBUG:urllib3.connectionpool:Resetting dropped connection: video-weaver.vie01.hls.ttvnw.net
DEBUG:urllib3.connection:For <urllib3.connection.HTTPSConnection object at 0x0000022D46CA3E20>: session_reused=True, new ssl_session=<_ssl.Session object at 0x0000022D46C2DAC0>
"""

Compared with Wireshark outputs:
image

I am unsure how to add test cases for session resumption, as the test libraries are very daunting (not to mention, an SSL server that sends TLS session tickets is needed). Any help/tips would be appreciated on that front.

@codecov
Copy link

codecov bot commented Jun 6, 2020

Codecov Report

Merging #1886 into master will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##            master     #1886   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           25        25           
  Lines         2290      2305   +15     
=========================================
+ Hits          2290      2305   +15     
Impacted Files Coverage Δ
src/urllib3/connectionpool.py 100.00% <ø> (ø)
src/urllib3/util/ssltransport.py 100.00% <ø> (ø)
src/urllib3/connection.py 100.00% <100.00%> (ø)
src/urllib3/util/ssl_.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b1f05ae...a1fd6bc. Read the comment docs.

@PleasantMachine9
Copy link
Contributor Author

PleasantMachine9 commented Jun 6, 2020

Looks like the Tornado app server used in tests does not handle TLS session tickets correctly, not sure how to fix that.

It is not accepting the session ticket in connnection2 that it sends in connection1. It's not only the unittests failing, but also Firefox and Chrome behave the same with this app server.

The session resumption fails because the server does not accept the ticket, but the code seems to be working as intended (the ticket is sent in connection2)

@PleasantMachine9
Copy link
Contributor Author

PleasantMachine9 commented Jun 7, 2020

The codecov error is confusing.

  • src/urllib3/contrib/_appengine_environ.py
    ^ I don't see at all how my changes affect this code
  • src/urllib3/response.py
    ^ some code section in a Brotli class is not called, I also don't understand how this is related
  • src/urllib3/util/wait.py
    ^ I understand this the least. not sure how the continue line is skipped when the ones right above it run

@pquentin
Copy link
Member

pquentin commented Jun 8, 2020

@PleasantMachine9 The issue with coverage is that Travis CI did not run at all.

Closing/reopening, hopefully that should fix the coverage issue.

@pquentin pquentin closed this Jun 8, 2020
@pquentin pquentin reopened this Jun 8, 2020
@pquentin
Copy link
Member

pquentin commented Jun 8, 2020

Oh, Travis detected "abuse" and is refusing all our builds.

Screenshot 2020-06-08 at 11 03 31

I've contacted their support, I'll tell you when this is fixed.

@pquentin
Copy link
Member

pquentin commented Jun 8, 2020

Travis tells me they fixed the issue. That was quick! Retrying.

@pquentin pquentin closed this Jun 8, 2020
@pquentin pquentin reopened this Jun 8, 2020
@PleasantMachine9
Copy link
Contributor Author

Is there someone in particular I should ask for review or should I just wait?

@pquentin
Copy link
Member

pquentin commented Jun 8, 2020

Now that the tests pass, I'll try to take a first look at this, hopefully tomorrow. The overall approach sounds good, and indeed it's unlikely that we'll be able to test this automatically since this feature isn't supported by Tornado.

Thanks for this pull request, by the way!

@PleasantMachine9
Copy link
Contributor Author

One possibility could be using some external HTTPS server that does support & correctly handle session tickets, but that's also very fragile for a test, and unit tests needing external https access is probably a no-go.
Unless travis/appveyor or one of the other tools has some accessible interna; https endpoint to test against?

@sethmlarson
Copy link
Member

Thanks for starting this discussion, TLS resumption would be a fantastic addition to urllib3.

FYI there are multiple instances where sessions should not be resumed for security reasons (eg: broken or indefinite response framing, differing auth) that should be investigated for this review. Should read relevant RFCs re HTTP+TLS and sessions and dump findings here.

@PleasantMachine9
Copy link
Contributor Author

At least with the python/SSL library versions I've tried, if the session resumption fails, it's transparent (except for SSLSocket.session_reused attribute) to the caller, and ssl will just redo the full handshake.

If the server does not support or accept the session ticket (due to expiry or other internal reasons it may have), it will continue with the full TLS handshake as if it never saw the session ticket being sent. Incidentally the Tornado test server is a good case study for this in the unit tests, as it completely ignores the sent TLS session ticket on the 2nd connection.

The documentation in python3's ssl library is quite lacking on this _ssl.Session object, but I assume that's because it's just a shallow python interface to some underlying C struct from openssl - we can't even access the actual session-ticket bytestring. I will research a bit more on how this interfaces with openssl at least.

@PleasantMachine9
Copy link
Contributor Author

Looks like there is also a way in the SSLContext to specify that we do not want a session ticket to be issued. By default (unless the users provide their own SSLContext) it's on, which means we will ask for a session ticket:

ssl.OP_NO_TICKET
    Prevent client side from requesting a session ticket.
    New in version 3.6.

I will add that on as an improvement that we avoid asking for tickets if we don't have session resumption enabled and are using the default SSLContext.

@PleasantMachine9
Copy link
Contributor Author

PleasantMachine9 commented Jun 8, 2020

Other than what I mentioned I don't have a deep security background, and a lot of concerns you brought up are actually more for openssl than for the python code here. If such a thurough security check is needed on this then each openssl version would need to be inspected for its behavior.

Aside from the RFC related to session tickets: https://tools.ietf.org/html/rfc5077

This has some more details on possible drawbacks https://blog.filippo.io/we-need-to-talk-about-session-tickets/

The following seem to be the major concerns:

If the server has a bad impl. of session ticketing, and does not encrypt properly or does not rotate its keys properly, then Forward Secrecy is compromised

Not sure if the client code should implement workarounds for badly implemented servers here. I imagine this would need to be done on a case-by-case basis, or making the feature opt-in instead of opt-out. We fundamentally don't know if a session ticket is "good" because it's encrypted so it's opaque to us.

If TLS 1.2 or earlier is used, the session ticket is sent in the clear, before Change Cipher Spec, so passive eavesdropping is easy

Minor risk to have it in the clear in case the server implements key rotation properly. According to the RFC the session ticket should use strong encryption and the server should rotate its ticketing keys frequently.

o The keys and cryptographic protection algorithms should be at
least 128 bits in strength. Some ciphersuites and applications
may require cryptographic protection greater than 128 bits in
strength.

o The keys should not be used for any purpose other than generating
and verifying tickets.

o The keys should be changed regularly.

Being able to tell that the same session is being resumed (ie. easy fingerprinting of TLS channels).

I think this is not that big of a concern, as anyway the SNI and source+dest IP:port are already very effective at fingerprinting, along with things like the list & order of supported ciphers/curves.

Firefox, Edge & Chrome all have session ticketing enabled by default on TLSv1.2, so as additional reasoning, others also think that the benefit of fast resumption is worth the risks.

If security is the highest priority here then a compromise would be making it so it's only enabled by default if resuming a TLSv1.3 connection, however the vast majority of servers are not on 1.3 so that would disqualify a lot of use cases.

@pquentin
Copy link
Member

pquentin commented Jun 9, 2020

Thanks for looking into this! https://blog.filippo.io/we-need-to-talk-about-session-tickets/ was an interesting read.

I believe we should follow the lead of curl: curl/curl#3202. That is, allow session resumption only via session ids with TLS <= 1.2, and allow session resumption only via session tickets with TLS 1.3.

@sethmlarson Thoughts?

@PleasantMachine9
Copy link
Contributor Author

allow session resumption only via session ids with TLS <= 1.2, and allow session resumption only via session tickets with TLS 1.3

I'm okay with that suggestion as long as it will be something that can at least be explicitly enabled for TLSv1.2 (as that was my original use case).

A few notes about such an approach though:

  • We actually only know the real TLS version after the connection has been established (via sock.version()), this will further complicate the logic. Changing TLS versions between connections is probably handled by openssl though, and it's a corner case I suppose.

  • Would need a more complex flag/indicator to tell the "policy" to HTTPSConnection, where the default would be like only use tickets on 1.3; ids on 1.2 and below then also the option use tickets on all versions should be something that can be enabled. Or it could be an enum of sorts with the following options, to simplify:

1:  Always try to resume the TLS session if the ssl library in use supports it.
       -> always unset OP_NO_TICKET in ctx.options (if using self-generated SSLCtx)

2:  TLSv1.3>= : resume sessions with tickets
    TLSv1.2<= : resume sessions with ids only (check  .has_ticket == False)
       -> not sure whether to set/unset OP_NO_TICKET, see below

3:  Don't ever try to resume a TLS session
       -> always set OP_NO_TICKET in ctx.options (if using self-generated SSLCtx)

If this approach with 3 options is preferred, then some more thinking is needed on Option nr. 2.
We have a sort of chicken and egg situation, because we would want to know what TLS version will be used before starting the handshake, so that we can set/unset OP_NO_TICKET.

There are 2 possibilities I can think of right now is (if opt2 is selected policy, and default_ssl_context==True):

  1. (optimistic) On very first .connect() call of the instance, unset OP_NO_TICKET (so we request a ticket).
    (pessimistic) On very first .connect() call of the instance, set OP_NO_TICKET (so we do not request a ticket).
  2. After this first connection is established, save its TLS version into an instance-level attribute.
  3. If the session has a ticket, and we are not on TLSv1.3, then don't save the session obj.
  4. In future connections, use the stored attribute to determine whether to set OP_NO_TICKET.

Do you like this enum-based solution to give options to library users?
Also, do you have a suggestion on how to expose these "policy" options? I'm not sure the current solution is good where it's an almost-hidden kwarg for HTTPSConnection/HTTPSConnectionPool, but I'm not sure how to make it more apparent.

@PleasantMachine9
Copy link
Contributor Author

I remembered another interesting note: it seemed from the tests that OP_NO_TICKET has no effect on TLSv1.3, as I noticed that even when OP_NO_TICKET was set, session.has_ticket was true.

If this is working as intended in TLSv1.3 (meaning the client cannot hint it does not want a ticket), then actually that simplifies the logic I detailed above a lot, because OP_NO_TICKET would be irrelevant anyway.

@PleasantMachine9
Copy link
Contributor Author

PleasantMachine9 commented Jun 9, 2020

I couldn't figure out where in openssl this flag is used in the openssl codebase, I'm not great a C.

At least from the TLSv1.3 RFC, it seems that the extension which was used on v1.2 to indicate whether the client wanted tickets is not mentioned, so I suppose that OP_NO_TICKET is pointless on v1.3.

Actually from the RFC, it seems that session tickets and IDs are no longer distinguished in v1.3, and it is up to the server completely whether the ticket contains crypto data, or is just a database-id.

In that case, the suggested logic for Opt2 is:

  • Always set OP_NO_TICKET, it is ignored on v1.3, and disables tickets on v1.2 and earlier
  • Always save sock.session, on v1.3 it will have tickets, on v1.2 it should not have any because OP_NO_TICKET was set.

@PleasantMachine9
Copy link
Contributor Author

I uploaded a new commit with the skeleton of the suggestions I made above.

Tests are not done for it yet, please let me know if you think this is a good approach overall, then I will continue with it in more depth.

@PleasantMachine9
Copy link
Contributor Author

PleasantMachine9 commented Jun 14, 2020

Maybe the continuous-integration/travis-ci hook is stuck again.

I realized also there is one limitation of my current approach to note: this only works on a per-HTTPSConnection object level. Within a HTTPSConnectionPool, session resumption will not be supported accross separate HTTPSConnection objects. That would need a quite bit more complex solution (some guards to make sure that a common SSLSession is altered/accessed only in a well-defined manner, not to mention one-time-use tickets exist on TLSv1.3).

At least for the use-case I had in mind when making this, only supporting it on the HTTPSConnection level is okay though, but this will mean that each HTTPSConnection within a pool will maintain its own session information, and they won't share one.

@PleasantMachine9
Copy link
Contributor Author

@pquentin could you please check the travis hook again? I hope I'm not DoS'ing them.

@pquentin
Copy link
Member

pquentin commented Jun 18, 2020

Don't worry, you're not DDoSing them! I'm not seeing failed requests here like before. Closing/reopening to see if that helps... (edit: it did, here is your run: https://travis-ci.org/github/urllib3/urllib3/builds/699571876)

Anyway, we need to figure out what the best tradeoff here is between security, usability and maintainability. (I'm sorry that you wrote so much code already, but those questions need to be answered first.) It looks like adding policies for this feature is a lot of added complexity. I think I'm leaning towards allowing this on TLS 1.2 instead, if that's important for your use case.

@sethmlarson @sigmavirus24 Thoughts?

@pquentin pquentin closed this Jun 18, 2020
@pquentin pquentin reopened this Jun 18, 2020
@PleasantMachine9
Copy link
Contributor Author

I can add another draft without this "policy" behavior (where the policy is basically the "ALWAYS" option).

@pquentin
Copy link
Member

@sethmlarson Thoughts?

@sethmlarson
Copy link
Member

sethmlarson commented Jun 29, 2020

@PleasantMachine9 Before we start writing code can we discuss within an issue what TLS session resumption by default in urllib3 would look like (where do tickets live, under which conditions do we attempt to resume sessions, what exact performance gains are we achieving by adding this support) and once we nail that down we can start a PR that accomplishes that.

@PleasantMachine9
Copy link
Contributor Author

@sethmlarson I've opened one a little while ago: #1898
#1898

Indicate with a simple flag kwarg whether this should be enabled.
To test it in the wild, CPython 3.6+ and an SSL ticket/id issuing
server is needed.

This cannot be tested with current libraries in the repo, as
tornado (which is the mock https server) does not seem to
work properly with SSL tickets/ids.

Below is a way to test it in the real world:

  import logging
  from urllib3.connectionpool import connection_from_url
  logging.basicConfig(level=logging.DEBUG)
  # example sites that support SSL resumption, at time of writing:
  urls = [
      'https://www.reddit.com/',
      'https://www.twitch.tv/',
      'https://twitter.com/',
  ]
  for url in urls:
      conn = connection_from_url(url)
      # by default, connection is kept alive,
      # for testing this, it's easier if it isn't:
      conn.request('GET', '/', headers={'Connection':'close'})
      conn.request('GET', '/', headers={'Connection':'close'})
      # should see:
      #   DEBUG:urllib3.connection:For <...>: session_reused=True
@PleasantMachine9
Copy link
Contributor Author

PleasantMachine9 commented Sep 17, 2020

Also since it was previously asked what my rationale is for this change, I can write a short summary about that.

I'm using the https://github.com/streamlink/streamlink library. This library supports HLS streams, which are basically just a loop of HTTP GET@/url/video_fragment.mp4 requests, every few seconds.

What I have observed is that the edge servers for some video streaming sites do not always respect Connection: keep-alive and just send FIN packets regardless (as seen in original comment screenshot).

This causes urllib3 (used by requests, which is used by streamlink) to reopen the HTTPSConnection socket, and redo the whole handshake. This did lead to some micro-stutters in the video playback, and is also just plain inefficient (doing a TLS handshake every 2-4s is bad, for example for laptop batteries it may or may not be a noticeable difference).

When I compared this with a browser client (Firefox) I observed a similar behavior from the edge server (sending TCP FIN), but, as expected, browsers do resend the TLS 1.2 session ticket, unlike the pre-pull request revision of urllib3, resulting in 50-100ms faster reconnection times (and probably less crypto computation, see #1898 )

@pquentin
Copy link
Member

pquentin commented Oct 1, 2020

I haven't followed your discussions closely, but is this obsolete now that #1970 was merged?

@PleasantMachine9
Copy link
Contributor Author

I mean, the two are separate things. If urllib is not using the tickets it makes no sense to request them.

This commit would enable their use, so it's still viable. #1898 has the rest of the discussion.
I commented on that issue now to summarize the findings there.

Overall I don't see why the PR should be rejected. There are just some unclear parts about what should or should not be done in scope of the change. From my perspective, trying to track/check if SSL sockets are shut down gracefully seems unnecessary, as I don't see any practical vulnerabilities that not doing so could introduce.

The commit itself is actually working well for me, I have been using its locally applied version for a few months at this point.

Explicitly passing the kwarg and it being None is not the same
as passing a kwarg dict where it's not added when it is None.
@@ -116,6 +118,7 @@ def _const_compare_digest_backport(a, b):
try:
from ssl import SSLContext # Modern SSL?
except ImportError:
IS_SSL_SESSION_SUPPORTED = False
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should delete the other commit which disables session tickets.

The two don't conflict, this commit overwrites it in practice but it's not elegant.

@PleasantMachine9
Copy link
Contributor Author

I'll close this PR for now to stop clutering the board.
I probably won't have time for a while to push on this one.

For now, applying it as a local patch will do fine for me at least.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants