Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for HTTPS connections to proxies. #1679

Merged
merged 9 commits into from Mar 12, 2020

Conversation

jalopezsilva
Copy link
Contributor

Currently there's no way to validate the identify of the proxy you are connecting to. Proxies supporting HTTPS endpoints are becoming more common and extending support for them is necessary.

This commit adds the ability to establish an HTTPS connection to a proxy.

When an HTTPS proxy is provided, instead of doing an HTTP CONNECT, we'll forward any requests directly to the proxy and ultimately to the destination, similar to how HTTP connections through a proxy are handled.

Another implementation alternative would have been to do TLS to the proxy, later an HTTP CONNECT and within the tunnel, another TLS connection to the destination. This is more complex and requires us to use the SSLContext.wrap_bio methods (i.e TLS within TLS). I opted for this simpler approach as a start but I'm happy to extend if needed.

@codecov-io
Copy link

codecov-io commented Oct 3, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@1084b2a). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #1679   +/-   ##
=========================================
  Coverage          ?   99.65%           
=========================================
  Files             ?       22           
  Lines             ?     2015           
  Branches          ?        0           
=========================================
  Hits              ?     2008           
  Misses            ?        7           
  Partials          ?        0
Impacted Files Coverage Δ
src/urllib3/connection.py 97.98% <100%> (ø)
src/urllib3/connectionpool.py 100% <100%> (ø)
src/urllib3/poolmanager.py 100% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1084b2a...b813e47. Read the comment docs.

@jalopezsilva jalopezsilva force-pushed the https_proxy_support branch 3 times, most recently from 4c8996f to ffed605 Compare October 3, 2019 19:55
@jalopezsilva
Copy link
Contributor Author

jalopezsilva commented Oct 12, 2019

Any feedback? Happy to adjust as needed.

Facebook needs to support HTTPS proxies internally and we would love to get this pull request through!

@gregprosser
Copy link

What's blocking this getting merged into the codebase? I have several downstream projects that are using urllib3 that would benefit from having this support?

@YogeshMhatre
Copy link

+1 - I have a few critical projects waiting on this feature. It would be incredibly useful if this is merged

Copy link

@aausch aausch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change set does not support proxy_headers for the https proxy

src/urllib3/connectionpool.py Show resolved Hide resolved
@pquentin
Copy link
Member

pquentin commented Nov 4, 2019

@jalopezsilva @aausch Thank you for the discussion above, this was very useful to me as I'm not familiar with the way proxies work. I'm glad that you two now agree.

@jalopezsilva I'm now trying to understand the bigger picture here. I've spent most of my day trying to understand what's going on. As far as I can tell, the way urllib3 currently supports proxies is:

  1. for HTTP proxies and HTTP resources, we send requests with the absolute URL cf. RFC 7230, section 5.4
  2. for HTTP/HTTPS proxies and HTTPS resources, we send a HTTP CONNECT in plaintext, upgrade the connection to the original server to TLS, and send requests as usual

(You've said above that you would like to add CONNECT support for fetching HTTPS resources from an HTTPS proxy, but I think that already exists.)

Anyway, now we're adding a third mode: for HTTPS proxies we connect to the proxy using TLS from the start, then we request the absolute URL just like in the HTTP proxy mode above.

Compared to the existing HTTP CONNECT support, we never talk in plain HTTP to the proxy, so an attacker listening our connection to the proxy does not know what origin server we are requesting. However, compared to HTTP CONNECT we need to trust the proxy, because it's now able to see all our communication to the origin server, which was not possible before.

The best of both worlds would be obtained with TLS-in-TLS, but that's harder to implement so we're leaving this for future work.

Is this correct?

@jalopezsilva @gregprosser @YogeshMhatre Can you share more about your use case? Is this about fetching HTTP resources from HTTPS proxies? Understanding what kind of resources you're accessing through what kind of proxy would be helpful to understand where you're coming from.

src/urllib3/connection.py Outdated Show resolved Hide resolved
@aausch
Copy link

aausch commented Nov 4, 2019

(You've said above that you would like to add CONNECT support for fetching HTTPS resources from an HTTPS proxy, but I think that already exists.)

Pretty sure this doesn't currently work correctly

@pquentin
Copy link
Member

pquentin commented Nov 4, 2019

(You've said above that you would like to add CONNECT support for fetching HTTPS resources from an HTTPS proxy, but I think that already exists.)

Pretty sure this doesn't currently work correctly

Yeah, I think urllib3 just ignores the fact that the proxy supports HTTPS and just treats is as a HTTP proxy. But it still "works". Doing things correctly would require TLS in TLS.

@jalopezsilva
Copy link
Contributor Author

jalopezsilva commented Nov 5, 2019

@pquentin Thanks for your initial review. I'm adding more context here.

Our use case is fetching HTTP / HTTPS resources through an HTTPS proxy. We run a forward proxy in our production environment through which employee and micro-services traffic goes through. We want to ensure this traffic is encrypted in all cases and we also want to easily identify clients through mutual authentication. (i.e clients will present a certificate to the proxy identifying themselves. Traffic not properly authenticated will be rejected)

Right now urllib3 ignores the proxy scheme. If you provide a proxy with an HTTPS scheme https://proxy, the scheme will be silently ignored and urllib3 will attempt to contact the proxy through plaintext. HTTPS proxies are becoming more common (libcurl now supports them) and as key library in the python ecosystem, urllib3 should also support them.

This PR adds support for HTTPS proxies in two ways:

  • HTTP resources through an HTTPS proxy by "bridging" or forwarding.
  • HTTPS resources through an HTTPS proxy by "bridging" or forwarding. This is very insecure though, and after our conversation above with Alex, I'm throwing an exception if anyone attempts to use this unless a proper flag has been given.

Proper fetching of HTTPS resources through an HTTPS proxy requires TLS to the proxy, HTTP CONNECT, and TLS to the destination. This is the behavior libcurl provides. I have ideas of how to add this (discussion above about SSLContext.wrap_bio methods) but that should go in a separate PR I believe.

Hopefully this all makes sense, let me know if you have any further questions or comments. (Also, last lint failure seems to be unrelated to my changes?)

@pquentin pquentin mentioned this pull request Nov 5, 2019
@pquentin
Copy link
Member

pquentin commented Nov 5, 2019

@jalopezsilva Thank you for the context, that makes a lot of sense. The client authentication argument is particularly convincing! It's an elegant way to trace the requests of both employees and other micro-services, I might steal that idea at $DAYJOB one day. :)

I opened #1733 to fix the lint failure, sorry about that.

Here's what needs to happen from my point of view before we can merge this PR:

  • Improve our documentation. The "Proxies" section of the "Advanced usage" page is currently way too low on details: we need to document how to use proxies with 1. explicit plaintext HTTP 2. HTTP CONNECT + TLS to the end resource and 3. explicit HTTPS
  • Rename allow_https_proxy_to_see_traffic to _allow_https_proxy_to_see_traffic and mention that it could be removed at any time -- I want to be able to drop this option when we get TLS within TLS support
  • I need to test this locally, but I don't know yet how to configure mitmproxy or Charles Proxy (or anything else that runs on macOS) to just bridge/forward explicit HTTP/HTTPS requests while still connecting to the proxy using TLS
  • Discuss with @urllib3/maintainers to see how to handle this breaking change

@urllib3/maintainers Okay, so why is this a breaking change?

urllib3 only uses proxies through HTTP right now. We have two modes:

  • When requesting an HTTP resource, we ask the proxy to make the request on our behalf.
  • When requesting an HTTPS resource, we use HTTP CONNECT and set up a TLS connection to the requested resource, using the proxy as a dumb pipe.

While it's currently possible to specify an HTTPS proxy for both HTTP and HTTPS traffic, we still connect to the proxy via HTTP, not HTTPS. After this pull request, for HTTPS proxies we'll:

  1. connect to the proxy with HTTPS, but there's no guarantee that the HTTPS connection will succeed even if the HTTP connection succeeded.
  2. refuse to proceed for HTTPS requests without the escape hatch, but that means we could start refusing requests that were going through until now. (Another option here is to continue to request HTTPS traffic through HTTP, even if it's bad for security.)

However, those are absolutely the right things to do, and even if most users are using HTTP proxies, we might have users in the wild who think they're connecting to their proxy using TLS while they're not, or users who want things to work this way.

How do we want to handle this?

@aausch
Copy link

aausch commented Nov 5, 2019

I need to test this locally, but I don't know yet how to configure mitmproxy or Charles Proxy (or anything else that runs on macOS) to just bridge/forward explicit HTTP/HTTPS requests while still connecting to the proxy using TLS

You can use something like nginx to terminate the https connection, and have the proxy then just do http -> http behind it

@pquentin
Copy link
Member

@jalopezsilva Thank you for the new commit!

I just realized there's a good way around the compatibility issue: we should provide a new HTTPSProxyManager for HTTPS proxies, and keep ProxyManager for HTTP proxies. And possibly add a warning when ProxyManager is used with a https proxy URL. What do you think?

@jalopezsilva
Copy link
Contributor Author

jalopezsilva commented Nov 11, 2019

Having a HTTPSProxyManager is a good idea, although I wonder if it will make adoption of HTTPS proxies more difficult for libraries using urllib3. Most of the libraries I've seen (requests, botocore) use the proxy_from_url method to get the corresponding ProxyManager. Are you thinking of also updating proxy_from_url to return a HTTPSProxyManager when provided with an HTTPS proxy URL? That would make it easier for libraries to start using HTTPS proxies. We could probably add a flag (enable_https_proxies) on proxy_from_url to enable or disable HTTPS proxies. The flag would be disabled initially but perhaps down the road we can make it enabled by default.

Another option would be to add that enable_https_proxies flag directly on ProxyManager and use that as a gating mechanism. Disabled initially, enabled by default in the future hopefully!

Either way works for me! Let me know your thoughts @pquentin and I'll make the needed changes. Also let me know if you need help setting up a forward proxy to test this out, I can try to set up a docker image or something along those lines.

Copy link
Member

@pquentin pquentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to make this code work using nginx, thanks @aausch for the suggestion.

I don't have a strong opinion about the gating mechanism. It could be in proxy_from_url if we opt for a new class, or in ProxyManager if it's easier to add the new behavior there.

It will force clients like requests and botocore to update their code to see how they want to expose this to their users, but I don't think forcing this onto them is a good idea. But then I'd still like to have the opinion of the urllib3 maintainers here as I'm new here. :)

docs/advanced-usage.rst Outdated Show resolved Hide resolved
CONTRIBUTORS.txt Outdated Show resolved Hide resolved
@sethmlarson
Copy link
Member

I'm just popping in to say:

  • A breaking change that makes HTTPS proxy connections actually use HTTPS instead of silently using HTTP is a welcome change. :) People shouldn't not know that their connections aren't happening securely.
  • I'll give this PR a review soon, busy time in my life. Thanks to @jalopezsilva and all participating 🎉

@jalopezsilva
Copy link
Contributor Author

jalopezsilva commented Dec 4, 2019

Sorry for the delayed update, busy times in Novemeber! I have updated the PR with two additional commits:

  • Commit # 1 : Addresses the documentation and CONTRIBUTORS.txt comments that @pquentin had.

  • Commit # 2 : The aim of this commit is to minimize some of the concerns that we have with introducing a breaking change. It's possible that clients right now have proxy configurations with an HTTPS proxy scheme which are being contacted through HTTP. If we merge my changes as is, urllib3 will stop working for such clients. To minimize the impact, I'm disabling the HTTPS proxy behavior unless a flag is provided (_enable_https_proxies). I'm printing a warning when I detect there's an invalid configuration and I'm keeping the old behavior. In future releases, we can remove this flag and have HTTPS proxies enabled by default.

Hope the second commit addresses the concerns with the breaking change! We can keep commit # 2 or discard it, whatever works.

Looking forwarding to merging this soon! Thanks!

@jalopezsilva
Copy link
Contributor Author

Happy New Year @pquentin, @sethmlarson! Any chance we could review this? I believe my last commit addresses the remaining concerns with breaking changes.

Would love to have this merged soon!

@pquentin
Copy link
Member

pquentin commented Jan 7, 2020

Thanks for the friendly tone @jalopezsilva, I'm sure the situation is very frustrating to you, and I would like to apologize for delay here. The pull request looks good to me and the general idea has been approved by @sethmlarson -- I'm now going to coordinate with him to understand what the next steps are.

Copy link
Member

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, I have some comments.

src/urllib3/connectionpool.py Outdated Show resolved Hide resolved
src/urllib3/poolmanager.py Outdated Show resolved Hide resolved
@@ -383,6 +408,17 @@ class ProxyManager(PoolManager):
HTTPS/CONNECT case they are sent only once. Could be used for proxy
authentication.

:param _enable_https_proxies:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of adding flags, especially if this is a new feature. Can we have this feature be additive and then do what we're already doing in this PR and error out in an invalid scenario (HTTPS+HTTPS)?

Logical next-steps are to figure out if we can also support HTTPS+HTTPS before we release this so there's less confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I'll error out then. I preferred to set a warning as I was concerned that multiple clients with an incorrect configuration (i.e HTTPS + HTTPS) would be affected. But then again, it's an incorrect configuration so they might as well fix it now.

That way, when we add the HTTPS + HTTPS support clients should have their configuration up to date.

I'll make the changes and update here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds great, thank you for this work! It'll make many more users connections secure. :)

@jalopezsilva
Copy link
Contributor Author

@sethmlarson Whenever you have a chance, give this another go. Should be good to go!

I'll create another PR for the HTTPS / HTTPS support as we had agreed.

Currently there's no way to validate identify for the proxy you might be
connecting. Proxies supporting HTTPS endpoints are becoming more common
and we need to extend the support for them.

When an HTTPS proxy is provided, instead of doing the HTTP CONNECT,
we'll forward any requests directly to the proxy and ultimately to the
destination.
- Will be supported by default when we can do TLS within TLS.
* Renamed flag for HTTPS websites through HTTPS proxies.
* Added myself to contributors.
* Removed mention that TLS in TLS is being developed as requested.
* Space in between my name and the github page.
Now that we're adding support for HTTPS proxies we want to avoid
a breaking change with clients that had an improper proxy configuration.

For now, we're adding a warning an defaulting to the previous behavior.
In the future we'll change the behavior to enable HTTPS proxies by
default.
As requested in the last revision for the PR:

- Removed the _enable_https_proxies flag. Instead the feature will be
  enabled and will error out on invalid configurations. (HTTPS + HTTPS)
- Other comments: rename a method, parentheses to clarify order of
  operations.
@jalopezsilva
Copy link
Contributor Author

Hey @sethmlarson, I'm starting to add the TLS within TLS support over in #1806. It would be super useful if we can merge this in so I can continue work on that PR. I think we're good to go here, let me know if we need to make any further changes.

@jalopezsilva
Copy link
Contributor Author

Hey @sethmlarson @pquentin, another ping here as I haven't heard from you guys in a while. Are we still planning on merging these changes?

Facebook has been using them internally for the past 4 months without any issues. I would prefer to merge them as I want to avoid having to fork FB's internal version of urllib3. That said, I'm starting to get concerned on how long this is taking and whether we'll be able to merge them or not.

Do let me know so I know how to proceed forward. Thanks!

"Are you sure you want to use HTTPS to contact the proxy? "
"This most likely indicates an error in your configuration."
"If you are sure you want use HTTPS to contact the proxy, enable "
"the _allow_https_proxy_to_see_traffic.",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"If you are sure you want use HTTPS to contact the proxy, enable "
"that by using the parameter: _allow_https_proxy_to_see_traffic.",

@pquentin pquentin dismissed sethmlarson’s stale review March 12, 2020 08:38

The requested changes have been made.

@pquentin
Copy link
Member

Merging, even with the typo, we can always fix it later, eg. by removing "the". Thanks @jalopezsilva for your work and your patience. Tested again, it works well, and the code looks good to me.

@pquentin pquentin merged commit 8c7a43b into urllib3:master Mar 12, 2020
@sethmlarson
Copy link
Member

Sorry for fumbling the review of this PR hard, I had tested the patch locally and it seemed to work. Thanks for tackling this. :)

@jalopezsilva
Copy link
Contributor Author

Thanks for the prompt response guys! I'll fix the typo and follow up in #1806.

jalopezsilva added a commit to jalopezsilva/urllib3 that referenced this pull request Aug 12, 2020
* Add support to talk HTTPS to proxies.

Currently there's no way to validate identify for the proxy you might be
connecting. Proxies supporting HTTPS endpoints are becoming more common
and we need to extend the support for them.

When an HTTPS proxy is provided, instead of doing the HTTP CONNECT,
we'll forward any requests directly to the proxy and ultimately to the
destination.

* Fix proxy_headers missing on HTTPS proxy connections.

* blackfmt missing files.

* Prevent usage of HTTPS proxies when fetching HTTPS resources.

- Will be supported by default when we can do TLS within TLS.

* Update proxy documentation with more information.

* Renamed flag for HTTPS websites through HTTPS proxies.
* Added myself to contributors.

* Documentation and contributors fixes.

* Removed mention that TLS in TLS is being developed as requested.
* Space in between my name and the github page.

* Add flag to enable HTTPS proxy support.

Now that we're adding support for HTTPS proxies we want to avoid
a breaking change with clients that had an improper proxy configuration.

For now, we're adding a warning an defaulting to the previous behavior.
In the future we'll change the behavior to enable HTTPS proxies by
default.

* Remove guard flag, error out on HTTPS/HTTPS.

As requested in the last revision for the PR:

- Removed the _enable_https_proxies flag. Instead the feature will be
  enabled and will error out on invalid configurations. (HTTPS + HTTPS)
- Other comments: rename a method, parentheses to clarify order of
  operations.
jalopezsilva added a commit to jalopezsilva/urllib3 that referenced this pull request Oct 20, 2020
* Add support to talk HTTPS to proxies.

Currently there's no way to validate identify for the proxy you might be
connecting. Proxies supporting HTTPS endpoints are becoming more common
and we need to extend the support for them.

When an HTTPS proxy is provided, instead of doing the HTTP CONNECT,
we'll forward any requests directly to the proxy and ultimately to the
destination.

* Fix proxy_headers missing on HTTPS proxy connections.

* blackfmt missing files.

* Prevent usage of HTTPS proxies when fetching HTTPS resources.

- Will be supported by default when we can do TLS within TLS.

* Update proxy documentation with more information.

* Renamed flag for HTTPS websites through HTTPS proxies.
* Added myself to contributors.

* Documentation and contributors fixes.

* Removed mention that TLS in TLS is being developed as requested.
* Space in between my name and the github page.

* Add flag to enable HTTPS proxy support.

Now that we're adding support for HTTPS proxies we want to avoid
a breaking change with clients that had an improper proxy configuration.

For now, we're adding a warning an defaulting to the previous behavior.
In the future we'll change the behavior to enable HTTPS proxies by
default.

* Remove guard flag, error out on HTTPS/HTTPS.

As requested in the last revision for the PR:

- Removed the _enable_https_proxies flag. Instead the feature will be
  enabled and will error out on invalid configurations. (HTTPS + HTTPS)
- Other comments: rename a method, parentheses to clarify order of
  operations.
Dobatymo pushed a commit to Dobatymo/urllib3 that referenced this pull request Mar 16, 2022
* Add support to talk HTTPS to proxies.

Currently there's no way to validate identify for the proxy you might be
connecting. Proxies supporting HTTPS endpoints are becoming more common
and we need to extend the support for them.

When an HTTPS proxy is provided, instead of doing the HTTP CONNECT,
we'll forward any requests directly to the proxy and ultimately to the
destination.

* Fix proxy_headers missing on HTTPS proxy connections.

* blackfmt missing files.

* Prevent usage of HTTPS proxies when fetching HTTPS resources.

- Will be supported by default when we can do TLS within TLS.

* Update proxy documentation with more information.

* Renamed flag for HTTPS websites through HTTPS proxies.
* Added myself to contributors.

* Documentation and contributors fixes.

* Removed mention that TLS in TLS is being developed as requested.
* Space in between my name and the github page.

* Add flag to enable HTTPS proxy support.

Now that we're adding support for HTTPS proxies we want to avoid
a breaking change with clients that had an improper proxy configuration.

For now, we're adding a warning an defaulting to the previous behavior.
In the future we'll change the behavior to enable HTTPS proxies by
default.

* Remove guard flag, error out on HTTPS/HTTPS.

As requested in the last revision for the PR:

- Removed the _enable_https_proxies flag. Instead the feature will be
  enabled and will error out on invalid configurations. (HTTPS + HTTPS)
- Other comments: rename a method, parentheses to clarify order of
  operations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants