New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for HTTPS connections to proxies. #1679
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1679 +/- ##
=========================================
Coverage ? 99.65%
=========================================
Files ? 22
Lines ? 2015
Branches ? 0
=========================================
Hits ? 2008
Misses ? 7
Partials ? 0
Continue to review full report at Codecov.
|
4c8996f
to
ffed605
Compare
Any feedback? Happy to adjust as needed. Facebook needs to support HTTPS proxies internally and we would love to get this pull request through! |
What's blocking this getting merged into the codebase? I have several downstream projects that are using urllib3 that would benefit from having this support? |
+1 - I have a few critical projects waiting on this feature. It would be incredibly useful if this is merged |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change set does not support proxy_headers for the https proxy
ffed605
to
fda3d89
Compare
@jalopezsilva @aausch Thank you for the discussion above, this was very useful to me as I'm not familiar with the way proxies work. I'm glad that you two now agree. @jalopezsilva I'm now trying to understand the bigger picture here. I've spent most of my day trying to understand what's going on. As far as I can tell, the way urllib3 currently supports proxies is:
(You've said above that you would like to add CONNECT support for fetching HTTPS resources from an HTTPS proxy, but I think that already exists.) Anyway, now we're adding a third mode: for HTTPS proxies we connect to the proxy using TLS from the start, then we request the absolute URL just like in the HTTP proxy mode above. Compared to the existing HTTP CONNECT support, we never talk in plain HTTP to the proxy, so an attacker listening our connection to the proxy does not know what origin server we are requesting. However, compared to HTTP CONNECT we need to trust the proxy, because it's now able to see all our communication to the origin server, which was not possible before. The best of both worlds would be obtained with TLS-in-TLS, but that's harder to implement so we're leaving this for future work. Is this correct? @jalopezsilva @gregprosser @YogeshMhatre Can you share more about your use case? Is this about fetching HTTP resources from HTTPS proxies? Understanding what kind of resources you're accessing through what kind of proxy would be helpful to understand where you're coming from. |
Pretty sure this doesn't currently work correctly |
Yeah, I think urllib3 just ignores the fact that the proxy supports HTTPS and just treats is as a HTTP proxy. But it still "works". Doing things correctly would require TLS in TLS. |
@pquentin Thanks for your initial review. I'm adding more context here. Our use case is fetching HTTP / HTTPS resources through an HTTPS proxy. We run a forward proxy in our production environment through which employee and micro-services traffic goes through. We want to ensure this traffic is encrypted in all cases and we also want to easily identify clients through mutual authentication. (i.e clients will present a certificate to the proxy identifying themselves. Traffic not properly authenticated will be rejected) Right now urllib3 ignores the proxy scheme. If you provide a proxy with an HTTPS scheme This PR adds support for HTTPS proxies in two ways:
Proper fetching of HTTPS resources through an HTTPS proxy requires TLS to the proxy, HTTP CONNECT, and TLS to the destination. This is the behavior libcurl provides. I have ideas of how to add this (discussion above about Hopefully this all makes sense, let me know if you have any further questions or comments. (Also, last lint failure seems to be unrelated to my changes?) |
@jalopezsilva Thank you for the context, that makes a lot of sense. The client authentication argument is particularly convincing! It's an elegant way to trace the requests of both employees and other micro-services, I might steal that idea at $DAYJOB one day. :) I opened #1733 to fix the lint failure, sorry about that. Here's what needs to happen from my point of view before we can merge this PR:
@urllib3/maintainers Okay, so why is this a breaking change? urllib3 only uses proxies through HTTP right now. We have two modes:
While it's currently possible to specify an HTTPS proxy for both HTTP and HTTPS traffic, we still connect to the proxy via HTTP, not HTTPS. After this pull request, for HTTPS proxies we'll:
However, those are absolutely the right things to do, and even if most users are using HTTP proxies, we might have users in the wild who think they're connecting to their proxy using TLS while they're not, or users who want things to work this way. How do we want to handle this? |
You can use something like |
@jalopezsilva Thank you for the new commit! I just realized there's a good way around the compatibility issue: we should provide a new HTTPSProxyManager for HTTPS proxies, and keep ProxyManager for HTTP proxies. And possibly add a warning when ProxyManager is used with a https proxy URL. What do you think? |
Having a Another option would be to add that Either way works for me! Let me know your thoughts @pquentin and I'll make the needed changes. Also let me know if you need help setting up a forward proxy to test this out, I can try to set up a docker image or something along those lines. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to make this code work using nginx, thanks @aausch for the suggestion.
I don't have a strong opinion about the gating mechanism. It could be in proxy_from_url
if we opt for a new class, or in ProxyManager
if it's easier to add the new behavior there.
It will force clients like requests and botocore to update their code to see how they want to expose this to their users, but I don't think forcing this onto them is a good idea. But then I'd still like to have the opinion of the urllib3 maintainers here as I'm new here. :)
I'm just popping in to say:
|
Sorry for the delayed update, busy times in Novemeber! I have updated the PR with two additional commits:
Hope the second commit addresses the concerns with the breaking change! We can keep commit # 2 or discard it, whatever works. Looking forwarding to merging this soon! Thanks! |
Happy New Year @pquentin, @sethmlarson! Any chance we could review this? I believe my last commit addresses the remaining concerns with breaking changes. Would love to have this merged soon! |
Thanks for the friendly tone @jalopezsilva, I'm sure the situation is very frustrating to you, and I would like to apologize for delay here. The pull request looks good to me and the general idea has been approved by @sethmlarson -- I'm now going to coordinate with him to understand what the next steps are. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this, I have some comments.
src/urllib3/poolmanager.py
Outdated
@@ -383,6 +408,17 @@ class ProxyManager(PoolManager): | |||
HTTPS/CONNECT case they are sent only once. Could be used for proxy | |||
authentication. | |||
|
|||
:param _enable_https_proxies: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of adding flags, especially if this is a new feature. Can we have this feature be additive and then do what we're already doing in this PR and error out in an invalid scenario (HTTPS+HTTPS)?
Logical next-steps are to figure out if we can also support HTTPS+HTTPS before we release this so there's less confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I'll error out then. I preferred to set a warning as I was concerned that multiple clients with an incorrect configuration (i.e HTTPS + HTTPS) would be affected. But then again, it's an incorrect configuration so they might as well fix it now.
That way, when we add the HTTPS + HTTPS support clients should have their configuration up to date.
I'll make the changes and update here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds great, thank you for this work! It'll make many more users connections secure. :)
@sethmlarson Whenever you have a chance, give this another go. Should be good to go! I'll create another PR for the HTTPS / HTTPS support as we had agreed. |
Currently there's no way to validate identify for the proxy you might be connecting. Proxies supporting HTTPS endpoints are becoming more common and we need to extend the support for them. When an HTTPS proxy is provided, instead of doing the HTTP CONNECT, we'll forward any requests directly to the proxy and ultimately to the destination.
- Will be supported by default when we can do TLS within TLS.
* Renamed flag for HTTPS websites through HTTPS proxies. * Added myself to contributors.
* Removed mention that TLS in TLS is being developed as requested. * Space in between my name and the github page.
Now that we're adding support for HTTPS proxies we want to avoid a breaking change with clients that had an improper proxy configuration. For now, we're adding a warning an defaulting to the previous behavior. In the future we'll change the behavior to enable HTTPS proxies by default.
As requested in the last revision for the PR: - Removed the _enable_https_proxies flag. Instead the feature will be enabled and will error out on invalid configurations. (HTTPS + HTTPS) - Other comments: rename a method, parentheses to clarify order of operations.
59c59c4
to
1084b2a
Compare
Hey @sethmlarson, I'm starting to add the TLS within TLS support over in #1806. It would be super useful if we can merge this in so I can continue work on that PR. I think we're good to go here, let me know if we need to make any further changes. |
Hey @sethmlarson @pquentin, another ping here as I haven't heard from you guys in a while. Are we still planning on merging these changes? Facebook has been using them internally for the past 4 months without any issues. I would prefer to merge them as I want to avoid having to fork FB's internal version of urllib3. That said, I'm starting to get concerned on how long this is taking and whether we'll be able to merge them or not. Do let me know so I know how to proceed forward. Thanks! |
"Are you sure you want to use HTTPS to contact the proxy? " | ||
"This most likely indicates an error in your configuration." | ||
"If you are sure you want use HTTPS to contact the proxy, enable " | ||
"the _allow_https_proxy_to_see_traffic.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"If you are sure you want use HTTPS to contact the proxy, enable "
"that by using the parameter: _allow_https_proxy_to_see_traffic.",
The requested changes have been made.
Merging, even with the typo, we can always fix it later, eg. by removing "the". Thanks @jalopezsilva for your work and your patience. Tested again, it works well, and the code looks good to me. |
Sorry for fumbling the review of this PR hard, I had tested the patch locally and it seemed to work. Thanks for tackling this. :) |
Thanks for the prompt response guys! I'll fix the typo and follow up in #1806. |
* Add support to talk HTTPS to proxies. Currently there's no way to validate identify for the proxy you might be connecting. Proxies supporting HTTPS endpoints are becoming more common and we need to extend the support for them. When an HTTPS proxy is provided, instead of doing the HTTP CONNECT, we'll forward any requests directly to the proxy and ultimately to the destination. * Fix proxy_headers missing on HTTPS proxy connections. * blackfmt missing files. * Prevent usage of HTTPS proxies when fetching HTTPS resources. - Will be supported by default when we can do TLS within TLS. * Update proxy documentation with more information. * Renamed flag for HTTPS websites through HTTPS proxies. * Added myself to contributors. * Documentation and contributors fixes. * Removed mention that TLS in TLS is being developed as requested. * Space in between my name and the github page. * Add flag to enable HTTPS proxy support. Now that we're adding support for HTTPS proxies we want to avoid a breaking change with clients that had an improper proxy configuration. For now, we're adding a warning an defaulting to the previous behavior. In the future we'll change the behavior to enable HTTPS proxies by default. * Remove guard flag, error out on HTTPS/HTTPS. As requested in the last revision for the PR: - Removed the _enable_https_proxies flag. Instead the feature will be enabled and will error out on invalid configurations. (HTTPS + HTTPS) - Other comments: rename a method, parentheses to clarify order of operations.
* Add support to talk HTTPS to proxies. Currently there's no way to validate identify for the proxy you might be connecting. Proxies supporting HTTPS endpoints are becoming more common and we need to extend the support for them. When an HTTPS proxy is provided, instead of doing the HTTP CONNECT, we'll forward any requests directly to the proxy and ultimately to the destination. * Fix proxy_headers missing on HTTPS proxy connections. * blackfmt missing files. * Prevent usage of HTTPS proxies when fetching HTTPS resources. - Will be supported by default when we can do TLS within TLS. * Update proxy documentation with more information. * Renamed flag for HTTPS websites through HTTPS proxies. * Added myself to contributors. * Documentation and contributors fixes. * Removed mention that TLS in TLS is being developed as requested. * Space in between my name and the github page. * Add flag to enable HTTPS proxy support. Now that we're adding support for HTTPS proxies we want to avoid a breaking change with clients that had an improper proxy configuration. For now, we're adding a warning an defaulting to the previous behavior. In the future we'll change the behavior to enable HTTPS proxies by default. * Remove guard flag, error out on HTTPS/HTTPS. As requested in the last revision for the PR: - Removed the _enable_https_proxies flag. Instead the feature will be enabled and will error out on invalid configurations. (HTTPS + HTTPS) - Other comments: rename a method, parentheses to clarify order of operations.
* Add support to talk HTTPS to proxies. Currently there's no way to validate identify for the proxy you might be connecting. Proxies supporting HTTPS endpoints are becoming more common and we need to extend the support for them. When an HTTPS proxy is provided, instead of doing the HTTP CONNECT, we'll forward any requests directly to the proxy and ultimately to the destination. * Fix proxy_headers missing on HTTPS proxy connections. * blackfmt missing files. * Prevent usage of HTTPS proxies when fetching HTTPS resources. - Will be supported by default when we can do TLS within TLS. * Update proxy documentation with more information. * Renamed flag for HTTPS websites through HTTPS proxies. * Added myself to contributors. * Documentation and contributors fixes. * Removed mention that TLS in TLS is being developed as requested. * Space in between my name and the github page. * Add flag to enable HTTPS proxy support. Now that we're adding support for HTTPS proxies we want to avoid a breaking change with clients that had an improper proxy configuration. For now, we're adding a warning an defaulting to the previous behavior. In the future we'll change the behavior to enable HTTPS proxies by default. * Remove guard flag, error out on HTTPS/HTTPS. As requested in the last revision for the PR: - Removed the _enable_https_proxies flag. Instead the feature will be enabled and will error out on invalid configurations. (HTTPS + HTTPS) - Other comments: rename a method, parentheses to clarify order of operations.
Currently there's no way to validate the identify of the proxy you are connecting to. Proxies supporting HTTPS endpoints are becoming more common and extending support for them is necessary.
This commit adds the ability to establish an HTTPS connection to a proxy.
When an HTTPS proxy is provided, instead of doing an HTTP CONNECT, we'll forward any requests directly to the proxy and ultimately to the destination, similar to how HTTP connections through a proxy are handled.
Another implementation alternative would have been to do TLS to the proxy, later an HTTP CONNECT and within the tunnel, another TLS connection to the destination. This is more complex and requires us to use the
SSLContext.wrap_bio
methods (i.e TLS within TLS). I opted for this simpler approach as a start but I'm happy to extend if needed.