Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proxy URLs without schemes break under Python 3.9 #5855

Closed
leifwalsh opened this issue Jul 1, 2021 · 6 comments
Closed

proxy URLs without schemes break under Python 3.9 #5855

leifwalsh opened this issue Jul 1, 2021 · 6 comments
Labels
Milestone

Comments

@leifwalsh
Copy link

leifwalsh commented Jul 1, 2021

In https://bugs.python.org/issue27657 it was decided to reverse an earlier decision about how urllib.parse.urlparse handles URLs that don't specify a scheme (e.g. example.org:80). This behavior change is in Python 3.9.

One way this can present to a user is by confusing requests's attempt to add the scheme if it's missing:

def prepend_scheme_if_needed(url, new_scheme):

With the old behavior, urlparse would return ParseResult(scheme='', netloc='', path='example.org:80', params='', query='', fragment='') and then prepend_scheme_if_needed would happily add the scheme, typically http. With the new behavior, you instead get ParseResult(scheme='example.org', netloc='', path='80', params='', query='', fragment=''), so prepend_scheme_if_needed thinks it has a scheme, and does nothing. Then later, if this is used as a proxy URL, urllib3 will complain here, because it has its own URL parser that is now I think out of sync with the stdlib.

(I guess this means you could fix this by making requests use urllib3.util.url.parse_url instead of urllib.parse.urlparse, but that's maybe an antisocial fix)

Expected Result

If user code has given requests a proxy dict like {'http': 'example.org:80'}, the request is sent through that proxy.

Actual Result

urllib3 raises ProxySchemeUnknown

Reproduction Steps

import requests
session = requests.Session()
session.proxies = {'https': 'example.org:80'}
session.get('https://www.google.com/')

System Information

$ python -m requests.help
{
  "implementation": {
    "name": "CPython",
    "version": "3.9.5"
  },
  "platform": {
    "system": "Linux"
  },
  "requests": {
    "version": "2.22.0"
  },
  "urllib3": {
    "version": "1.25.3"
  },
  "using_pyopenssl": false
}
@timgraham
Copy link

"soon 3.7 and 3.8" in the issue title isn't correct. The behavior change was present in 3.8.1 and 3.7.6 but reverted in 3.8.2 and 3.7.7.

@leifwalsh leifwalsh changed the title proxy URLs without schemes break under Python 3.9, and soon, 3.7 and 3.8 proxy URLs without schemes break under Python 3.9 Jul 2, 2021
@leifwalsh
Copy link
Author

Oh good, I missed that, thanks. (There were a lot of bugs to read and I ended up skimming past that piece of history)

@sigmavirus24
Copy link
Contributor

@sethmlarson am I correct in that urllib3's url parsing is not a public API we can rely on?

@sethmlarson
Copy link
Member

parse_url() is in the urllib3.util namespace so is supported. I encourage it's use as it's been vetted heavily.

@nateprewitt
Copy link
Member

I did a bit of work around this last night. I have a draft PR up (#5917) that has the minimal fix for prepend_scheme_if_needed that should solve the specific problem listed. I did a separate commit that replaced all of our uses of urlparse with parse_url that raised some unfortunate side effects.

parse_url has coupled parsing and validation into the same function, so we end up with a number of failures in the code base that aren't easy to handle and raise as more reasonable errors. Things like redirects fail when we're trying to parse things like incoming location headers, which shouldn't be a terminal case. For now I think we can handle these on a case by case basis and address it more extensively if we find other failure cases with the urlparse change.

@nateprewitt nateprewitt added this to the 2.27.0 milestone Sep 2, 2021
@nateprewitt
Copy link
Member

Now that #5917 is merged, this should be set to go out on Monday with Requests 2.27.0. Resolving as fixed.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

5 participants