Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excess leading path separators causes ConnectionPool.urlopen to parse URL as host & port #3352

Open
sigmavirus24 opened this issue Feb 22, 2024 · 1 comment

Comments

@sigmavirus24
Copy link
Contributor

Subject

Originally reported in Requests (psf/requests#6643) but also applicable here, : is a valid character in a path per RFC3986. However, if a path is used in urllib3 where the path starts with // then urllib3 interprets this as a scheme-less host and attempts to parse the hostname and port but the part after the : may not be an integer.

Environment

Describe your environment.
At least, paste here the output of:

import platform
import ssl
import urllib3

print("OS", platform.platform())
print("Python", platform.python_version())
print(ssl.OPENSSL_VERSION)
print("urllib3", urllib3.__version__)
OS Linux-6.5.10-200.fc38.x86_64-x86_64-with-glibc2.38
Python 3.12.1
OpenSSL 3.1.1 30 May 2023
urllib3 2.2.1

Steps to Reproduce

import urllib3

urllib3.PoolManager().urlopen(method="GET", url="http://127.0.0.1:10000//v:h")

Expected Behavior

If you have a server running on that port, a 200 OK, otherwise a connection error.

Actual Behavior

  File "/home/sigmavirus24/sandbox/requests/.tox/py312/lib/python3.12/site-packages/urllib3/util/url.py", line 425, in parse_url
    host, port = _HOST_PORT_RE.match(host_port).groups()  # type: ignore[union-attr]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'groups'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sigmavirus24/sandbox/requests/.tox/py312/lib/python3.12/site-packages/urllib3/poolmanager.py", line 444, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sigmavirus24/sandbox/requests/.tox/py312/lib/python3.12/site-packages/urllib3/connectionpool.py", line 711, in urlopen
    parsed_url = parse_url(url)
                 ^^^^^^^^^^^^^^
  File "/home/sigmavirus24/sandbox/requests/.tox/py312/lib/python3.12/site-packages/urllib3/util/url.py", line 451, in parse_url
    raise LocationParseError(source_url) from e
urllib3.exceptions.LocationParseError: Failed to parse: //v:h

Part of this is that u.request_uri is properly parsed and returned as //v:h but ConnectionPool.urlopen seems to be doing the same thing that PoolManager.urlopen does and the thing that PoolManager.urlopen is trying to delegate to it - i.e., it:

If, however, we can (as early as possible) normalize the URL then we'll be in a better place. For example, this doesn't happen if you do this on the ConnectionPool level directly:

>>> connpool = urllib3.HTTPConnectionPool(host='localhost', port=10000)
>>> connpool.urlopen(method='GET', url='http://localhost:10000//v:h')
...
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=10000): Max retries exceeded with url: http://localhost:10000//v:h (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fcb11ad7020>: Failed to establish a new connection: [Errno 111] Connection refused'))

And so I think that in PoolManager specifically, we need to trim those leading /. There are a few ways to do this:

s = '//v:h'
print(f'/{s.lstrip("/")}')
# => '/v:h'
print(re.sub('^/+', '/', s))
# => '/v:h'
@sigmavirus24
Copy link
Contributor Author

I'm also worried about how this is handled in when handling redirects in the retry logic (e.g., server returns a Location: //v:h or Location: //host:port//v:h but I haven't investigated that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant