Inconsistent Handling of Invalid Urls #1832

cancan101 · 2021-09-01T14:33:53Z

If I call: httpx.get('https://😇'), an InvalidURL exception is raised. However, if I call: httpx.get('https:/google.com'), instead I get an UnsupportedProtocol exception, which seems inconsistent. I would expect the latter to raise an InvalidURL as well.

For reference, requests raises a InvalidURL in both cases.

This issue also exists for requests.get('https:///google.com') vs httpx.get('https:///google.com').

The text was updated successfully, but these errors were encountered:

tomchristie · 2021-09-01T14:57:56Z

Thanks @cancan101 - Appreciate the neatly summarised issue.

tomchristie · 2021-10-20T09:39:37Z

So...

Good place to start with something like this is to unpick it into the smallest possible components, in order to figure out exactly what behaviour we want, and that is consistent with the API throughout.

Here's where I got to after some first steps...

>>> u = httpx.URL('https://😇')  # Raises `InvalidURL`.

And in contrast...

>>> u = httpx.URL('https:/google.com')
>>> u.scheme
'https'
>>> u.host
''
>>> u.path
'/google.com'

Now. That's not necessarily undesired behaviour at this point. We might consider the first to be a strictly invalid URL, and the second to be a valid URL, that just happens to be missing a hostname.

We can also confirm that URLs instantiated with explicit portions end up the same way here...

>>> u = httpx.URL(scheme='https', path='/google.com')
URL('https:/google.com')

So at this level of the API we are at least consistent.

I'm somewhat surprised tho, at why that results in an UnsupportedProtocol when used to send a request. (Since it has a protocol, but is missing a host.)

Let's take a closer look...

>>> c = httpx.Client()
>>> r = c.build_request('GET', 'https:/google.com')
>>> r.url
URL('/google.com')

Hrm. Somethings changed here once we've started build_request, what's that about exactly?
Haven't dug into it yet, not sure.

stale · 2022-02-20T15:13:35Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

tomchristie · 2022-02-21T13:31:27Z

Still valid thx, @Stale bot.

stale · 2022-03-25T07:30:52Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

shelbylsmith · 2022-06-24T17:53:39Z

@tomchristie
I think the issue may stem from checking whether the URL is absolute:

httpx/httpx/_urls.py

Lines 341 to 350 in 5b06aea

    
               def is_absolute_url(self) -> bool: 
        
                   """ 
        
                   Return `True` for absolute URLs such as 'http://example.com/path', 
        
                   and `False` for relative URLs such as '/path'. 
        
                   """ 
        
                   # We don't use `.is_absolute` from `rfc3986` because it treats 
        
                   # URLs with a fragment portion as not absolute. 
        
                   # What we actually care about is if the URL provides 
        
                   # a scheme and hostname to which connections should be made. 
        
                   return bool(self._uri_reference.scheme and self._uri_reference.host)

In this case, since u.host is '', it causes this check to return false and so it is treated as a relative URL.
This means that _merge_url tries to prepend self.base_url to the 'relative' URL, but since it's intended to be an absolute URL, there is no base_url provided and so the scheme has now been removed. Since there is no longer a scheme, we receive UnsupportedProtocol:

httpx/httpx/_client.py

Lines 369 to 389 in 5b06aea

    
               def _merge_url(self, url: URLTypes) -> URL: 
        
                   """ 
        
                   Merge a URL argument together with any 'base_url' on the client, 
        
                   to create the URL used for the outgoing request. 
        
                   """ 
        
                   merge_url = URL(url) 
        
                   if merge_url.is_relative_url: 
        
                       # To merge URLs we always append to the base URL. To get this 
        
                       # behaviour correct we always ensure the base URL ends in a '/' 
        
                       # separator, and strip any leading '/' from the merge URL. 
        
                       # 
        
                       # So, eg... 
        
                       # 
        
                       # >>> client = Client(base_url="https://www.example.com/subpath") 
        
                       # >>> client.base_url 
        
                       # URL('https://www.example.com/subpath/') 
        
                       # >>> client.build_request("GET", "/path").url 
        
                       # URL('https://www.example.com/subpath/path') 
        
                       merge_raw_path = self.base_url.raw_path + merge_url.raw_path.lstrip(b"/") 
        
                       return self.base_url.copy_with(raw_path=merge_raw_path) 
        
                   return merge_url

Is it sufficient to just check in is_absolute_url if there is a scheme since in a relative URL it would be provided in base_url?

tomchristie added the user-experience Ensuring that users have a good experience using the library label Sep 1, 2021

aminalaee mentioned this issue Oct 20, 2021

Map UnsupportedProtocol to InvalidURL #1904

Closed

stale bot added the wontfix label Feb 20, 2022

stale bot removed the wontfix label Feb 21, 2022

stale bot added the wontfix label Mar 25, 2022

tomchristie removed the wontfix label Mar 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent Handling of Invalid Urls #1832

Inconsistent Handling of Invalid Urls #1832

cancan101 commented Sep 1, 2021 •

edited

tomchristie commented Sep 1, 2021 •

edited

tomchristie commented Oct 20, 2021 •

edited

stale bot commented Feb 20, 2022

tomchristie commented Feb 21, 2022

stale bot commented Mar 25, 2022

shelbylsmith commented Jun 24, 2022

Inconsistent Handling of Invalid Urls #1832

Inconsistent Handling of Invalid Urls #1832

Comments

cancan101 commented Sep 1, 2021 • edited

tomchristie commented Sep 1, 2021 • edited

tomchristie commented Oct 20, 2021 • edited

stale bot commented Feb 20, 2022

tomchristie commented Feb 21, 2022

stale bot commented Mar 25, 2022

shelbylsmith commented Jun 24, 2022

cancan101 commented Sep 1, 2021 •

edited

tomchristie commented Sep 1, 2021 •

edited

tomchristie commented Oct 20, 2021 •

edited