Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AnyUrl escapes percent sign in URLs without TLD that are already escaped. #4458

Closed
6 of 16 tasks
alliefitter opened this issue Aug 31, 2022 · 6 comments · Fixed by #4470
Closed
6 of 16 tasks

AnyUrl escapes percent sign in URLs without TLD that are already escaped. #4458

alliefitter opened this issue Aug 31, 2022 · 6 comments · Fixed by #4470
Labels
bug V1 Bug related to Pydantic V1.X

Comments

@alliefitter
Copy link

Initial Checks

  • I have searched GitHub for a duplicate issue and I'm sure this is something new
  • I have searched Google & StackOverflow for a solution and couldn't find anything
  • I have read and followed the docs and still think this is a bug
  • I am confident that the issue is with pydantic (not my code, or another library in the ecosystem like FastAPI or mypy)

Description

When using AnyUrl with a localhost URL (or any URL without a top level domain), if the provided URL already is encoded/escaped, pydantic will escape the percent sign again. When the domain has a TLD, it works perfectly fine. I've confirmed that the bug isn't present in 1.9.1. It seems to have been introduced in 1.10.0 and is still present in 1.10.1.

Example Code

from pydantic import BaseModel, AnyUrl


class Foo(BaseModel):
    url: AnyUrl


class Bar(BaseModel):
    url: str


assert Foo(url='https://example.com/Foo%20Bar').url == 'https://example.com/Foo%20Bar'  # Passes
assert Bar(url='https://localhost/Foo%20Bar').url == 'https://localhost/Foo%20Bar'  # Passes
assert Foo(url='https://localhost/Foo%20Bar').url == 'https://localhost/Foo%20Bar'  # Fails: https://localhost/Foo%2520Bar

Python, Pydantic & OS Version

$ python -c "import pydantic.utils; print(pydantic.utils.version_info())"
             pydantic version: 1.10.1
            pydantic compiled: True
                 install path: /home/afitter/.local/lib/python3.10/site-packages/pydantic
               python version: 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0]
                     platform: Linux-5.15.0-46-generic-x86_64-with-glibc2.35
     optional deps. installed: ['typing-extensions']

Affected Components

@alliefitter alliefitter added bug V1 Bug related to Pydantic V1.X unconfirmed Bug not yet confirmed as valid/applicable labels Aug 31, 2022
@samuelcolvin
Copy link
Member

Good catch, thank you for reporting, caused by #3061.

@FaresAhmedb what do you think?

I'm not on my laptop, so can't confirm until tomorrow.

@faresbakhit
Copy link
Contributor

faresbakhit commented Aug 31, 2022

It seems that AnyUrl.build only gets called when there's no TLD.

from pydantic import BaseModel, AnyUrl

class DebugAnyUrlBuild(AnyUrl):
    def build(**kwargs):
        print(f'{kwargs=}')
        return AnyUrl.build(**kwargs)

class Foo(BaseModel):
    url: DebugAnyUrlBuild

Foo(url='https://localhost/foo%20bar')
# > kwargs={'scheme': 'https', 'user': None, 'password': None, 'host': 'localhost', 'tld': None, 'host_type': 'int_domain', 'port': None, 'path': '/foo%20bar', 'query': None, 'fragment': None}

Foo(url='https://example.com/foo%20bar')
# >

While AnyUrl.build shouldn't re-quote already encoded characters (maybe?, can be fixed by calling urllib.parse.unquote before quoting), I wonder why AnyUrl.build only gets called when there's no TLD.

@hramezani
Copy link
Member

I can confirm the bug.

Yes, we re-quote already quoted strings.

another possible solution would be to check if the string is quoted and then prevent re-quoting.

@hramezani hramezani removed the unconfirmed Bug not yet confirmed as valid/applicable label Aug 31, 2022
@vetedde
Copy link

vetedde commented Sep 1, 2022

Here url is None when no TLD and than it calls build

class AnyUrl(str):
    ...

    @no_type_check
    def __new__(cls, url: Optional[str], **kwargs) -> object:
        return str.__new__(cls, cls.build(**kwargs) if url is None else url)

AnyUrl rebuilds url, because tld is None anf rebuild flag becomes True

if tld is None and not is_international:
    d = int_domain_regex().fullmatch(host)
    assert d is not None
    tld = d.group('tld')
    is_international = True

if tld is not None:
    tld = tld[1:]
elif cls.tld_required:
    raise errors.UrlHostTldError()

if is_international:
    host_type = 'int_domain'
    rebuild = True
    host = host.encode('idna').decode('ascii')
    if tld is not None:
        tld = tld.encode('idna').decode('ascii')

https://github.com/pydantic/pydantic/blob/main/pydantic/networks.py#L366

@samuelcolvin
Copy link
Member

samuelcolvin commented Sep 2, 2022

Thanks so much for looking into this.

I'm really sorry, it seems the originally implementation was not satisfactory, I made a mistake in merging it when it wasn't ready.

I've implemented #4469 which tries to solve the problem but also provides a bunch more test cases to ensure behaviour.

I've also noticed that we probably made the wrong mistake by going with quote_plus=False by default - e.g. the behaviour does not match chrome or httpie and I think does not match what is suggested by RFC 3986, again this is my fault. However I don't want to change this now, it can wait for V2. Ignore this, I think this can be fixed in #4469 by adding + to the safe list.

Lastly the initial implementation wrongly quoted / in query parameters, that is fixed in #4469.


Please review and test #4469 and hopefully we can get this right (quote_plus issue apart) in a patch release.

@samuelcolvin
Copy link
Member

After some consideration and discussions, I've decided to fix this by reverting #4224 in #4470.

See #4470 for a full explanation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug V1 Bug related to Pydantic V1.X
Projects
None yet
5 participants