canonicalize_url incorrectly handles port when using hostname that requires IDNA encoding #222

Open
hwo411 opened this issue Mar 6, 2024 · 1 comment

hwo411 commented Mar 6, 2024

Hello,

We recently ran into the following problem:

canonicalize_url('https://тест.тест:33')

which returns
https://xn--e1aybc.xn--:33-qdd4dec/

while the expected value is

https://xn--e1aybc.xn--e1aybc:33/

This happens with every hostname whose TLD requires IDNA encoding.

Could you please fix this behavior?
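
The wrong output above looks like what you get when the port is IDNA-encoded together with the last label of the hostname. Here is a minimal sketch of the expected handling using only the Python standard library. The helper name `idna_encode_netloc` is hypothetical and this is not w3lib's actual code; it also ignores userinfo and IPv6 literals for brevity.

```python
from urllib.parse import urlsplit, urlunsplit


def idna_encode_netloc(url: str) -> str:
    """Hypothetical helper: IDNA-encode only the hostname, then re-attach the port."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Encode the hostname on its own so the ":port" suffix never reaches the codec.
    ascii_host = host.encode("idna").decode("ascii")
    netloc = ascii_host
    if parts.port is not None:
        netloc = f"{ascii_host}:{parts.port}"
    # Use "/" for an empty path, matching the expected value in this report.
    return urlunsplit(
        (parts.scheme, netloc, parts.path or "/", parts.query, parts.fragment)
    )


print(idna_encode_netloc("https://тест.тест:33"))
# https://xn--e1aybc.xn--e1aybc:33/
```

The point of the sketch is only the ordering: split off the port first, encode the hostname, then join them again.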

@Gallaecio Gallaecio added the bug label Mar 6, 2024
@hwo411 hwo411 changed the title canonicalize_url incorrectly handles port when using hostname that requires IDNA encoding canonicalize_url incorrectly handles port and multiple dots in the end of the domain when using hostname that requires IDNA encoding Mar 26, 2024

hwo411 commented Mar 26, 2024

I also found a related issue with multiple dots at the end of the domain:

>>> canonicalize_url('http://example.com.../тест')
'http://example.com.../%D1%82%D0%B5%D1%81%D1%82'
>>> canonicalize_url('http://тест.тест./тест')
'http://xn--e1aybc.xn--e1aybc./%D1%82%D0%B5%D1%81%D1%82'
>>> canonicalize_url('http://тест.тест.../тест')
'http://тест.тест.../%D1%82%D0%B5%D1%81%D1%82'

As you can see, a single trailing dot is handled properly, but with two or more trailing dots the domain is not encoded at all.

Update: this seems to be an invalid URL according to the standard, so the behavior may be correct, although URL validators in some other languages accept it and handle it normally. I'm not sure whether this addendum needs to be fixed, so I'll revert the title.
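
One possible explanation for the difference between one and several trailing dots, assuming the encoding goes through Python's built-in `idna` codec (an assumption, not confirmed in this thread): the codec tolerates a single trailing dot but rejects empty labels. The helper `try_idna` below is hypothetical and only demonstrates the codec's behavior; returning `None` stands in for whatever fallback a caller might use, such as leaving the hostname untouched.

```python
from typing import Optional


def try_idna(hostname: str) -> Optional[str]:
    """Hypothetical helper: return the IDNA form of hostname, or None if the codec rejects it."""
    try:
        return hostname.encode("idna").decode("ascii")
    except UnicodeError:
        # Raised for empty labels, which is what consecutive dots produce.
        return None


print(try_idna("тест.тест."))    # xn--e1aybc.xn--e1aybc.
print(try_idna("тест.тест..."))  # None
```

If the library falls back to the original hostname on such an error, that would be consistent with the unencoded `тест.тест...` output above.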

@hwo411 hwo411 changed the title canonicalize_url incorrectly handles port and multiple dots in the end of the domain when using hostname that requires IDNA encoding canonicalize_url incorrectly handles port when using hostname that requires IDNA encoding Mar 26, 2024