Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix url parsing of ipv6 urls on URL.replace #1965

Merged
merged 3 commits into from Feb 6, 2023

Conversation

kousikmitra
Copy link
Contributor

The starting point for contributions should usually be a discussion

Simple documentation typos may be raised as stand-alone pull requests, but otherwise please ensure you've discussed your proposal prior to issuing a pull request.

This will help us direct work appropriately, and ensure that any suggested changes have been okayed by the maintainers.

  • Initially raised as discussion #...

@kousikmitra kousikmitra changed the title Fix url parsing of ipv6 urls #1931 #1931 Fix url parsing of ipv6 urls Nov 25, 2022
@kousikmitra kousikmitra force-pushed the fix/ipv6-url-bug branch 2 times, most recently from 49d5c0a to f248c0c Compare November 25, 2022 15:51
Comment on lines 122 to 127
if not hostname:
netloc = self.netloc
_, _, hostname = netloc.rpartition("@")

if hostname[-1] != "]":
hostname = hostname.rsplit(":", 1)[0]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the new approach breaks the existing undocumented, untested functionality...

May I propose a test like this:

assert URL("http://u:p@host/").replace(hostname="bar") == URL("http://u:p@bar/")

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe like this:

assert URL("http://u:p@host:80").replace(port=88) == URL("http://u:p@host:88")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added, these are passing

Copy link

@dimaqq dimaqq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to use urlunparse or urlunsplit from python stdlib instead of manual string concatention?

@kousikmitra
Copy link
Contributor Author

kousikmitra commented Nov 29, 2022

Would it be better to use urlunparse or urlunsplit from python stdlib instead of manual string concatention?

@dimaqq not sure how you wanted to use urlunsplit here! urlunsplit won't join netloc fields. can you explain more?

@Kludex Kludex added this to the Version 0.23.0 milestone Dec 2, 2022
@dimaqq
Copy link

dimaqq commented Dec 5, 2022

@kousikmitra
Merge branch 'master' into fix/ipv6-url-bug

Instead of the above, I would generally recommend to rebase your branch on top of master and then force-push your branch.
I don't know what git flow this particular project uses though.

@adriangb
Copy link
Member

adriangb commented Dec 5, 2022

We squash merge into master, so it's not really important how the branch is kept up to date with master or how the commits on the branch are organized. Of course a nice linear history with well described atomic commits is a huge plus as a reviewer, but we certainly don't require it.

@kousikmitra
Copy link
Contributor Author

@kousikmitra
Merge branch 'master' into fix/ipv6-url-bug

Instead of the above, I would generally recommend to rebase your branch on top of master and then force-push your branch. I don't know what git flow this particular project uses though.

I prefer the same, my bad I didn't notice github used merge method to update branch.

@dimaqq
Copy link

dimaqq commented Dec 6, 2022

my 2c: it's best to disable GitHub's magic "update the branch" in repo settings.
anyway I'm not the maintainer, I don't get to make these decisions.

Copy link

@dimaqq dimaqq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and solves the problem that I had.

All of these work fine:

In [8]: URL("http://[1::fe]/123").replace(port=88, username="uu", password="pp").replace(port=None, username=None, password=None)
Out[8]: URL('http://[1::fe]/123')

There's still one gotcha, but I suspect this also didn't really work before either, the IPv6 hostname needs to be bracketed, which is a bit unexpected:

In [14]: URL("http://foo.com/123").replace(hostname="fe::1")
Out[14]: URL('http://fe::1/123')

In [15]: URL("http://foo.com/123").replace(hostname="[fe::1]")
Out[15]: URL('http://[fe::1]/123')

Which also leads, in one case to an exception when using IPv6 shorthands:

In [18]: URL("http://foo.com/123").replace(hostname="1::23").replace(port=88)
ValueError: Port could not be cast to integer value as ':23'

This path triggered same error before.

So, strictly speaking, this PR fixes #1931 and doesn't introduce a regression, I'd say let's merge it!

@dimaqq
Copy link

dimaqq commented Dec 7, 2022

P.S. by comparison, other libraries take care of bracketing ipv6 addresses themselves, they don't ask the caller to do it:

In [7]: yarl.URL("http://foo.com/1").with_host("123::fe")
Out[7]: URL('http://[123::fe]/1')

@kousikmitra
Copy link
Contributor Author

There's still one gotcha, but I suspect this also didn't really work before either, the IPv6 hostname needs to be bracketed, which is a bit unexpected:

Which also leads, in one case to an exception when using IPv6 shorthands:

To tackle this we should not be relying on urlsplit hostname instead we can infer hostname, port from netloc.

@dimaqq
Copy link

dimaqq commented Dec 7, 2022

🤔 I was under the impression that urlsplit does the right thing; and this library doesn't.

my 2c: .update(hostname=...) value needs to be examined, if it contains a colon, then wrap it in square brackets.
Edit: and maybe the inner logic of splitting the netloc....

Anyway, that can be a separate PR.
Edit: perhaps maintainers would like a discussion on hacking URLs in this data structure (extra code and maintenance) vs. adopting a 3p URL library (an extra dep).

@Kludex Kludex modified the milestones: Version 0.23.0, Version 0.24.0 Dec 8, 2022
@Kludex Kludex mentioned this pull request Dec 12, 2022
7 tasks
Copy link
Member

@florimondmanca florimondmanca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

starlette/datastructures.py Outdated Show resolved Hide resolved
Co-authored-by: Florimond Manca <florimond.manca@protonmail.com>
@Kludex Kludex changed the title #1931 Fix url parsing of ipv6 urls Fix url parsing of ipv6 urls on URL.replace Feb 6, 2023
@Kludex Kludex enabled auto-merge (squash) February 6, 2023 05:40
@Kludex Kludex merged commit 94a22b8 into encode:master Feb 6, 2023
@kousikmitra kousikmitra deleted the fix/ipv6-url-bug branch February 6, 2023 05:58
aminalaee pushed a commit that referenced this pull request Feb 13, 2023
Co-authored-by: Florimond Manca <florimond.manca@protonmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants