Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use WHATWG URL parsing, if available, instead of url.parse #2564

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

dresende
Copy link
Collaborator

@dresende dresende commented Nov 21, 2023

The strategy:

  1. Check if URL is a function
  2. Check if URL.prototype is an object

If the checks are true, use new URL instead of url.parse.

The only 2 changes I know of, are:

  1. URL has username and password already parsed, no need to check for auth
  2. URL has searchParams instead of query

Closes #2563

@dougwilson
Copy link
Member

Make sure to add a new test case for what you are trying to fix with this. Also curious on the change to the existing test: would this change now break anyone who has a colon in thier password?

@dresende
Copy link
Collaborator Author

Make sure to add a new test case for what you are trying to fix with this. Also curious on the change to the existing test: would this change now break anyone who has a colon in thier password?

That's an interesting question. I did this in kind of a hurry but then figured I could just pass an object and avoid the problem I was having. I'm not in a hurry now, so I'll sure add a test for that. About people using and it breaking, I think I need to change the parser to decode the password when using WHATWG URL as it seems it is always encoded when parsed.

@dresende
Copy link
Collaborator Author

> new URL("foo://user:pa:ss@host")
URL {
  href: 'foo://user:pa%3Ass@host',
  origin: 'null',
  protocol: 'foo:',
  username: 'user',
  password: 'pa%3Ass',
  host: 'host',
  hostname: 'host',
  port: '',
  pathname: '',
  search: '',
  searchParams: URLSearchParams {},
  hash: ''
}
> new URL("foo://user:pa%3Ass@host")
URL {
  href: 'foo://user:pa%3Ass@host',
  origin: 'null',
  protocol: 'foo:',
  username: 'user',
  password: 'pa%3Ass',
  host: 'host',
  hostname: 'host',
  port: '',
  pathname: '',
  search: '',
  searchParams: URLSearchParams {},
  hash: ''
}

It always encodes, whether it's already encoded or not.

@dresende
Copy link
Collaborator Author

dresende commented Nov 21, 2023

What was failing for me was that I had a % on the password and the URL was not encoded, and because after that character there wasn't necessarily 2 hex characters, url.parse fails because it tries to decode and fails.

In the following example, assume password is pass% (not encoded).

> url.parse("foo://user:pass%@host")
Uncaught URIError: URI malformed
    at decodeURIComponent (<anonymous>)
    at Url.parse (node:url:368:19)
    at Object.urlParse [as parse] (node:url:156:13)
> new URL("foo://user:pass%@host")
URL {
  href: 'foo://user:pass%@host',
  origin: 'null',
  protocol: 'foo:',
  username: 'user',
  password: 'pass%',
  host: 'host',
  hostname: 'host',
  port: '',
  pathname: '',
  search: '',
  searchParams: URLSearchParams {},
  hash: ''
}

What's failing:

> decodeURIComponent("pass%")
Uncaught URIError: URI malformed
    at decodeURIComponent (<anonymous>)

@dougwilson
Copy link
Member

dougwilson commented Nov 21, 2023

But shouldn't you always need to url encodure your password? What if your password is pa%0foo ?

@dresende
Copy link
Collaborator Author

But shouldn't you always need to url encodure your password? Whay is your password is pa%0foo ?

In that case you should, but according to the spec you don't need to unless it causes ambiguity (like your case). Password must be encoded if it has % and 2 hex chars next to it.

@dougwilson
Copy link
Member

Gotcha. What are we expecting the outcome of mysql://foo:bar%fzo%25ab@localhost to be? I know you said that encoding the % is not required unless there are two hex chars after it, but this one seems weird when put through the code here, as it seems that %25 is ending up in the password instead of it being just a % character, though new URL does parse it now. I think the way decodeURIComponent is used may need to be changed to match how the new URL is working so they are supporting the same semantics.

@dresende
Copy link
Collaborator Author

I think the way decodeURIComponent is used may need to be changed to match how the new URL is working so they are supporting the same semantics.

I don't know the implementation. Do you have any hints? As a funny note, your example URL is parsed in a different way on Chrome. I thought the implementation was the same.

Screenshot 2023-11-21 at 14 59 37

@dresende
Copy link
Collaborator Author

dresende commented Nov 21, 2023

@dougwilson you meant something like this?

function decodeUriComponent(str) {
	return str.replace(/\%([a-f0-9]{2})/ig, (_, hex) => (String.fromCharCode(parseInt(hex, 16))));
}

(but converted to run on older node versions)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

Updating URL parsing to WHATWG URL
2 participants