Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

normalized_encode incorrectly replaces %3A and %2F in path #472

Open
davidtaylorhq opened this issue Aug 4, 2022 · 2 comments
Open

normalized_encode incorrectly replaces %3A and %2F in path #472

davidtaylorhq opened this issue Aug 4, 2022 · 2 comments

Comments

@davidtaylorhq
Copy link
Contributor

Given a valid URL like https://example.com/article/id%3A1.2%2F1/bar, addressable's normalization replaces %3A with a literal :, and %2F with a literal /. Both of these replacements change the URL, meaning that requests for the URL may fail.

> Addressable::URI.normalized_encode("https://example.com/article/id%3A1.2%2F1/bar")
=> "https://example.com/article/id:1.2/1/bar"

Using URI#normalize has the same issue with %3A, but seems to preserve the %2F correctly:

> Addressable::URI.parse("https://example.com/article/id%3A1.2%2F1/bar").normalize.to_s
=> "https://example.com/article/id:1.2%2F1/bar"

My understanding of RFC3986 is that reserved characters (including : and /) should not be decoded during normalization.

davidtaylorhq pushed a commit to discourse/discourse that referenced this issue Aug 9, 2022
normalized_encode in addressable has a number of issues, including sporkmonger/addressable#472

To temporaily work around those issues for the majority of cases, we try parsing with `::URI`. If that fails (e.g. due to non-ascii characters) then we will fall back to addressable.

Hopefully we can simplify this back to `Addressable::URI.normalized_encode` in the future.

This commit also adds support for unicode domain names and emoji domain names with escape_uri.

This removes an unneeded hack checking for pre-signed urls, which are now handled by the general case due to starting off valid and only being minimally normalized. Previous test case continues to pass.

UrlHelper.s3_presigned_url? which was somewhat wide was removed.
davidtaylorhq pushed a commit to discourse/discourse that referenced this issue Aug 9, 2022
normalized_encode in addressable has a number of issues, including sporkmonger/addressable#472

To temporaily work around those issues for the majority of cases, we try parsing with `::URI`. If that fails (e.g. due to non-ascii characters) then we will fall back to addressable.

Hopefully we can simplify this back to `Addressable::URI.normalized_encode` in the future.

This commit also adds support for unicode domain names and emoji domain names with escape_uri.

This removes an unneeded hack checking for pre-signed urls, which are now handled by the general case due to starting off valid and only being minimally normalized. Previous test case continues to pass.

UrlHelper.s3_presigned_url? which was somewhat wide was removed.
@sporkmonger
Copy link
Owner

I agree w/ your reading of the spec, only unreserved characters should be getting decoded.

@dentarg
Copy link
Collaborator

dentarg commented Jul 19, 2023

I think this issue is the same as #366 but I'll keep this one open until it has been addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants