`normalized_encode` incorrectly replaces `%3A` and `%2F` in path #472

davidtaylorhq · 2022-08-04T11:32:30Z

Given a valid URL like `https://example.com/article/id%3A1.2%2F1/bar`, addressable's normalization replaces `%3A` with a literal `:`, and `%2F` with a literal `/`. Both of these replacements change the URL, meaning that requests for the URL may fail.

> Addressable::URI.normalized_encode("https://example.com/article/id%3A1.2%2F1/bar")
=> "https://example.com/article/id:1.2/1/bar"

Using `URI#normalize` has the same issue with `%3A`, but seems to preserve the `%2F` correctly:

> Addressable::URI.parse("https://example.com/article/id%3A1.2%2F1/bar").normalize.to_s
=> "https://example.com/article/id:1.2%2F1/bar"

My understanding of RFC 3986 is that reserved characters (including `:` and `/`) should not be decoded during normalization.
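To make the expected behavior concrete, here is a minimal sketch of a conservative percent-encoding normalization: it decodes only the unreserved set from RFC 3986 section 2.3 and leaves every other escape in place, merely uppercasing its hex digits. The helper name `conservative_percent_normalize` is hypothetical and not part of addressable's API; this illustrates the rule only and is not a proposed patch.

```ruby
# Unreserved characters per RFC 3986 section 2.3:
# ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = /\A[A-Za-z0-9\-._~]\z/

# Hypothetical helper (not part of addressable): decode a percent-escape
# only when it encodes an unreserved character.
def conservative_percent_normalize(str)
  str.gsub(/%([0-9A-Fa-f]{2})/) do
    hex  = Regexp.last_match(1)
    code = hex.hex
    if code < 128 && code.chr.match?(UNRESERVED)
      code.chr            # unreserved: decoding does not change the meaning
    else
      "%#{hex.upcase}"    # reserved or other: keep encoded, uppercase the hex
    end
  end
end

puts conservative_percent_normalize("https://example.com/article/id%3A1.2%2F1/bar")
puts conservative_percent_normalize("/%7Euser/%2fa%41")
```

With this rule the URL from the report comes back unchanged, because `%3A` and `%2F` both encode reserved characters.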


`normalized_encode` in addressable has a number of issues, including sporkmonger/addressable#472. To temporarily work around those issues for the majority of cases, we try parsing with `::URI`. If that fails (e.g. due to non-ASCII characters), then we fall back to addressable. Hopefully we can simplify this back to `Addressable::URI.normalized_encode` in the future.

This commit also adds support for Unicode domain names and emoji domain names with `escape_uri`.

This removes an unneeded hack checking for pre-signed URLs, which are now handled by the general case because they start off valid and are only minimally normalized. The previous test case continues to pass. `UrlHelper.s3_presigned_url?`, which was somewhat wide, was removed.
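The "try `::URI` first" idea from the commit message above can be sketched roughly as follows. This is not the actual Discourse implementation: the method name `escape_uri_sketch` and the `fallback:` parameter are assumptions made for illustration, with the fallback standing in for a call such as `Addressable::URI.normalized_encode` so that the sketch runs with the standard library alone.

```ruby
require "uri"

# Hypothetical helper: return the URL untouched when Ruby's stdlib parser
# already accepts it, and only hand it to a more aggressive normalizer
# (the `fallback` callable) when parsing fails.
def escape_uri_sketch(url, fallback:)
  ::URI.parse(url).to_s        # valid URLs pass through without re-encoding
rescue ::URI::InvalidURIError
  fallback.call(url)           # e.g. non-ASCII input ends up here
end

# In a real application the fallback might be (requires the addressable gem):
#   fallback = Addressable::URI.method(:normalized_encode)
# A stub is used here so the example has no gem dependency.
stub = ->(u) { "fallback:#{u}" }

puts escape_uri_sketch("https://example.com/article/id%3A1.2%2F1/bar", fallback: stub)
puts escape_uri_sketch("https://example.com/caf\u00e9", fallback: stub)
```

The point of this ordering is that an already-valid URL, such as a pre-signed one, never reaches the normalizer that would rewrite its reserved escapes.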

sporkmonger · 2022-09-07T19:05:21Z

I agree with your reading of the spec: only unreserved characters should be getting decoded.

dentarg · 2023-07-19T06:56:05Z

I think this issue is the same as #366 but I'll keep this one open until it has been addressed.

SamSaffron mentioned this issue Aug 9, 2022

FIX: broken onebox images due to escape_uri bugs discourse/discourse#17840

Merged

sporkmonger mentioned this issue Sep 7, 2022

Suggestion: conservative_normalize! #475

Open

sporkmonger added the Accepted label Sep 7, 2022

ClearlyClaire mentioned this issue May 12, 2023

Overly-eager decoding of reserved chars when performing HTTP requests mastodon/mastodon#24932

Open

dentarg added the Duplicate label Jul 19, 2023

dentarg mentioned this issue Jul 19, 2023

Normalization: don't decode percent-encoded reserved characters #366

Open

ClearlyClaire mentioned this issue Jul 28, 2023

Do not normalize URL before fetching it mastodon/mastodon#26219

Merged

c960657 mentioned this issue Oct 12, 2023

Allow non-RFC 3986-compliant URLs sparklemotion/http-cookie#44

Closed
