Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalization: don't decode percent-encoded reserved characters #366

Open
janklimo opened this issue Nov 4, 2019 · 2 comments
Open

Normalization: don't decode percent-encoded reserved characters #366

janklimo opened this issue Nov 4, 2019 · 2 comments
Labels

Comments

@janklimo
Copy link

janklimo commented Nov 4, 2019

Given the following example URL:

url = "https://i.guim.co.uk/img/media/97b07b907a75e7f1b4aecb092f8181ca63d0ad44/2_254_1183_709/master/1183.jpg?width=1200&height=630&quality=85&auto=format&fit=crop&overlay-align=bottom%2Cleft&overlay-width=100p&overlay-base64=L2ltZy9zdGF0aWMvb3ZlcmxheXMvdGctZGVmYXVsdC5wbmc&enable=upscale&s=4c9af90b3d91c2269bad342e6b78d577"
addressable_uri = Addressable::URI.parse(url)
addressable_uri.normalize.to_s
=> "https://i.guim.co.uk/img/media/97b07b907a75e7f1b4aecb092f8181ca63d0ad44/2_254_1183_709/master/1183.jpg?width=1200&height=630&quality=85&auto=format&fit=crop&overlay-align=bottom,left&overlay-width=100p&overlay-base64=L2ltZy9zdGF0aWMvb3ZlcmxheXMvdGctZGVmYXVsdC5wbmc&enable=upscale&s=4c9af90b3d91c2269bad342e6b78d577"

normalization changes overlay-align=bottom%2Cleft to overlay-align=bottom,left.

Looks harmless but this change results in getting a 401 response instead of the image itself.

Looking at the RFC, I believe this deviates from the spec which (to my understanding) suggests sub-delims should not be decoded in the normalization process.

URIs that differ in the replacement of a reserved character with its
corresponding percent-encoded octet are not equivalent.

This SO post supports that. I came across #320 which touches on the same issue.

Please correct me if I'm reading this wrong 👍


Duplicates of this issues:


Maintainer notes:

@dentarg
Copy link
Collaborator

dentarg commented Jul 19, 2023

Sounds correct to me that percent-encoded reserved characters (in path and query) should not be decoded, from https://www.rfc-editor.org/rfc/rfc3986#section-6.2.2.2 (linked from the SO post)

The percent-encoding mechanism (Section 2.1) is a frequent source of
variance among otherwise identical URIs. In addition to the case
normalization issue noted above, some URI producers percent-encode
octets that do not require percent-encoding, resulting in URIs that
are equivalent to their non-encoded counterparts. These URIs should
be normalized by decoding any percent-encoded octet that corresponds
to an unreserved character, as described in Section 2.3.


Looking at the RFC, I believe this deviates from the spec which (to my understanding) suggests sub-delims should not be decoded in the normalization process.

Why did you write sub-delims (and not all reserved characters) there @janklimo? Just making sure we're on the same page

@dentarg dentarg changed the title Normalization possibly deviating from RFC Normalization: don't decode percent-encoded reserved characters Jul 19, 2023
@janklimo
Copy link
Author

This was a very long time ago but I believe I referred to sub-delims since that was the most specific group , (the bug I originally stumbled upon in my example) can be found in but I agree we should be looking at reserved characters in general.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants