Normalization issue with `#` (`%23`) #295

matthieuprat · 2018-03-22T13:22:50Z

Not sure this is an actual bug:

Addressable::URI.parse("http://example.org?foo=%E9%23").normalize.query

This returns the string foo=%E9#. I would have expected foo=%E9%23.

Note that %E9 is the percent-encoded form of the character é in ISO-8859-1, i.e. the result of URI.escape('é'.encode('iso-8859-1')).
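To double-check that claim without `URI.escape` (which was removed in Ruby 3.0), a minimal sketch can percent-encode the raw octets of `é` in both encodings. `pct_octets` is a hypothetical helper written for this illustration, not part of Addressable or the standard library.

```ruby
# Hypothetical helper: percent-encode every octet of a string, with no
# exceptions for unreserved characters (good enough for a single "é").
def pct_octets(str)
  str.bytes.map { |byte| format("%%%02X", byte) }.join
end

latin1 = pct_octets("é".encode("ISO-8859-1"))
utf8   = pct_octets("é".encode("UTF-8"))

puts latin1  # %E9
puts utf8    # %C3%A9
```

The single octet `%E9` only decodes back to `é` if the reader already knows the data is ISO-8859-1; under UTF-8 the same character takes two octets.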

Is this the intended behavior?


dentarg · 2018-06-10T08:36:03Z

I don't have an answer, but I found that #query and #query_values don't match:

$ irb -raddressable/uri
irb(main):001:0> Addressable::VERSION::STRING
=> "2.5.2"
irb(main):002:0> Addressable::URI.parse("http://example.org?foo=%E9%23").normalize.query
=> "foo=%E9#"
irb(main):003:0> Addressable::URI.parse("http://example.org?foo=%E9%23").normalize.query_values
=> {"foo"=>"\xE9#"}
irb(main):004:0> Addressable::URI.unencode("%E9%23")
=> "\xE9#"

I think this issue is similar to #224.

sporkmonger · 2018-08-07T09:09:04Z

https://tools.ietf.org/html/rfc3986#section-2.5 applies here.

> When a new URI scheme defines a component that represents textual data consisting of characters from the Universal Character Set [UCS], the data should first be encoded as octets according to the UTF-8 character encoding [STD63]; then only those octets that do not correspond to characters in the unreserved set should be percent-encoded. For example, the character A would be represented as "A", the character LATIN CAPITAL LETTER A WITH GRAVE would be represented as "%C3%80", and the character KATAKANA LETTER A would be represented as "%E3%82%A2".
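The procedure in that passage (encode as UTF-8, then percent-encode only the octets outside the unreserved set) can be sketched in Ruby as below. `pct_encode_utf8` is a hypothetical name for illustration; the unreserved set is the one from RFC 3986 section 2.3 (`ALPHA / DIGIT / "-" / "." / "_" / "~"`), and the first three expected outputs are the RFC's own examples.

```ruby
# Unreserved characters per RFC 3986 section 2.3.
UNRESERVED = /\A[A-Za-z0-9\-._~]\z/

# Hypothetical helper: encode text as UTF-8, keep unreserved characters
# literally, and percent-encode every octet of anything else.
def pct_encode_utf8(text)
  text.encode("UTF-8").each_char.map do |char|
    if char.match?(UNRESERVED)
      char
    else
      char.bytes.map { |byte| format("%%%02X", byte) }.join
    end
  end.join
end

puts pct_encode_utf8("A")  # A
puts pct_encode_utf8("À")  # %C3%80
puts pct_encode_utf8("ア") # %E3%82%A2
puts pct_encode_utf8("#")  # %23
```

Because `#` is reserved, this procedure always leaves it as `%23` inside data.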

sporkmonger · 2018-08-07T09:15:01Z

We should probably add a test case for uri.query_values = {"À": "ア"}, since those characters are the RFC's own examples.

dentarg · 2019-02-19T16:09:58Z

Related to #334

dentarg · 2023-07-19T07:05:31Z

> I found that #query and #query_values don't match

Probably due to this comment in #114:

> Ultimately, the query_values method is attempting to emulate the application/x-www-form-urlencoded content type, poorly specified though it may be.
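To illustrate the distinction, the sketch below uses Ruby's standard `URI.decode_www_form` and `URI.encode_www_form` (not Addressable) to treat the original query as `application/x-www-form-urlencoded` data. Decoding yields the value `é#` when the octets are interpreted as ISO-8859-1, roughly the kind of result a `query_values`-style accessor aims for; re-encoding restores `%23`, so the `#` never appears literally in the serialized query, where it would start a fragment.

```ruby
require "uri"

query = "foo=%E9%23"

# Decode as application/x-www-form-urlencoded; the second argument tells
# Ruby which encoding to tag the decoded octets with.
name, value = URI.decode_www_form(query, Encoding::ISO_8859_1).first

puts name                  # foo
puts value.encode("UTF-8") # é#

# Re-encoding the decoded pair percent-encodes "#" again, so the
# serialized query keeps %23 instead of a literal fragment delimiter.
reencoded = URI.encode_www_form([[name, value]])
puts reencoded             # foo=%E9%23
```

Decoding is appropriate for the extracted value; the string form of the query is what has to keep `%23`.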

dentarg · 2023-07-19T07:09:36Z

> This returns the string foo=%E9#. I would have expected foo=%E9%23.

I think this is another variant of #366, where Addressable incorrectly decodes the percent-encoded reserved character `#` (`%23`).
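One way to avoid that kind of decoding is to decode a percent-encoded triplet during normalization only when it stands for an unreserved character (RFC 3986 sections 2.3 and 6.2.2.2), and otherwise just uppercase its hex digits (section 6.2.2.1). The sketch below is a hypothetical standalone helper, `normalize_pct_encoding`, not Addressable's implementation or the change proposed in #366.

```ruby
# Octets of the unreserved set from RFC 3986 section 2.3:
# ALPHA / DIGIT / "-" / "." / "_" / "~"
def unreserved_octet?(byte)
  (0x41..0x5A).cover?(byte) ||   # A-Z
    (0x61..0x7A).cover?(byte) || # a-z
    (0x30..0x39).cover?(byte) || # 0-9
    [0x2D, 0x2E, 0x5F, 0x7E].include?(byte)
end

# Hypothetical helper: normalize the percent-encoding of one URI component.
# Triplets for unreserved characters are decoded; every other triplet
# (reserved characters such as %23, or non-ASCII octets such as %E9)
# stays encoded, with its hex digits uppercased.
def normalize_pct_encoding(component)
  component.gsub(/%[0-9A-Fa-f]{2}/) do |triplet|
    byte = triplet[1, 2].to_i(16)
    unreserved_octet?(byte) ? byte.chr : triplet.upcase
  end
end

puts normalize_pct_encoding("foo=%E9%23")    # foo=%E9%23
puts normalize_pct_encoding("foo=%e9%7E%41") # foo=%E9~A
```

Working on raw octets rather than decoded characters means the helper never has to guess whether `%E9` is ISO-8859-1 or part of a UTF-8 sequence.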

dentarg added the Duplicate label Jul 19, 2023

dentarg mentioned this issue Jul 19, 2023 in Normalization: don't decode percent-encoded reserved characters #366 (Open)

dentarg changed the title from "URI normalization issue with ISO-8859-1 encoding" to "Normalization issue with `#` (`%23`)" Jul 19, 2023

dentarg added the Accepted label Jul 19, 2023
