Normalization of path segments should probably happen before normalization of percent escaping #8

Open
sporkmonger opened this issue Mar 20, 2010 · 7 comments

Comments

@sporkmonger
Owner

Addressable::URI.parse("/%2E/").normalize.to_str.should == "/%2E/"
@sporkmonger
Owner Author

This issue probably requires a check-in with the IETF URI mailing list before deciding one way or the other.

@kovyrin

kovyrin commented Oct 18, 2013

I understand that this was a long time ago, but I still wanted to check in to see what's up with this issue. We've hit this bug in a slightly different context and are not sure how to deal with it. Any chance this is going to be fixed?

@sporkmonger
Owner Author

Could you elaborate on the issue you're hitting? A test case would be awesome.

@kovyrin

kovyrin commented Oct 18, 2013

Actually, now I'm not sure if our issue is related to this one. Here is our problem:

irb(main):001:0> Addressable::URI.parse(PostRank::URI.unescape("http://foo.com/blah%ef%bc%9f"))
=> #<Addressable::URI:0x5648890 URI:http://foo.com/blah？>
irb(main):002:0> Addressable::URI.parse(PostRank::URI.unescape("http://foo.com/blah%ef%bc%9f")).normalize!
=> #<Addressable::URI:0x564ed08 URI:http://foo.com/blah%3F>

The normalize call mangles a perfectly valid (AFAIU) Unicode symbol, the full-width question mark, and replaces it with a plain ASCII question mark.

@sporkmonger
Owner Author

It's actually doing the right thing. IRIs (Unicode-friendly URIs) use Unicode normalization form KC (NFKC) to limit phishing. NFKC tends to do perceptual codepoint conversions, like converting '？' (U+FF1F, full-width question mark) to '?' (U+003F). If this causes a problem, the solution is either not to normalize the URI at all or to normalize components piecemeal. "http://foo.com/blah%ef%bc%9f" and "http://foo.com/blah%3F" are considered equivalent.
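For anyone who wants to see the NFKC folding in isolation, here is a minimal sketch using only Ruby's standard library. It does not go through Addressable at all; the variable names are made up for illustration, and it only shows what NFKC does to the decoded bytes from the example above.

```ruby
# Decode the percent-encoded bytes by hand so the sketch needs no gems.
encoded = "%ef%bc%9f"
bytes = encoded.scan(/%(\h\h)/).flatten.map { |hex| hex.to_i(16) }
fullwidth = bytes.pack("C*").force_encoding(Encoding::UTF_8)

# The three bytes form a single codepoint: U+FF1F FULLWIDTH QUESTION MARK.
fullwidth_codepoint = fullwidth.unpack1("U").to_s(16)

# NFKC applies compatibility mappings, folding the full-width form
# into the ASCII question mark U+003F.
normalized = fullwidth.unicode_normalize(:nfkc)

puts fullwidth_codepoint   # ff1f
puts normalized == "?"     # true
```

Once the character has been folded to an ASCII "?", it has to be percent-encoded as %3F inside a path, since a literal "?" would start the query component.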

@dentarg
Collaborator

dentarg commented Jul 19, 2023

Some more context: %2E is "." (a period).

irb(main):038:0> CGI.unescapeURIComponent "%2E"
=> "."

Addressable::URI.parse("/%2E/").normalize.to_str.should == "/%2E/"

Not sure why this should be true? If you want to compare URIs, shouldn't you normalize both before comparing?


Hmm, from https://www.rfc-editor.org/rfc/rfc3986#section-2.3

Characters that are allowed in a URI but do not have a reserved
purpose are called unreserved. These include uppercase and lowercase
letters, decimal digits, hyphen, period, underscore, and tilde.

  unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

URIs that differ in the replacement of an unreserved character with
its corresponding percent-encoded US-ASCII octet are equivalent: they
identify the same resource. However, URI comparison implementations
do not always perform normalization prior to comparison (see Section 6).
For consistency, percent-encoded octets in the ranges of ALPHA
(%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E),
underscore (%5F), or tilde (%7E) should not be created by URI
producers and, when found in a URI, should be decoded to their
corresponding unreserved characters by URI normalizers.
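As a rough illustration of the rule quoted above, the following sketch decodes only those percent-encoded octets that correspond to unreserved characters and leaves every other escape untouched. `decode_unreserved` and `UNRESERVED` are hypothetical names for this example, not part of Addressable's API.

```ruby
# Unreserved characters per RFC 3986 section 2.3.
UNRESERVED = /\A[A-Za-z0-9\-._~]\z/

# Decode percent-encoded octets only when they map to an unreserved
# character; reserved or non-ASCII escapes (e.g. %2F, %EF) stay as they are.
def decode_unreserved(str)
  str.gsub(/%(\h\h)/) do |escape|
    char = $1.to_i(16).chr
    char.match?(UNRESERVED) ? char : escape
  end
end

puts decode_unreserved("/%2E/")      # /./
puts decode_unreserved("/%7Euser")   # /~user
puts decode_unreserved("/a%2Fb")     # /a%2Fb
```

Note that "/%2E/" becomes "/./" under this rule, which is exactly the form that dot-segment removal then collapses.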

Does this mean that Addressable::URI.parse("/%2E/") should be turned into Addressable::URI.parse("/./") directly at #parse?

Normalization removes the dot segment, so both forms collapse to "/":

irb(main):042:0> Addressable::URI.parse("/%2E/").normalize.to_s
=> "/"
irb(main):044:0> Addressable::URI.parse("/./").normalize.to_s
=> "/"

@dentarg
Collaborator

dentarg commented Jul 19, 2023

Does this mean that Addressable::URI.parse("/%2E/") should be turned into Addressable::URI.parse("/./") directly at #parse?

That would go against what's suggested in #477
