Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #4275 fail URIs with ambiguous segments #5954

Merged
merged 12 commits into from Feb 16, 2021

Conversation

gregw
Copy link
Contributor

@gregw gregw commented Feb 8, 2021

Fix #4275 fail URIs with ambiguous segments.

Fix #4275 fail URIs with ambiguous segments.
@gregw gregw requested a review from joakime February 8, 2021 14:47
@gregw gregw linked an issue Feb 8, 2021 that may be closed by this pull request
@gregw gregw requested a review from sbordet February 9, 2021 07:52
@gregw
Copy link
Contributor Author

gregw commented Feb 11, 2021

@sbordet @joakime nudge

Copy link
Contributor

@sbordet sbordet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are changing HttpURI parsing, I think we should go the extra mile and do it as per RFC3986.

https://tools.ietf.org/html/rfc3986#section-3.3 says that even the , can be used as separator between a segment and a segment-parameter.

But if we go to the ABNF, possibly any sub-delims can be present, and we should not consider those as part of the segment.
Doing so would probably require another compliance mode.

I changed testAmbiguousSegments() and using a , instead of ; fails the test.

@gregw
Copy link
Contributor Author

gregw commented Feb 11, 2021

@sbordet I don't see why we should treat ',' as a parameter separator? It might be used as such in some schemes, but not in HTTP. We will never split a segment by ',', but we do split by ';'. I think we need to be absolutely consistent with our handling to avoid issues, so if we start splitting on ',' here, we'd have to split everywhere and I don't think that is valid to do.

We are treating ..; as ambiguous because the RFC says we should only normalise .. and not ..;, yet we know we later split on ; so that only .. is left, which an application might normalise or might give back to us and we will normalise. I don't see the same problem with ..,.

Invert sense of ambiguous path segment compliance
added test
@gregw gregw requested a review from sbordet February 11, 2021 14:40
@sbordet
Copy link
Contributor

sbordet commented Feb 11, 2021

I don't see why we should treat ',' as a parameter separator?

Because the RFC says:

For example, the semicolon (";") and equals ("=") reserved characters are
often used to delimit parameters and parameter values applicable to
that segment. The comma (",") reserved character is often used for
similar purposes. For example, one URI producer might use a segment
such as "name;v=1.1" to indicate a reference to version 1.1 of
"name", whereas another might use a segment such as "name,1.1" to
indicate the same.

So if ; and , are interchangeable, and we only fix ;, then we'll have the same problem when someone uses ,, no?
And the same for all the other sub-delims characters.

The fact that it's not used typically in HTTP does not mean pentesters can't generate such URIs to validate URI handling.

@gregw
Copy link
Contributor Author

gregw commented Feb 11, 2021

, and ; are not interchangeable.
I known of no HTTP "URI producer" or consumer that will treat , in a segment as special.

We only have an issue with HTTP schema URIs as that is all we handle. We never split on ,, we do not remove ,param. I see no reason to not allow .., as a segment. Sure it is a silly looking segment, but then there are many other silly segments that we don't block.

Remember, we are not disallowing ..; because the RFC tells us to. We are disallowing it because the RFC tells us that it is a normal path segment and not a relative one, yet we know that we later strip the ; and thus can treat it as special. None of that holds for ,

@joakime
Copy link
Contributor

joakime commented Feb 11, 2021

There are multiple things to consider here.

First there's Reserved Characters - https://tools.ietf.org/html/rfc3986#section-2.2
Which defines the sub-delims as ...

      reserved    = gen-delims / sub-delims

      gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

      sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

Which means , and ; are definitely equivalent.

The following example URIs should be considered in this PR.

/a/..;/b/
/a/..;foo/b/
/a/..;foo=bar/b/
/a/..,bar/b/
/a/..,foo=bar/b/
/a/..$/b/
/a/..!/b/
/a/..&/b/
/a/.."/b/
/a/..()/b/
/a/..*/b/
/a/..+/b/
/a/..=bar/b/

Then we have Remove Dot Segments - https://tools.ietf.org/html/rfc3986#section-5.2.4
Which is part of the resolution process for URIs.
Which pretty much tells us that . and .. are to be removed, even if they have parameters.

This is getting quite complicated, and I'm wondering if we REALLY need to support URI parameters at all?

I have been unable to find another web server that supports them.
Even Apache httpd doesn't support them.

Two examples:

# These work
$ curl https://en.wikipedia.org/wiki/Jetty
$ curl https://httpd.apache.org/docs/2.4/mod/mod_rewrite.html

# These do not work as expected
$ curl https://en.wikipedia.org/wiki;/Jetty
# above results in a redirect to the mainpage
$ curl https://httpd.apache.org/docs/2.4/mod;/mod_rewrite.html
# above results in a 404 status response

@joakime
Copy link
Contributor

joakime commented Feb 11, 2021

This is getting quite complicated, and I'm wondering if we REALLY need to support URI parameters at all?

I retract this statement, I've found many examples in REST libraries that use various forms of the reserved characters in the URI to mean special things to the REST library.

Some even making the "contrived" example in the devtest link look easy to follow.

@gregw
Copy link
Contributor Author

gregw commented Feb 11, 2021

@sbordet and @joakime, you are over complicating this and missing the actual issue!

Our URI handling is correct according to both the RFC and the servlet spec. However, there is a problem between those two spec. The RFC says that ; might be a special character, yet does not say it should be stripped at the URI level. However the servlet spec expects that path parameters will be stripped. Our code is aware of this and we treat mapping of both security constraints and servlets in exactly the same way, so there can be no bypasses. However, because we strip parameters from the URI before giving them to the servlet, then we throw away information and the application can get a path with a .. segment. It should know that this is not really a .. segment and that it came either from %2e%2e or ..;param. However, it is pretty easy for the application to forget this an normalise it, or not re-encode it and give it back to us, so we normalise it.

This is ONLY a problem with ; and % encoding, because they are the only two ways that we can ever decode a URI and give the application a .. segment. The solution I have coded here is to just not support %2e%2e and ..;param (plus a few variations of those). If there are some other URIs that we also decode to result in .. segments, then we need to handle those as well, but none of the examples either of you have listed appear to be such examples.

@gregw
Copy link
Contributor Author

gregw commented Feb 11, 2021

Note also that the code doesn't actually treat ; that specially. It has modified the URI parser so that any character than ends a segment (;, ?, '#, etc.) will result in the segment being check as potentially ambiguous.

However, I will double check how we handle things like /foo/..?bar ...

@gregw
Copy link
Contributor Author

gregw commented Feb 11, 2021

OK, I think I have found one more related issue. The RFC says the URI is:

URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

We correctly normalise /foo/bar/..?query to /foo#query, but we don't do /foo/bar/..#fragment to /foo#fragment. Thus when we strip the fragment later, we may get a similar problem.
So I think our URIUtil canonical path methods should probably also handle # just like ?, even though that should never come over the wire.

Fixed tests for `%2e%2e`
 + Correctly handle canonical trailing '.' and '..'
 + Perform canonical path resolution before decoding
 + Treat decoding of %2f as an ambiguous segment.
 + better tests of ambiguous segments
Copy link
Contributor

@joakime joakime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the majority of this PR, I just don't like one aspect of the ambiguousSegments list.

 + Fixed test failures that were using %2f
Signed-off-by: Greg Wilkins <gregw@webtide.com>
@gregw gregw requested a review from joakime February 15, 2021 08:21
@gregw
Copy link
Contributor Author

gregw commented Feb 15, 2021

@sbordet nudge

 + simplified tests
Signed-off-by: Greg Wilkins <gregw@webtide.com>
Signed-off-by: Greg Wilkins <gregw@webtide.com>
@gregw gregw requested a review from sbordet February 15, 2021 23:32
updates from review
@gregw gregw requested a review from sbordet February 16, 2021 13:35
if (query != null)
_query = query;
if (fragment != null)
_fragment = fragment;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not do null checks for just assignments.
Path needs null check because path.length() may NPE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm doing null checks because if we parse the path, we may discover params, fragments and queries in it. So either we through if they are non null, or just not overwrite them unless we have explicit non null values.

updates from review
@gregw gregw requested a review from sbordet February 16, 2021 13:40
@gregw gregw merged commit 20ef71f into jetty-9.4.x Feb 16, 2021
@gregw gregw deleted the jetty-9.4.x-4275-ambiguous-path-segments branch February 16, 2021 13:47
gregw added a commit that referenced this pull request Feb 17, 2021
Rewrite rule depending on probably bug that setURIPathQuery did not actually set query if none was passed.
This is a bit of a work around, but further review is needed to see if anything else relied on that behaviour.
gregw added a commit that referenced this pull request Feb 17, 2021
Re re Fix rewrite tests for #5954. Restore setPathQuery behaviour to preserve queries if none are set.
gregw added a commit that referenced this pull request Feb 17, 2021
Re re Fix rewrite tests for #5954. Restore setPathQuery behaviour to preserve queries if none are set.
gregw added a commit that referenced this pull request Feb 17, 2021
Cleanup setPathQuery from #5954 so that it does not retain an old query if the passed string does not contain a query
gregw added a commit that referenced this pull request Feb 17, 2021
Cleanup setPathQuery from #5954 so that it does not retain an old query if the passed string does not contain a query
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Path Normalization/Traversal - Context Matching
3 participants