Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some relative redirect strings incorrectly bypass iri_to_uri normalization. #2828

Closed
gmanfunky opened this issue Jan 4, 2024 · 2 comments
Closed
Milestone

Comments

@gmanfunky
Copy link

gmanfunky commented Jan 4, 2024

Summary

It appears logic introduced in #2695 to work-around the itms-services: schema oddity has an overly broad assumption that is results in invalid URIs due to not percent-encoding relative IRIs as intended.

Ultimately, this can trick browsers into behaving badly when faced with invalid relative URL locations. But I think it is ours to fix in werkzeug because the current code is emitting an invalid location header per RFCs (and the presumed intention of the code.)

RFC 3986 for URIs has a section 2 that essentially says URI components should be percent-encoded if they aren't part of the "reserved" or explicitly "unreserved" lists of characters. As an case-study: backslashes should be percent-encoded. Indeed, werkzeug seems to intent to encode them, but this issue describes the case where the typical encoding logic is not being applied.

Replication

As an example consider an app expecting to use redirect(relative_uri) and expecting werkzeug to properly encode the Location: header (as it historically would before the mentioned work-around)

from werkzeug.utils import redirect
problem_relative_uri = '/\\\\github.com?extra=data&blank=#fragment'
response = redirect(problem_relative_uri)
response.location
# '/\\\\github.com?extra=data&blank=#fragment'

Note the backslashes (\) in this URI are /not/ being encoded as %5C .

As a smaller test case:

from werkzeug.urls import iri_to_uri, _invalid_iri_to_uri
problem_relative_uri = '/\\\\github.com?extra=data&blank=#fragment'
_invalid_iri_to_uri(problem_relative_uri)
# '/\\\\github.com?extra=data&blank=#fragment'
# :-(

Expected Behavior

The previous encoding behavior before the wrapper was desirable

iri_to_uri(problem_relative_uri)
# '/%5C%5Cgithub.com?extra=data&blank=#fragment'
# :-)

Environment:

  • Python version: 3.12.1
  • Werkzeug version: 3.0.1

Relevant Code

if location is not None:
location = _invalid_iri_to_uri(location)

def _invalid_iri_to_uri(iri: str) -> str:
"""The URL scheme ``itms-services://`` must contain the ``//`` even though it does
not have a host component. There may be other invalid schemes as well. Currently,
responses will always call ``iri_to_uri`` on the redirect ``Location`` header, which
removes the ``//``. For now, if the IRI only contains ASCII and does not contain
spaces, pass it on as-is. In Werkzeug 3.0, this should become a
``response.process_location`` flag.
:meta private:
"""
try:
iri.encode("ascii")
except UnicodeError:
pass
else:
if len(iri.split(None, 1)) == 1:
return iri
return iri_to_uri(iri)

@gmanfunky
Copy link
Author

gmanfunky commented Jan 4, 2024

After @davidism 's prompt in the original issue to also file an upstream Python issue (python/cpython#104139 (comment)), Python's urlib.parse was improved with the itms-services: url schema knowledge in Python 3.12.

The cpython comment thread also mentions a potentially more tightly scoped workaround for the "_invalid_iri" issue on older python versions:

if "itms-services" not in urllib.parse.uses_netloc:
    urllib.parse.uses_netloc.append("itms-services")

Should we adopt this? Or otherwise rethink the current conditionals bypassing iri_to_uri(location_iri)?

@davidism
Copy link
Member

davidism commented May 5, 2024

Yes, I like that solution, since it allows users to tell Python in general about any special scheme they need. Note that you won't see response.location encoded in your example, that only happens at the last moment when converting the Response to HTTP.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants