Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix existing percent encoding #4469

Closed
wants to merge 2 commits into from
Closed

Fix existing percent encoding #4469

wants to merge 2 commits into from

Conversation

samuelcolvin
Copy link
Member

@samuelcolvin samuelcolvin commented Sep 2, 2022

Change Summary

As described in the issue, #4458 was double encoding URLs, this splits the URL on encoding groups with the regex (.*?)(%[\dA-F]{2}) to (hopefully) fix the issue.

Changes:

  • existing "percentage encodings" are not re-encoded - they're just cast to upper case to match RFC 3986 and httpe - e.g. foo%2Bbar is left unchanged while foo%2bbar is converted to foo%2Bbar
  • forward slash / is no longer encoded in query paramters, e.g. http://example.com?a=b/c is left unchanged

Related issue number

fix #4458

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI and coverage remains at 100%
  • Documentation reflects the changes where applicable
  • changes/<pull request or issue id>-<github username>.md file added describing change
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

@hramezani
Copy link
Member

I think we still will have the problem reported in #4468 even with this patch. which is a breaking change.

@samuelcolvin
Copy link
Member Author

Okay, this problem is bigger than I thought due to the strange logic to decide when to rebuild urls, see

host, tld, host_type, rebuild = cls.validate_host(parts)
return cls(
None if rebuild else url,

and

if is_international:
host_type = 'int_domain'
rebuild = True

which is completely out of date from the URL quoting - we previously only rebuild domains when required to cope with "international" TLDs, but now we should be rebuilding if there's a path or a query or a user or anything which might need quoting.

However if I just force rebuilding of all URLs, lots of tests are failing.

@samuelcolvin
Copy link
Member Author

I think we still will have the problem reported in #4468 even with this patch. which is a breaking change.

Good point, should be fixed by the last commit.

@samuelcolvin
Copy link
Member Author

replaced by #4470.

@samuelcolvin samuelcolvin deleted the fix-pecent-encoding branch December 6, 2022 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AnyUrl escapes percent sign in URLs without TLD that are already escaped.
2 participants