Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix/anyurl escapes percent sign in url #4461

Closed

Conversation

vetedde
Copy link

@vetedde vetedde commented Sep 1, 2022

Change Summary

Fix re-quote when build url. I added relevant cases to test.
Also fix mypy error, because without it I couldn't create commit

Related issue number

Fix #4458

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI and coverage remains at 100%
  • Documentation reflects the changes where applicable
  • changes/<pull request or issue id>-<github username>.md file added describing change
    (see changes/README.md for details.
    You can skip this check if the change does not need a change file.)
  • [] My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

pydantic/mypy.py Outdated Show resolved Hide resolved
@hramezani
Copy link
Member

Please add tests for your change into test_build_url_quote_plus

changes/4458-vetedde.md Outdated Show resolved Hide resolved
@hramezani
Copy link
Member

Thanks @vetedde for this patch 👍
I left some comments. please update

Co-authored-by: Hasan Ramezani <hasan.r67@gmail.com>
@@ -396,6 +396,10 @@ def apply_default_parts(cls, parts: 'Parts') -> 'Parts':

@classmethod
def quote(cls, string: str, safe: str = '') -> str:
pattern = r'^([\w]+|(%\d{2}))+$'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please compile the pattern and then use the match on the compiled pattern.
Take a look at the same implementation in datetime_parse.py

date_re = re.compile(f'{date_expr}$')

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment explaining what this regex is doing?

I also think it could be simplified.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regex checks that string contains only %xx or A-Za-z0-9 and _ and nothing else.
Pattern work with cases: http://regexr.com/6t6g6

@vetedde
Copy link
Author

vetedde commented Sep 2, 2022

@hramezani found a problem with case "foo%20bar", when quote_plus is True. My solution returns foo%20bar, but need foo+bar. I'm thinking of adding an extra check, but it's nested check and still have to unquote at some point

string_with_precent_code_re = re.compile(r'^([\w]+|(%\d{2}))+$')

if string_with_precent_code_re.match(string) is not None:
    if cls.quote_plus:
        string = unquote_plus(string)
    else:
        return string

return quote_plus(string, safe) if cls.quote_plus else quote(string, safe)

Maybe it's better to do it at once:

string = unquote_plus(string, safe) if cls.quote_plus else unquote(string, safe)
return quote_plus(string, safe) if cls.quote_plus else quote(string, safe)

or always do unquote if match %xx

What do you think?

@hramezani
Copy link
Member

@vetedde I think there are 3 possible solutions:

  1. unquote string in any case and then quote it
  2. unquote string if it is already quoted(checking by regex for example) and then quote it
  3. revert this feature because seems it is complex and is not good performance-wise (we need to unquote and then quote)

What do you think @samuelcolvin

@samuelcolvin
Copy link
Member

please have a look at #4469, I think it's a cleaner approach to solve this. I've also added a large number of test cases to at least lock the behaviour. I'll reply on the original issue with some more context.

Thanks so much @vetedde for looking into this, I'll leave this PR open in case it is decided that it's a better approach than #4469.

@hramezani
Copy link
Member

I think we still will have the problem reported in #4468 even with this patch. which is a breaking change.

@samuelcolvin
Copy link
Member

After some consideration and discussions, I've decided to replace this by #4470. @vetedde thanks again for your contribution.

See #4470 for a full explanation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AnyUrl escapes percent sign in URLs without TLD that are already escaped.
3 participants