Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add possible-forgotten-f-prefix checker #4787

Closed
wants to merge 22 commits into from

Conversation

DanielNoord
Copy link
Collaborator

To ease the process of reviewing your PR, do make sure to complete the following boxes.

  • Add yourself to CONTRIBUTORS if you are a new contributor.
  • Add a ChangeLog entry describing what your PR does.
  • If it's a new feature, or an important bug fix, add a What's New entry in
    doc/whatsnew/<current release.rst>.
  • Write a good description on what the PR does.

Type of Changes

Type
✨ New feature

Description

This checks if text in strings in between {}'s are variables.
The var the string is assigned to is also checked for a format() call.
If this does not happen, the string should probably be a f-string and a message is emitted.
This closes #2507

Reason for this PR being a draft

Ideally I would also add a check for the following code:

x = "This is a calculation {1 + 1}" # bad

x = f"This is a calculation {1 + 1}" # good

Can we check that 1 + 1 is a calculation without calling eval?

This checks if text in strings in between `{}`'s are variables.
The var the string is assigned to is also checked for a format() call.
If this does not happen, the string should probably be a f-string and a
message is emitted.
This closes pylint-dev#2507
@coveralls
Copy link

coveralls commented Aug 2, 2021

Pull Request Test Coverage Report for Build 1144158204

  • 50 of 50 (100.0%) changed or added relevant lines in 1 file are covered.
  • 12 unchanged lines in 3 files lost coverage.
  • Overall coverage increased (+0.06%) to 92.802%

Files with Coverage Reduction New Missed Lines %
pylint/extensions/typing.py 1 95.79%
pylint/pyreverse/utils.py 3 82.39%
pylint/config/option_manager_mixin.py 8 89.36%
Totals Coverage Status
Change from base Build 1142277141: 0.06%
Covered Lines: 13498
Relevant Lines: 14545

💛 - Coveralls

Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this checker, much appreciated !

pylint/checkers/strings.py Show resolved Hide resolved
pylint/checkers/strings.py Outdated Show resolved Hide resolved
pylint/checkers/strings.py Outdated Show resolved Hide resolved
pylint/checkers/strings.py Outdated Show resolved Hide resolved
pylint/checkers/strings.py Outdated Show resolved Hide resolved
tests/functional/p/possible_f_string_as_string.py Outdated Show resolved Hide resolved
@Pierre-Sassoulas Pierre-Sassoulas added Enhancement ✨ Improvement to a component Checkers Related to a checker Work in progress labels Aug 3, 2021
@DanielNoord
Copy link
Collaborator Author

DanielNoord commented Aug 3, 2021

For future contributors stumbling across this PR.

We would like to add a possible-f-string-as-string checker which checks for normal strings that seem to be f-strings.
The following example show some basic test cases:

param = "string"
"This is a string" # good
"This is a {param}" # bad 
f"This is a {param}" # good

"This is a calculation: 1 + 1" # good
"This is a calculation: {1 + 1}" # bad
f"This is a calculation: {1 + 1}" # good

"This is a nice string" # good
"This is a {"nice" + param}" # bad
f"This is a {'nice' + param}" # good

The code in this PR is (almost) able to test the first three cases by looking for regex matches in between {} and seeing if the matches can be found in the local or global scope.
The second and third batch of tests is more problematic. We need a way to test whether the {} contain valid python code. The second batch is (fairly) easy, the third is problematic. We need to evaluate the code within the {} within the current global and local scope to see if any of the variables are in there (such as param in the example). We currently have no solution to do this without any considerable performance impact.

TLDR: How to evaluate the code in between {} within a normal string in the context of that string without too much performance impact?

@DanielNoord DanielNoord removed their assignment Aug 3, 2021
@Crissal1995
Copy link

Sorry to pop out from nowhere, but this PR got my interest.

Have you read the f-string PEP? There are two interesting sections, Code equivalence and Expression evaluation, that shows how the f-string is treated internally, what's its representation with format statements, and how is parsed with ast.

I think that expressions should be evaluated as done in the PEP, i.e. with ast.parse('(' + expression + ')', '<fstring>', 'eval'); because the mode is set to eval, only a single statement can be executed, so I think that should be ok for security reasons.

If accepted, this should address every problem related with valid and invalid expressions inside curly braces.

@DanielNoord
Copy link
Collaborator Author

@Crissal1995 This is indeed interesting, thanks for the tip!

I have not tested this, so please forgive me if I my questions are non-relevant, but:

  1. Can't a single statement also produce security issues? I have no real experience with this security and eval, but I am thinking of print(organization_secret) or using an already imported requests module to send sensitive information from an automatically run CI?
  2. How can we pass the context of the string to ast.parse('(' + expression + ')', '<fstring>', 'eval'). It seems as if we should still pass all lines up until the line with the string to the ast-parser to infer any local or global variables. For performance reasons, this is something we should (obviously) avoid.

@Crissal1995
Copy link

Crissal1995 commented Aug 9, 2021

  1. Can't a single statement also produce security issues? I have no real experience with this security and eval, but I am thinking of print(organization_secret) or using an already imported requests module to send sensitive information from an automatically run CI?

Excuse me, but I don't understand this question. If you run PyLint, you are supposed to be in your own execution environment, so you know your secrets and have no interest to print them, or am I missing something?

  1. How can we pass the context of the string to ast.parse('(' + expression + ')', '<fstring>', 'eval'). It seems as if we should still pass all lines up until the line with the string to the ast-parser to infer any local or global variables. For performance reasons, this is something we should (obviously) avoid.

This is indeed a good question; however, I think that should be as PyLint already works. It's not needed to execute all the code up until your expression; rather, your variable needs to be in the scope of local or global variables.
To explain me better:

  • if I execute ast.parse("(y +/ 2)", "<fstring>", "eval"), then I got a SyntaxError.
  • If I execute ast.parse("(y + 2)", "<fstring>", "eval"), however, I got a valid ast.Expression; then I need to check every leaf, and see if it's a ast.Name or not - like y is in this case.

When I got a Name, I can check its id - that is the string representation of its name - to see if it is found in the current scope. If not, raise an error; otherwise you're good to go; there is no need to execute all code before.

If I try to access an object variable, like myvar.z, there is just a need of checking if myvar is present, not that is present and has the attribute z; this will be done at runtime from Python itself.

@DanielNoord
Copy link
Collaborator Author

Excuse me, but I don't understand this question. If you run PyLint, you are supposed to be in your own execution environment, so you know your secrets and have no interest to print them, or am I missing something?

As I said, I'm not quite sure how CI's and secrets behave but I am imagining a situation where an automatically run CI prints or sends an organization's secrets as a result of evaluating f-strings. I'm not sure if the is actually a feasible line of attack, but we should be 100% certain of this before merging such behaviour into the tool.

  • if I execute ast.parse("(y +/ 2)", "<fstring>", "eval"), then I got a SyntaxError.
  • If I execute ast.parse("(y + 2)", "<fstring>", "eval"), however, I got a valid ast.Expression; then I need to check every leaf, and see if it's a ast.Name or not - like y is in this case.

This seems like the right way forward. If I can find the time I might start working on this implementation, but I am more than happy for somebody else to start working on this.

If I try to access an object variable, like myvar.z, there is just a need of checking if myvar is present, not that is present and has the attribute z; this will be done at runtime from Python itself.

I think we should do this check, as Pylint tries to inform users of errors before running their code. Relying on errors on runtime is therefore not truly in the spirit of pylint.

@Pierre-Sassoulas
Copy link
Member

Pierre-Sassoulas commented Aug 16, 2021

As I said, I'm not quite sure how CI's and secrets behave but I am imagining a situation where an automatically run CI prints or sends an organization's secrets as a result of evaluating f-strings. I'm not sure if the is actually a feasible line of attack, but we should be 100% certain of this before merging such behaviour into the tool.

I was thinking the same. Let's say a public open source project use pylint on Github and has a release job. In it's CI in a job, someone change a string to "{print(os.environ('PYPI_TOKEN')}", if we eval print(os.environ('PYPI_TOKEN') without special care, the attacker can see the value of the secret in the CI log (maybe they would also needs to make a test fail so the output is printed, but it's not that sophisticated). Then they can immediately publish a compromised package. This would be really bad.

@Crissal1995
Copy link

As I said, I'm not quite sure how CI's and secrets behave but I am imagining a situation where an automatically run CI prints or sends an organization's secrets as a result of evaluating f-strings. I'm not sure if the is actually a feasible line of attack, but we should be 100% certain of this before merging such behaviour into the tool.

With ast.parse you only compile valid Python code, parsing its tree, but not execute a single line of code.
Furthermore, in compiling phase, there is also a flag automatically set by ast that is ast.PyCF_ONLY_AST.

[ast.PyCF_ONLY_AST] Generates and returns an abstract syntax tree instead of returning a compiled code object.

In fact, when you eval a node generated by ast.parse, you get a nice TypeError.

>>> import ast
>>> f1 = lambda x: ast.parse("(" + x + ")", "<fstring>", "eval")
>>> f2 = lambda x: compile(x, "<fstring>", "eval")
>>> eval(f1("1"))
TypeError: eval() arg 1 must be a string, bytes or code object
>>> eval(f2("1"))
1
>>> x = "MY SECRET"
>>> y = "print(x)"
>>> eval(f1(y))
TypeError: eval() arg 1 must be a string, bytes or code object
>>> eval(f2(y))
MY SECRET

If I try to access an object variable, like myvar.z, there is just a need of checking if myvar is present, not that is present and has the attribute z; this will be done at runtime from Python itself.

I think we should do this check, as Pylint tries to inform users of errors before running their code. Relying on errors on runtime is therefore not truly in the spirit of pylint.

But being Python a dynamically typed language, this turns out to be a difficult task, if not impossible.
The only possible scenario to see this happen is to force type hinting for every object passed in the context of a fstring, otherwise how can we know if myvar is an int, a Foo object or what else - and thus has in fact the z attribute?

For example, when you parse("(foo.x)"), you get back an Expression made out of an Attribute. Without executing the code up until that point, you cannot really tell if foo really has that attribute; that's because I suggest to let the execution fail, or at least warn the user that this particular string - "This is my var: {foo.x}" - is surely a missed fstring.

@Pierre-Sassoulas
Copy link
Member

I tried something with ast.parse, but I fail to find a test case where the returned parsed ast is not an expression. I think it create a syntax error the way it's parsed if it isn't one anyway, so we probably don't need to check that it's an ast.Expression. Other than that checking that a valid python expression contain a a variable from the global or local scope seems complicated in term of performance so we can probably not do it. What we probably need to do is to check that the string is not used with format later on because the number of false positive will be huge if we don't.

@DanielNoord
Copy link
Collaborator Author

DanielNoord commented Aug 17, 2021

I was also toying around with ast.parse but I failed to find an easy way to implement it with the expected behaviour. Perhaps we need to think about this the other way around: why would {...} be included in a normal string and see if we can test for those cases?

@Pierre-Sassoulas Pierre-Sassoulas modified the milestones: 2.10.0, 2.11.0 Aug 20, 2021
@Pierre-Sassoulas Pierre-Sassoulas changed the title Add possible-f-string-as-string checker Add `possible-forgotten-f-prefix checker Aug 20, 2021
@DanielNoord
Copy link
Collaborator Author

It's true that there is a lot of false positive, because it's hard to check if a string is used later. Maybe we can limit the warning for string that we know won't leave the scope (ie function or method variables). So if the string can be used later with format or % and are hard to check, then we never make any assumption?

I am not sure if we are already doing this, but having checks work for some code and not for others without clearly showing that the user is not something I am really in favour of. When I run the linter in VS Code I expect all messages I haven't disabled to be checked.

Maybe we can raise a warning but with a lower confidence level ? I'm not sure a lot of person are using the confidence level in their configuration.

I never did, but this might be an option.

@Pierre-Sassoulas
Copy link
Member

I am not sure if we are already doing this, but having checks work for some code and not for others without clearly showing that the user is not something I am really in favour of. When I run the linter in VS Code I expect all messages I haven't disabled to be checked.

I think this is a conversation about would we rather have false positive vs false negative and about probabilities here. Pylint is known to have annoying false positives so we're trying to lower their number (almost 75% of issue are about false positives). But no one complain about a false negative :) So as a maintainer I'd rather have less false positive. Now we can consider that creating a string for format is rather rare and they can be disabled (Like you did in pylint there are maybe 4 occurrence for 20 ksloc...) so false positive are ok in this case.

To go into more detail I think the probability of creating a format string is higher on a non local string (it's a string that is going to be reused) and imo most of the false positive will be on such string. That's why I wanted to exclude the string that are not used locally from the check. Dou you have an opinion @cdce8p ?

@cdce8p
Copy link
Member

cdce8p commented Aug 30, 2021

Dou you have an opinion @cdce8p ?

In general, I agree with @Pierre-Sassoulas here. We should do our best to prevent as many false-positives as possible. If they get too frequent, they turn users away just because they are annoying. false-negatives in contrast are just areas we can improve in the future.

Regarding the specifics of f-strings, I'm not sure there is a good way to determine if it should be one or not. Thus this will likely lead to more false-positives than helpful warnings. @Pierre-Sassoulas mentioned assignments and I agree: Those should definitely be excluded. It's common to see code like this:

TEMPLATE_STR = "Some string with a placeholder: {}"
print(TEMPLATE_STR.format("Hello World"))

In the end, I don't know if we should move forward with it. In case you still would like to, it might be better to create a new extension for it.

@cdce8p
Copy link
Member

cdce8p commented Aug 30, 2021

A new extension could also include the consider-using-f-string check. See my comment here #4796 (comment)

@cdce8p cdce8p changed the title Add `possible-forgotten-f-prefix checker Add possible-forgotten-f-prefix checker Aug 30, 2021
@Pierre-Sassoulas
Copy link
Member

I think possible-forgotten-f-prefix should be a default checker as it's asked by a lot of user : it's a very popular issue, one of the most popular in fact (and it's also a very popular question on stackoverflow that I can't find anymore). The forgotten f happen very often. But I realize now that consider-using-f-string is a little opinionated and could be an extension. This is the last issue that we have to put in 2.11. Maybe we can delay possible-forgotten-f-prefix in 2.12 and put consider-using-f-string in the RecommendationChecker then release 2.11 ?

@DanielNoord
Copy link
Collaborator Author

DanielNoord commented Aug 31, 2021

While I agree consider-using-f-string is opinionated I believe the fact that there are real performance benefits might warrant a place in the default checkers. However, obviously this is a decision for the project maintainers so we should move it.

For this PR to move forward the following tasks remain:

  • Implement a nodes.Const.kind that also picks up on the r prefix rather than only u (and possibly backport to Python<3.8). This should probably be done in astroid.
    I have created an issue: Add nodes.Const.kind for r prefix and Python < 3.8 astroid#1156
  • Rewrite tests to only include 1) non-assignments in global scope, 2) assignments in method or function scopes and 3) only exact matches with variables in local or global scope
  • Rewrite code to match new tests. See commit 5 for a 95% working example, here: 2259aaf

@Pierre-Sassoulas
Copy link
Member

The performance benefit convinced me, I'm for keeping consider-using-f-string the way it is. Should we suggest to use pyupgrade in the message ? This automate a lot of the boring/easy stuff when migrating to f-string and could reduce a lof of the frustration.

Thank you for opening the issue in astroid and making a checklist. It seems there is still a lot of work on this one so let's not block 2.11 because of it.

@Pierre-Sassoulas Pierre-Sassoulas removed this from the 2.11.0 milestone Aug 31, 2021
@cdce8p
Copy link
Member

cdce8p commented Aug 31, 2021

While I agree consider-using-f-string is opinionated I believe the fact that there are real performance benefits might warrant a place in the default checkers. However, obviously this is a decision for the project maintainers so we should move it.

Ok, this would indeed be an argument to keep it enabled by default. I would nevertheless recommend to move it to the RecommendationChecker. That would just be a better fit for it. (The one is loaded by default too.)

--

Should we suggest to use pyupgrade in the message ? This automate a lot of the boring/easy stuff when migrating to f-string and could reduce a lof of the frustration.

The documentation for pyupgrade mentions some good arguments for when it should not be used: https://github.com/asottile/pyupgrade#f-strings

note: pyupgrade is intentionally timid and will not create an f-string
if it would make the expression longer or if the substitution parameters
are anything but simple names or dotted names (as this can decrease readability).

I wasn't aware that pyupgrade has already implemented fstrings. This is a good argument, for moving it to an optional extension. If pyupgrade can do it automatically, we should recommend using that instead!

DanielNoord added a commit to DanielNoord/pylint that referenced this pull request Sep 3, 2021
Pierre-Sassoulas added a commit that referenced this pull request Sep 3, 2021
Based on discussion in #4787

Co-authored-by: Pierre Sassoulas <pierre.sassoulas@gmail.com>
@DanielNoord
Copy link
Collaborator Author

I'm going to close this PR (sadly).

The logic I worked on worked (sort of) but there were just too many false positives an edge cases that I would be comfortable merging it. I think pylint-dev/astroid#1156 would need to be fixed for me to ever start working on this again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Checkers Related to a checker Enhancement ✨ Improvement to a component Work in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Detect when f-string-syntax is used in a string, but not marked as an f-string
5 participants