Add `possible-forgotten-f-prefix` checker #4787

DanielNoord · 2021-08-02T13:54:59Z

To ease the process of reviewing your PR, do make sure to complete the following boxes.

Add yourself to CONTRIBUTORS if you are a new contributor.
Add a ChangeLog entry describing what your PR does.
If it's a new feature, or an important bug fix, add a What's New entry in
doc/whatsnew/<current release.rst>.
Write a good description on what the PR does.

Type of Changes

	Type
✓	✨ New feature

Description

This checks if text in strings in between {}'s are variables.
The var the string is assigned to is also checked for a format() call.
If this does not happen, the string should probably be a f-string and a message is emitted.
This closes #2507

Reason for this PR being a draft

Ideally I would also add a check for the following code:

x = "This is a calculation {1 + 1}" # bad

x = f"This is a calculation {1 + 1}" # good

Can we check that 1 + 1 is a calculation without calling eval?

This checks if text in strings in between `{}`'s are variables. The var the string is assigned to is also checked for a format() call. If this does not happen, the string should probably be a f-string and a message is emitted. This closes pylint-dev#2507

coveralls · 2021-08-02T14:16:18Z

Pull Request Test Coverage Report for Build 1144158204

50 of 50 (100.0%) changed or added relevant lines in 1 file are covered.
12 unchanged lines in 3 files lost coverage.
Overall coverage increased (+0.06%) to 92.802%

Files with Coverage Reduction	New Missed Lines	%
pylint/extensions/typing.py	1	95.79%
pylint/pyreverse/utils.py	3	82.39%
pylint/config/option_manager_mixin.py	8	89.36%

Totals
Change from base Build 1142277141:	0.06%
Covered Lines:	13498
Relevant Lines:	14545

💛 - Coveralls

Pierre-Sassoulas

Thank you for working on this checker, much appreciated !

pylint/checkers/strings.py

tests/functional/p/possible_f_string_as_string.py

Co-authored-by: Pierre Sassoulas <pierre.sassoulas@gmail.com>

for more information, see https://pre-commit.ci

DanielNoord · 2021-08-03T21:37:44Z

For future contributors stumbling across this PR.

We would like to add a possible-f-string-as-string checker which checks for normal strings that seem to be f-strings.
The following example show some basic test cases:

param = "string"
"This is a string" # good
"This is a {param}" # bad 
f"This is a {param}" # good

"This is a calculation: 1 + 1" # good
"This is a calculation: {1 + 1}" # bad
f"This is a calculation: {1 + 1}" # good

"This is a nice string" # good
"This is a {"nice" + param}" # bad
f"This is a {'nice' + param}" # good

The code in this PR is (almost) able to test the first three cases by looking for regex matches in between {} and seeing if the matches can be found in the local or global scope.
The second and third batch of tests is more problematic. We need a way to test whether the {} contain valid python code. The second batch is (fairly) easy, the third is problematic. We need to evaluate the code within the {} within the current global and local scope to see if any of the variables are in there (such as param in the example). We currently have no solution to do this without any considerable performance impact.

TLDR: How to evaluate the code in between {} within a normal string in the context of that string without too much performance impact?

Crissal1995 · 2021-08-06T04:36:06Z

Sorry to pop out from nowhere, but this PR got my interest.

Have you read the f-string PEP? There are two interesting sections, Code equivalence and Expression evaluation, that shows how the f-string is treated internally, what's its representation with format statements, and how is parsed with ast.

I think that expressions should be evaluated as done in the PEP, i.e. with ast.parse('(' + expression + ')', '<fstring>', 'eval'); because the mode is set to eval, only a single statement can be executed, so I think that should be ok for security reasons.

If accepted, this should address every problem related with valid and invalid expressions inside curly braces.

DanielNoord · 2021-08-06T07:51:21Z

@Crissal1995 This is indeed interesting, thanks for the tip!

I have not tested this, so please forgive me if I my questions are non-relevant, but:

Can't a single statement also produce security issues? I have no real experience with this security and eval, but I am thinking of print(organization_secret) or using an already imported requests module to send sensitive information from an automatically run CI?
How can we pass the context of the string to ast.parse('(' + expression + ')', '<fstring>', 'eval'). It seems as if we should still pass all lines up until the line with the string to the ast-parser to infer any local or global variables. For performance reasons, this is something we should (obviously) avoid.

Crissal1995 · 2021-08-09T07:56:55Z

Can't a single statement also produce security issues? I have no real experience with this security and eval, but I am thinking of print(organization_secret) or using an already imported requests module to send sensitive information from an automatically run CI?

Excuse me, but I don't understand this question. If you run PyLint, you are supposed to be in your own execution environment, so you know your secrets and have no interest to print them, or am I missing something?

How can we pass the context of the string to ast.parse('(' + expression + ')', '<fstring>', 'eval'). It seems as if we should still pass all lines up until the line with the string to the ast-parser to infer any local or global variables. For performance reasons, this is something we should (obviously) avoid.

This is indeed a good question; however, I think that should be as PyLint already works. It's not needed to execute all the code up until your expression; rather, your variable needs to be in the scope of local or global variables.
To explain me better:

if I execute ast.parse("(y +/ 2)", "<fstring>", "eval"), then I got a SyntaxError.
If I execute ast.parse("(y + 2)", "<fstring>", "eval"), however, I got a valid ast.Expression; then I need to check every leaf, and see if it's a ast.Name or not - like y is in this case.

When I got a Name, I can check its id - that is the string representation of its name - to see if it is found in the current scope. If not, raise an error; otherwise you're good to go; there is no need to execute all code before.

If I try to access an object variable, like myvar.z, there is just a need of checking if myvar is present, not that is present and has the attribute z; this will be done at runtime from Python itself.

DanielNoord · 2021-08-16T10:02:35Z

Excuse me, but I don't understand this question. If you run PyLint, you are supposed to be in your own execution environment, so you know your secrets and have no interest to print them, or am I missing something?

As I said, I'm not quite sure how CI's and secrets behave but I am imagining a situation where an automatically run CI prints or sends an organization's secrets as a result of evaluating f-strings. I'm not sure if the is actually a feasible line of attack, but we should be 100% certain of this before merging such behaviour into the tool.

if I execute ast.parse("(y +/ 2)", "<fstring>", "eval"), then I got a SyntaxError.

If I execute ast.parse("(y + 2)", "<fstring>", "eval"), however, I got a valid ast.Expression; then I need to check every leaf, and see if it's a ast.Name or not - like y is in this case.

This seems like the right way forward. If I can find the time I might start working on this implementation, but I am more than happy for somebody else to start working on this.

If I try to access an object variable, like myvar.z, there is just a need of checking if myvar is present, not that is present and has the attribute z; this will be done at runtime from Python itself.

I think we should do this check, as Pylint tries to inform users of errors before running their code. Relying on errors on runtime is therefore not truly in the spirit of pylint.

Pierre-Sassoulas · 2021-08-16T16:16:36Z

As I said, I'm not quite sure how CI's and secrets behave but I am imagining a situation where an automatically run CI prints or sends an organization's secrets as a result of evaluating f-strings. I'm not sure if the is actually a feasible line of attack, but we should be 100% certain of this before merging such behaviour into the tool.

I was thinking the same. Let's say a public open source project use pylint on Github and has a release job. In it's CI in a job, someone change a string to "{print(os.environ('PYPI_TOKEN')}", if we eval print(os.environ('PYPI_TOKEN') without special care, the attacker can see the value of the secret in the CI log (maybe they would also needs to make a test fail so the output is printed, but it's not that sophisticated). Then they can immediately publish a compromised package. This would be really bad.

Crissal1995 · 2021-08-17T04:40:05Z

As I said, I'm not quite sure how CI's and secrets behave but I am imagining a situation where an automatically run CI prints or sends an organization's secrets as a result of evaluating f-strings. I'm not sure if the is actually a feasible line of attack, but we should be 100% certain of this before merging such behaviour into the tool.

With ast.parse you only compile valid Python code, parsing its tree, but not execute a single line of code.
Furthermore, in compiling phase, there is also a flag automatically set by ast that is ast.PyCF_ONLY_AST.

[ast.PyCF_ONLY_AST] Generates and returns an abstract syntax tree instead of returning a compiled code object.

In fact, when you eval a node generated by ast.parse, you get a nice TypeError.

>>> import ast
>>> f1 = lambda x: ast.parse("(" + x + ")", "<fstring>", "eval")
>>> f2 = lambda x: compile(x, "<fstring>", "eval")
>>> eval(f1("1"))
TypeError: eval() arg 1 must be a string, bytes or code object
>>> eval(f2("1"))
1
>>> x = "MY SECRET"
>>> y = "print(x)"
>>> eval(f1(y))
TypeError: eval() arg 1 must be a string, bytes or code object
>>> eval(f2(y))
MY SECRET

If I try to access an object variable, like myvar.z, there is just a need of checking if myvar is present, not that is present and has the attribute z; this will be done at runtime from Python itself.

I think we should do this check, as Pylint tries to inform users of errors before running their code. Relying on errors on runtime is therefore not truly in the spirit of pylint.

But being Python a dynamically typed language, this turns out to be a difficult task, if not impossible.
The only possible scenario to see this happen is to force type hinting for every object passed in the context of a fstring, otherwise how can we know if myvar is an int, a Foo object or what else - and thus has in fact the z attribute?

For example, when you parse("(foo.x)"), you get back an Expression made out of an Attribute. Without executing the code up until that point, you cannot really tell if foo really has that attribute; that's because I suggest to let the execution fail, or at least warn the user that this particular string - "This is my var: {foo.x}" - is surely a missed fstring.

Pierre-Sassoulas · 2021-08-17T08:21:50Z

I tried something with ast.parse, but I fail to find a test case where the returned parsed ast is not an expression. I think it create a syntax error the way it's parsed if it isn't one anyway, so we probably don't need to check that it's an ast.Expression. Other than that checking that a valid python expression contain a a variable from the global or local scope seems complicated in term of performance so we can probably not do it. What we probably need to do is to check that the string is not used with format later on because the number of false positive will be huge if we don't.

DanielNoord · 2021-08-17T08:30:23Z

I was also toying around with ast.parse but I failed to find an easy way to implement it with the expected behaviour. Perhaps we need to think about this the other way around: why would {...} be included in a normal string and see if we can test for those cases?

DanielNoord · 2021-08-28T10:08:43Z

It's true that there is a lot of false positive, because it's hard to check if a string is used later. Maybe we can limit the warning for string that we know won't leave the scope (ie function or method variables). So if the string can be used later with format or % and are hard to check, then we never make any assumption?

I am not sure if we are already doing this, but having checks work for some code and not for others without clearly showing that the user is not something I am really in favour of. When I run the linter in VS Code I expect all messages I haven't disabled to be checked.

Maybe we can raise a warning but with a lower confidence level ? I'm not sure a lot of person are using the confidence level in their configuration.

I never did, but this might be an option.

Pierre-Sassoulas · 2021-08-30T07:51:24Z

I am not sure if we are already doing this, but having checks work for some code and not for others without clearly showing that the user is not something I am really in favour of. When I run the linter in VS Code I expect all messages I haven't disabled to be checked.

I think this is a conversation about would we rather have false positive vs false negative and about probabilities here. Pylint is known to have annoying false positives so we're trying to lower their number (almost 75% of issue are about false positives). But no one complain about a false negative :) So as a maintainer I'd rather have less false positive. Now we can consider that creating a string for format is rather rare and they can be disabled (Like you did in pylint there are maybe 4 occurrence for 20 ksloc...) so false positive are ok in this case.

To go into more detail I think the probability of creating a format string is higher on a non local string (it's a string that is going to be reused) and imo most of the false positive will be on such string. That's why I wanted to exclude the string that are not used locally from the check. Dou you have an opinion @cdce8p ?

cdce8p · 2021-08-30T14:56:23Z

Dou you have an opinion @cdce8p ?

In general, I agree with @Pierre-Sassoulas here. We should do our best to prevent as many false-positives as possible. If they get too frequent, they turn users away just because they are annoying. false-negatives in contrast are just areas we can improve in the future.

Regarding the specifics of f-strings, I'm not sure there is a good way to determine if it should be one or not. Thus this will likely lead to more false-positives than helpful warnings. @Pierre-Sassoulas mentioned assignments and I agree: Those should definitely be excluded. It's common to see code like this:

TEMPLATE_STR = "Some string with a placeholder: {}"
print(TEMPLATE_STR.format("Hello World"))

In the end, I don't know if we should move forward with it. In case you still would like to, it might be better to create a new extension for it.

cdce8p · 2021-08-30T15:13:27Z

A new extension could also include the consider-using-f-string check. See my comment here #4796 (comment)

Pierre-Sassoulas · 2021-08-31T07:15:44Z

I think possible-forgotten-f-prefix should be a default checker as it's asked by a lot of user : it's a very popular issue, one of the most popular in fact (and it's also a very popular question on stackoverflow that I can't find anymore). The forgotten f happen very often. But I realize now that consider-using-f-string is a little opinionated and could be an extension. This is the last issue that we have to put in 2.11. Maybe we can delay possible-forgotten-f-prefix in 2.12 and put consider-using-f-string in the RecommendationChecker then release 2.11 ?

DanielNoord · 2021-08-31T09:07:06Z

While I agree consider-using-f-string is opinionated I believe the fact that there are real performance benefits might warrant a place in the default checkers. However, obviously this is a decision for the project maintainers so we should move it.

For this PR to move forward the following tasks remain:

Implement a nodes.Const.kind that also picks up on the r prefix rather than only u (and possibly backport to Python<3.8). This should probably be done in astroid.
I have created an issue: Add nodes.Const.kind for r prefix and Python < 3.8 astroid#1156
Rewrite tests to only include 1) non-assignments in global scope, 2) assignments in method or function scopes and 3) only exact matches with variables in local or global scope
Rewrite code to match new tests. See commit 5 for a 95% working example, here: 2259aaf

Pierre-Sassoulas · 2021-08-31T10:16:01Z

The performance benefit convinced me, I'm for keeping consider-using-f-string the way it is. Should we suggest to use pyupgrade in the message ? This automate a lot of the boring/easy stuff when migrating to f-string and could reduce a lof of the frustration.

Thank you for opening the issue in astroid and making a checklist. It seems there is still a lot of work on this one so let's not block 2.11 because of it.

cdce8p · 2021-08-31T12:12:04Z

While I agree consider-using-f-string is opinionated I believe the fact that there are real performance benefits might warrant a place in the default checkers. However, obviously this is a decision for the project maintainers so we should move it.

Ok, this would indeed be an argument to keep it enabled by default. I would nevertheless recommend to move it to the RecommendationChecker. That would just be a better fit for it. (The one is loaded by default too.)

--

Should we suggest to use pyupgrade in the message ? This automate a lot of the boring/easy stuff when migrating to f-string and could reduce a lof of the frustration.

The documentation for pyupgrade mentions some good arguments for when it should not be used: https://github.com/asottile/pyupgrade#f-strings

note: pyupgrade is intentionally timid and will not create an f-string
if it would make the expression longer or if the substitution parameters
are anything but simple names or dotted names (as this can decrease readability).

I wasn't aware that pyupgrade has already implemented fstrings. This is a good argument, for moving it to an optional extension. If pyupgrade can do it automatically, we should recommend using that instead!

Based on discussion in pylint-dev#4787

Based on discussion in #4787 Co-authored-by: Pierre Sassoulas <pierre.sassoulas@gmail.com>

DanielNoord · 2022-04-01T17:50:09Z

I'm going to close this PR (sadly).

The logic I worked on worked (sort of) but there were just too many false positives an edge cases that I would be comfortable merging it. I think pylint-dev/astroid#1156 would need to be fixed for me to ever start working on this again.

from pylint-dev#4787

DanielNoord marked this pull request as draft August 2, 2021 13:55

DanielNoord mentioned this pull request Aug 2, 2021

Add check for redundant u prefixed string in Python 3 #4102

Closed

Pierre-Sassoulas assigned DanielNoord Aug 2, 2021

Pierre-Sassoulas self-requested a review August 2, 2021 14:05

Remove walrus operator

92b8959

Pierre-Sassoulas reviewed Aug 2, 2021

View reviewed changes

DanielNoord and others added 4 commits August 2, 2021 19:02

Apply suggestions from code review

240b4da

Co-authored-by: Pierre Sassoulas <pierre.sassoulas@gmail.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

3c2cf24

for more information, see https://pre-commit.ci

Add review changes

f7f4f89

Merge branch 'main' into f-strings-variable

2259aaf

Pierre-Sassoulas added Enhancement ✨ Improvement to a component Checkers Related to a checker Work in progress labels Aug 3, 2021

DanielNoord removed their assignment Aug 3, 2021

Pierre-Sassoulas added 3 commits August 17, 2021 08:17

Merge branch 'main' into f-strings-variable

a75c186

Fix bad conflict resolution

c91a14f

Add more test cases with invalid python code

2060037

Use ast.parse() work in progress

8def6c5

Pierre-Sassoulas force-pushed the f-strings-variable branch from 851fea0 to 8def6c5 Compare August 17, 2021 08:23

Pierre-Sassoulas modified the milestones: 2.10.0, 2.11.0 Aug 20, 2021

Pierre-Sassoulas changed the title ~~Add possible-f-string-as-string checker~~ Add `possible-forgotten-f-prefix checker Aug 20, 2021

cdce8p changed the title ~~Add `possible-forgotten-f-prefix checker~~ Add possible-forgotten-f-prefix checker Aug 30, 2021

DanielNoord mentioned this pull request Aug 31, 2021

Add nodes.Const.kind for r prefix and Python < 3.8 pylint-dev/astroid#1156

Open

Pierre-Sassoulas removed this from the 2.11.0 milestone Aug 31, 2021

DanielNoord added a commit to DanielNoord/pylint that referenced this pull request Sep 3, 2021

Moved consider-using-f-string to RecommendationChecker

d7211cc

Based on discussion in pylint-dev#4787

DanielNoord mentioned this pull request Sep 3, 2021

Moved consider-using-f-string to RecommendationChecker #4957

Merged

2 tasks

Pierre-Sassoulas added a commit that referenced this pull request Sep 3, 2021

Moved consider-using-f-string to RecommendationChecker (#4957)

6c81831

Based on discussion in #4787 Co-authored-by: Pierre Sassoulas <pierre.sassoulas@gmail.com>

cdce8p mentioned this pull request Sep 17, 2021

False positive consider-using-f-string for localization string #5039

Closed

grahamgower mentioned this pull request Dec 6, 2021

Fix some f-strings; additional linting. popsim-consortium/demes-python#396

Merged

DanielNoord added the Work in progress label Dec 14, 2021

Pierre-Sassoulas self-requested a review January 16, 2022 21:07

DanielNoord closed this Apr 1, 2022

DanielNoord deleted the f-strings-variable branch April 1, 2022 17:50

Pierre-Sassoulas mentioned this pull request Jul 2, 2022

Detect when f-string-syntax is used in a string, but not marked as an f-string #2507

Open

zzl0 mentioned this pull request Aug 6, 2022

Fix parse error message for number parsing Instagram/LibCST#724

Merged

lparsons mentioned this pull request Dec 12, 2023

Eliminate export exceptions due to no data Princeton-LSI-ResearchComputing/tracebase#810

Merged

8 tasks

ilius added a commit to ilius/pylint that referenced this pull request Feb 20, 2024

add W1311: possible-forgotten-f-prefix

63bb27d

from pylint-dev#4787

ilius added a commit to ilius/pylint that referenced this pull request Feb 20, 2024

add W1311: possible-forgotten-f-prefix

bff9485

from pylint-dev#4787

ilius added a commit to ilius/pylint that referenced this pull request Feb 20, 2024

add W1311: possible-forgotten-f-prefix

12bdc45

from pylint-dev#4787

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `possible-forgotten-f-prefix` checker #4787

Add `possible-forgotten-f-prefix` checker #4787

DanielNoord commented Aug 2, 2021

coveralls commented Aug 2, 2021 •

edited

Pierre-Sassoulas left a comment

DanielNoord commented Aug 3, 2021 •

edited by Pierre-Sassoulas

Crissal1995 commented Aug 6, 2021

DanielNoord commented Aug 6, 2021

Crissal1995 commented Aug 9, 2021 •

edited

DanielNoord commented Aug 16, 2021

Pierre-Sassoulas commented Aug 16, 2021 •

edited

Crissal1995 commented Aug 17, 2021

Pierre-Sassoulas commented Aug 17, 2021

DanielNoord commented Aug 17, 2021 •

edited

DanielNoord commented Aug 28, 2021

Pierre-Sassoulas commented Aug 30, 2021

cdce8p commented Aug 30, 2021

cdce8p commented Aug 30, 2021

Pierre-Sassoulas commented Aug 31, 2021

DanielNoord commented Aug 31, 2021 •

edited

Pierre-Sassoulas commented Aug 31, 2021

cdce8p commented Aug 31, 2021

DanielNoord commented Apr 1, 2022

Add possible-forgotten-f-prefix checker #4787

Add possible-forgotten-f-prefix checker #4787

Conversation

DanielNoord commented Aug 2, 2021

Type of Changes

Description

Reason for this PR being a draft

coveralls commented Aug 2, 2021 • edited

Pull Request Test Coverage Report for Build 1144158204

💛 - Coveralls

Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

DanielNoord commented Aug 3, 2021 • edited by Pierre-Sassoulas

Crissal1995 commented Aug 6, 2021

DanielNoord commented Aug 6, 2021

Crissal1995 commented Aug 9, 2021 • edited

DanielNoord commented Aug 16, 2021

Pierre-Sassoulas commented Aug 16, 2021 • edited

Crissal1995 commented Aug 17, 2021

Pierre-Sassoulas commented Aug 17, 2021

DanielNoord commented Aug 17, 2021 • edited

DanielNoord commented Aug 28, 2021

Pierre-Sassoulas commented Aug 30, 2021

cdce8p commented Aug 30, 2021

cdce8p commented Aug 30, 2021

Pierre-Sassoulas commented Aug 31, 2021

DanielNoord commented Aug 31, 2021 • edited

Pierre-Sassoulas commented Aug 31, 2021

cdce8p commented Aug 31, 2021

DanielNoord commented Apr 1, 2022

Add `possible-forgotten-f-prefix` checker #4787

Add `possible-forgotten-f-prefix` checker #4787

coveralls commented Aug 2, 2021 •

edited

DanielNoord commented Aug 3, 2021 •

edited by Pierre-Sassoulas

Crissal1995 commented Aug 9, 2021 •

edited

Pierre-Sassoulas commented Aug 16, 2021 •

edited

DanielNoord commented Aug 17, 2021 •

edited

DanielNoord commented Aug 31, 2021 •

edited