Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible false positive with use-sequence-for-iteration on in-place sets with dynamic content? #5788

Closed
Lyle-Alloy opened this issue Feb 10, 2022 · 9 comments · Fixed by #7975
Assignees
Labels
False Positive 🦟 A message is emitted but nothing is wrong with the code Needs PR This issue is accepted, sufficiently specified and now needs an implementation
Milestone

Comments

@Lyle-Alloy
Copy link

Lyle-Alloy commented Feb 10, 2022

Bug description

I'm getting a warning for use-sequence-for-iteration in my code, but I believe I have a valid reason to use it that should possibly be considered as an exception for the rule.

In my code I iterate over in-place sets with some in-place content, and some dynamic content to ensure that I iterate over the in-place content if it doesn't already exist in the dynamic content, while also ensuring I never iterate over the same thing twice.

Example:

# test.py

def print_strings_and_hello(may_contain_hello):
    """
    Prints a set of strings without duplicates, while also ensuring 'hello' always gets
    printed
    """
    for string in {*may_contain_hello, "hello"}:
        print(string)


print_strings_and_hello(["hello", "goodbye"])  # prints both hello and goodbye
print_strings_and_hello(["goodbye"])  # prints both hello and goodbye

Do let me know if you think this is a bad practice!

I can't find much discussion about this warning online (which makes sense considering it was added recently) so I would love to engage and leave a paper trail for the next user who gets this warning for a similar usage.

Command used

pylint test.py

Pylint output

************* Module test
test.py:8: [C0208(use-sequence-for-iteration), print_strings_and_hello] Use a sequence type when iterating over values

------------------------------------------------------------------
Your code has been rated at 8.00/10 (previous run: 8.00/10, +0.00)

Expected behavior

No warning for the line

Pylint version

pylint 2.12.2
astroid 2.9.0
Python 3.9.2 (default, Apr 15 2021, 10:01:45) 
[Clang 12.0.0 (clang-1200.0.32.29)]
@Lyle-Alloy Lyle-Alloy added the Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling label Feb 10, 2022
@Pierre-Sassoulas Pierre-Sassoulas added Discussion 🤔 Question and removed Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling labels Feb 10, 2022
@Pierre-Sassoulas
Copy link
Member

I think you need a set but you're not making the check that hello is in your iterable explicit. Doing so would make your code faster. I took the liberty of removing the print from your example so the benchmark is not majorly measuring the time it takes to print to terminal:

Here's the original a.py:

def strings_and_hello(may_contain_hello):
    """
    Prints a set of strings without duplicates, while also ensuring 'hello' always gets
    printed
    """
    r = ""
    for string in {*may_contain_hello, "hello"}:
        r += string
    return r


iterations = 10000000
for i in range(iterations):
    strings_and_hello(["hello", "goodbye"])  # prints both hello and goodbye
    strings_and_hello(["goodbye"])  # prints both hello and goodbye
time python3 a.py
python3 a.py  5,91s user 0,01s system 95% cpu 6,170 total

If you can replace your iterable by set in b.py:

def strings_and_hello(may_contain_hello):
    """
    Prints a set of strings without duplicates, while also ensuring 'hello' always gets
    printed
    """
    r = ""
    if "hello" not in may_contain_hello:
        r += "hello"
    for string in may_contain_hello:
        r += string
    return r


iterations = 10000000
for i in range(iterations):
    strings_and_hello({"hello", "goodbye"})  # prints both hello and goodbye
    strings_and_hello({"goodbye"})  # prints both hello and goodbye
time python3 b.py
python3 b.py  3,63s user 0,01s system 97% cpu 3,737 total

Now that isn't really a fair comparison. Maybe you can't change the input. So using the following file c.py:

def strings_and_hello(may_contain_hello):
    """
    Prints a set of strings without duplicates, while also ensuring 'hello' always gets
    printed
    """
    r = ""
    may_contain_hello = set(may_contain_hello)
    if "hello" not in may_contain_hello:
        r += "hello"
    for string in may_contain_hello:
        r += string
    return r

iterations = 10000000
for i in range(iterations):
    strings_and_hello(["hello", "goodbye"])  # prints both hello and goodbye
    strings_and_hello(["goodbye"])  # prints both hello and goodbye

We still have a small perf increase compared to a.py:

time python3 c.py
python3 c.py  5,64s user 0,01s system 97% cpu 5,773 total

Which is why I think the warning is not a false positive.

@jacobtylerwalls
Copy link
Member

I think it is a false positive.

It looks like that case was explicitly contemplated:
https://github.com/PyCQA/pylint/blob/61b4db0dc31fd31fcfda36e4730cb44848af2d30/tests/functional/u/use/use_sequence_for_iteration.py#L16

But on the other hand, there was discussion that this warning was only intended for in-place sets. This isn't really an "in place" set anymore if it's deduplicating an input.

I read the discussion on #4776, and I think this check was mostly about the nondeterministic ordering you get from iterating a set. Not about performance, really. But for performance, take a look at the most atomic case:

$ cat a.py
def deduplicate(list_in):
  for el in {*list_in}:
    print(el)

$ pylint a.py --disable=all --enable=use-sequence-for-iteration
************* Module a
a.py:2:12: C0208: Use a sequence type when iterating over values (use-sequence-for-iteration)

------------------------------------------------------------------
Your code has been rated at 6.67/10 (previous run: 0.00/10, +6.67)

User gets that and says, "okay pylint I trust you, let me cast to list after de-duping":

>>> timeit('[x for x in var]', setup='var={1, 2, 3}')  # pretend var is dynamic
0.3215374090068508
>>> timeit('[x for x in list(var)]', setup='var={1, 2, 3}')
0.46824162200209685

Note that if the user does ⬆️ , then they will fall afoul of the checker proposed in #4355.

@jacobtylerwalls
Copy link
Member

I think you need a set but you're not making the check that hello is in your iterable explicit. Doing so would make your code faster.

Unfortunately this won't account for a use case like:

def deduplicate_two_lists(input1, input2):
    for el in {*input1, *input2}:
        do_something()

@Pierre-Sassoulas
Copy link
Member

Great point @jacobtylerwalls, as we're raising a message on something that is not an in-place set/list the change required are a refactor (like the one I did above), not simply casting to set/list (as evidenced by your benchmark, it just add more thing to do before iterating). I think we need to stick to in-place set/list or make it clearer that a simple cast is not the solution.

@Pierre-Sassoulas Pierre-Sassoulas added Documentation 📗 False Positive 🦟 A message is emitted but nothing is wrong with the code and removed Question labels Feb 10, 2022
@Pierre-Sassoulas Pierre-Sassoulas added this to Todo in Better error messages via automation Jul 6, 2022
@Pierre-Sassoulas Pierre-Sassoulas added Needs PR This issue is accepted, sufficiently specified and now needs an implementation and removed Discussion 🤔 labels Jul 6, 2022
@clavedeluna
Copy link
Collaborator

@Pierre-Sassoulas is the task for this issue just updating the documentation or actually updating use-sequence-for-iteration to only check for in-place sets?

@Pierre-Sassoulas
Copy link
Member

I think we need to also change the behavior to target only inlined set.

@jacobtylerwalls
Copy link
Member

Perhaps by ignoring sets that have an unpacking with *?

@clavedeluna
Copy link
Collaborator

seems reasonable that the existing functional test [x for x in {*var, 4}] not raise the message.

@jacobtylerwalls
Copy link
Member

Yeah I came to that same realization in #5788 (comment). 👍🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
False Positive 🦟 A message is emitted but nothing is wrong with the code Needs PR This issue is accepted, sufficiently specified and now needs an implementation
Projects
Development

Successfully merging a pull request may close this issue.

4 participants