Possible false positive with `use-sequence-for-iteration` on in-place sets with dynamic content? #5788

Lyle-Alloy · 2022-02-10T17:17:27Z

Bug description

I'm getting a warning for use-sequence-for-iteration in my code, but I believe I have a valid reason to use it that should possibly be considered as an exception for the rule.

In my code I iterate over in-place sets with some in-place content, and some dynamic content to ensure that I iterate over the in-place content if it doesn't already exist in the dynamic content, while also ensuring I never iterate over the same thing twice.

Example:

# test.py

def print_strings_and_hello(may_contain_hello):
    """
    Prints a set of strings without duplicates, while also ensuring 'hello' always gets
    printed
    """
    for string in {*may_contain_hello, "hello"}:
        print(string)


print_strings_and_hello(["hello", "goodbye"])  # prints both hello and goodbye
print_strings_and_hello(["goodbye"])  # prints both hello and goodbye

Do let me know if you think this is a bad practice!

I can't find much discussion about this warning online (which makes sense considering it was added recently) so I would love to engage and leave a paper trail for the next user who gets this warning for a similar usage.

Command used

pylint test.py

Pylint output

************* Module test
test.py:8: [C0208(use-sequence-for-iteration), print_strings_and_hello] Use a sequence type when iterating over values

------------------------------------------------------------------
Your code has been rated at 8.00/10 (previous run: 8.00/10, +0.00)

Expected behavior

No warning for the line

Pylint version

pylint 2.12.2
astroid 2.9.0
Python 3.9.2 (default, Apr 15 2021, 10:01:45) 
[Clang 12.0.0 (clang-1200.0.32.29)]

The text was updated successfully, but these errors were encountered:

Pierre-Sassoulas · 2022-02-10T21:49:15Z

I think you need a set but you're not making the check that hello is in your iterable explicit. Doing so would make your code faster. I took the liberty of removing the print from your example so the benchmark is not majorly measuring the time it takes to print to terminal:

Here's the original a.py:

def strings_and_hello(may_contain_hello):
    """
    Prints a set of strings without duplicates, while also ensuring 'hello' always gets
    printed
    """
    r = ""
    for string in {*may_contain_hello, "hello"}:
        r += string
    return r


iterations = 10000000
for i in range(iterations):
    strings_and_hello(["hello", "goodbye"])  # prints both hello and goodbye
    strings_and_hello(["goodbye"])  # prints both hello and goodbye

time python3 a.py
python3 a.py  5,91s user 0,01s system 95% cpu 6,170 total

If you can replace your iterable by set in b.py:

def strings_and_hello(may_contain_hello):
    """
    Prints a set of strings without duplicates, while also ensuring 'hello' always gets
    printed
    """
    r = ""
    if "hello" not in may_contain_hello:
        r += "hello"
    for string in may_contain_hello:
        r += string
    return r


iterations = 10000000
for i in range(iterations):
    strings_and_hello({"hello", "goodbye"})  # prints both hello and goodbye
    strings_and_hello({"goodbye"})  # prints both hello and goodbye

time python3 b.py
python3 b.py  3,63s user 0,01s system 97% cpu 3,737 total

Now that isn't really a fair comparison. Maybe you can't change the input. So using the following file c.py:

def strings_and_hello(may_contain_hello):
    """
    Prints a set of strings without duplicates, while also ensuring 'hello' always gets
    printed
    """
    r = ""
    may_contain_hello = set(may_contain_hello)
    if "hello" not in may_contain_hello:
        r += "hello"
    for string in may_contain_hello:
        r += string
    return r

iterations = 10000000
for i in range(iterations):
    strings_and_hello(["hello", "goodbye"])  # prints both hello and goodbye
    strings_and_hello(["goodbye"])  # prints both hello and goodbye

We still have a small perf increase compared to a.py:

time python3 c.py
python3 c.py  5,64s user 0,01s system 97% cpu 5,773 total

Which is why I think the warning is not a false positive.

jacobtylerwalls · 2022-02-10T21:55:58Z

I think it is a false positive.

It looks like that case was explicitly contemplated:
https://github.com/PyCQA/pylint/blob/61b4db0dc31fd31fcfda36e4730cb44848af2d30/tests/functional/u/use/use_sequence_for_iteration.py#L16

But on the other hand, there was discussion that this warning was only intended for in-place sets. This isn't really an "in place" set anymore if it's deduplicating an input.

I read the discussion on #4776, and I think this check was mostly about the nondeterministic ordering you get from iterating a set. Not about performance, really. But for performance, take a look at the most atomic case:

$ cat a.py
def deduplicate(list_in):
  for el in {*list_in}:
    print(el)

$ pylint a.py --disable=all --enable=use-sequence-for-iteration
************* Module a
a.py:2:12: C0208: Use a sequence type when iterating over values (use-sequence-for-iteration)

------------------------------------------------------------------
Your code has been rated at 6.67/10 (previous run: 0.00/10, +6.67)

User gets that and says, "okay pylint I trust you, let me cast to list after de-duping":

>>> timeit('[x for x in var]', setup='var={1, 2, 3}')  # pretend var is dynamic
0.3215374090068508
>>> timeit('[x for x in list(var)]', setup='var={1, 2, 3}')
0.46824162200209685

Note that if the user does ⬆️ , then they will fall afoul of the checker proposed in #4355.

jacobtylerwalls · 2022-02-10T22:06:20Z

I think you need a set but you're not making the check that hello is in your iterable explicit. Doing so would make your code faster.

Unfortunately this won't account for a use case like:

def deduplicate_two_lists(input1, input2):
    for el in {*input1, *input2}:
        do_something()

Pierre-Sassoulas · 2022-02-10T22:07:40Z

Great point @jacobtylerwalls, as we're raising a message on something that is not an in-place set/list the change required are a refactor (like the one I did above), not simply casting to set/list (as evidenced by your benchmark, it just add more thing to do before iterating). I think we need to stick to in-place set/list or make it clearer that a simple cast is not the solution.

clavedeluna · 2022-12-23T13:12:25Z

@Pierre-Sassoulas is the task for this issue just updating the documentation or actually updating use-sequence-for-iteration to only check for in-place sets?

Pierre-Sassoulas · 2022-12-23T13:16:28Z

I think we need to also change the behavior to target only inlined set.

jacobtylerwalls · 2022-12-23T13:18:38Z

Perhaps by ignoring sets that have an unpacking with *?

clavedeluna · 2022-12-23T13:27:59Z

seems reasonable that the existing functional test [x for x in {*var, 4}] not raise the message.

jacobtylerwalls · 2022-12-23T13:48:37Z

Yeah I came to that same realization in #5788 (comment). 👍🏻

Closes pylint-dev#5788

Lyle-Alloy added the Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling label Feb 10, 2022

Pierre-Sassoulas added Discussion 🤔 Question and removed Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling labels Feb 10, 2022

Pierre-Sassoulas added Documentation 📗 False Positive 🦟 A message is emitted but nothing is wrong with the code and removed Question labels Feb 10, 2022

DanielNoord mentioned this issue Jun 22, 2022

Create Needs PR and Needs decision labels #6999

Closed

Pierre-Sassoulas added this to Todo in Better error messages via automation Jul 6, 2022

Pierre-Sassoulas added Needs PR This issue is accepted, sufficiently specified and now needs an implementation and removed Discussion 🤔 labels Jul 6, 2022

clavedeluna self-assigned this Dec 23, 2022

clavedeluna removed the Documentation 📗 label Dec 23, 2022

clavedeluna added a commit to clavedeluna/pylint that referenced this issue Dec 23, 2022

Fix use-sequence-for-iteration when unpacking a set with *.

be7bda6

Closes pylint-dev#5788

clavedeluna mentioned this issue Dec 23, 2022

Fix use-sequence-for-iteration when unpacking a set with * #7975

Merged

jacobtylerwalls closed this as completed in #7975 Dec 26, 2022

Better error messages automation moved this from Todo to Done Dec 26, 2022

Pierre-Sassoulas added this to the 2.15.10 milestone Dec 26, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possible false positive with `use-sequence-for-iteration` on in-place sets with dynamic content? #5788

Possible false positive with `use-sequence-for-iteration` on in-place sets with dynamic content? #5788

Lyle-Alloy commented Feb 10, 2022 •

edited by Pierre-Sassoulas

Pierre-Sassoulas commented Feb 10, 2022

jacobtylerwalls commented Feb 10, 2022

jacobtylerwalls commented Feb 10, 2022

Pierre-Sassoulas commented Feb 10, 2022

clavedeluna commented Dec 23, 2022

Pierre-Sassoulas commented Dec 23, 2022

jacobtylerwalls commented Dec 23, 2022

clavedeluna commented Dec 23, 2022

jacobtylerwalls commented Dec 23, 2022

Possible false positive with use-sequence-for-iteration on in-place sets with dynamic content? #5788

Possible false positive with use-sequence-for-iteration on in-place sets with dynamic content? #5788

Comments

Lyle-Alloy commented Feb 10, 2022 • edited by Pierre-Sassoulas

Bug description

Command used

Pylint output

Expected behavior

Pylint version

Pierre-Sassoulas commented Feb 10, 2022

jacobtylerwalls commented Feb 10, 2022

jacobtylerwalls commented Feb 10, 2022

Pierre-Sassoulas commented Feb 10, 2022

clavedeluna commented Dec 23, 2022

Pierre-Sassoulas commented Dec 23, 2022

jacobtylerwalls commented Dec 23, 2022

clavedeluna commented Dec 23, 2022

jacobtylerwalls commented Dec 23, 2022

Possible false positive with `use-sequence-for-iteration` on in-place sets with dynamic content? #5788

Possible false positive with `use-sequence-for-iteration` on in-place sets with dynamic content? #5788

Lyle-Alloy commented Feb 10, 2022 •

edited by Pierre-Sassoulas