False positive `consider-using-tuple` for `x in set`, because frozen sets are faster #4776

samueltardieu · 2021-07-30T07:23:56Z

Bug description

The change introduced in #4768 is unwarranted.

Considering the following code:

def f(a):
    if a in {"abc", "def", "ghi", "klm", "nop"}:
        print(a)

def g():
    for a in {"abc", "def", "ghi", "klm", "nop"}:
        print(a)

it suggests to use a tuple in both cases:

$ pylint /tmp/t.py -v
************* Module t
/tmp/t.py:2:12: R6102: Consider using an in-place tuple instead of set (consider-using-tuple)
/tmp/t.py:6:13: R6102: Consider using an in-place tuple instead of set (consider-using-tuple)

While the second suggestion is right (in-place sets should not be used for iterating over their content), the first one (added in #4768) is a very bad advice. In the case of a membership lookup a set is faster, compiled as a frozenset by Python, and the right tool for the job.

I would suggest that this commit be reverted before the release as it encourages bad practices. A new style check could rather suggest to replace in-place lists and tuples in membership tests by sets when the data type is known to be hashable (as strings or numbers are for example).

Configuration

No response

Command used

pylint /tmp/t.py

Pylint output

************* Module t
/tmp/t.py:2:12: R6102: Consider using an in-place tuple instead of set (consider-using-tuple)
/tmp/t.py:6:13: R6102: Consider using an in-place tuple instead of set (consider-using-tuple)

Expected behavior

I would expect to see the second suggestion (which is sound), and not the first one (which is a bad advice).

Pylint version

pylint 2.10.0-dev0 (5e5f48d05fb6f1322a7420d6ea7586966b08f80a)

OS / Environment

ArchLinux

Additional dependencies

No response

The text was updated successfully, but these errors were encountered:

Pierre-Sassoulas · 2021-07-30T07:40:07Z

Hey thank you for opening the issue. Do you have a source for the performance comparison between the two ?

samueltardieu · 2021-07-30T07:52:43Z

>>> def in_set(a): return a in {"abc", "def", "ghi", "klm", "nop"}
...
>>> def in_tuple(a): return a in ("abc", "def", "ghi", "klm", "nop")
...
>>> from timeit import timeit
>>> timeit(lambda: in_set("foobar"))
0.08363517999532633
>>> timeit(lambda: in_tuple("foobar"))
0.1204142420028802

Using tuples (or lists for that matter) instead of sets require that comparisons are done sequentially until the member is found or the tuple. Also, using tuples or lists requires placing the expected-most-often-found elements first as they will be compared first, while in a set a hash comparison will find the matching element (or the absence of the key) in few operations.

As is documented in Python source Objects/setobject.c:

   Use cases for sets differ considerably from dictionaries where looked-up
   keys are more likely to be present.  In contrast, sets are primarily
   about membership testing where the presence of an element is not known in
   advance.  Accordingly, the set implementation needs to optimize for both
   the found and not-found case.

Looking up a non-existing member is the worst case when using tuples or lists because all elements must be compared to the looked-up key before declaring it a miss, while sets are also optimized for this use-case.

samueltardieu · 2021-07-30T07:59:46Z

Note that a set lookup is amortized O(1) while tuple and list lookup is O(n). However, when n is small of course it makes no sense to compare costs this way, and this is not what I'm doing in the comment above. Python set objects (and particularly frozenset objects) are optimized for this particular use case including for small collections with very few objects.

Also, I've seen many code fragments looking like:

if s in {"abc", "def", "ghi"}:
  do_something(s)
elif s in {"jkl", "mno", "por"}:
  do_something_else(s)

One must realize that s hash is computed only once and cached into the str structure. Look for "por" or for an non-existing member takes 6 comparisons instead of two optimized set lookups.

cdce8p · 2021-07-30T09:41:19Z

As I have explained here before, the main reason for this change wasn't performance, but rather code style. That's also why I added it to the optional CodeStyleExtension. As it says in the documentation

Checkers that can improve code consistency. As such they don't necessarily provide a performance benefit and are often times opinionated.

The main goal is to improve code consistency, even if not always the most absolute most performant.

--
There are a few points that you neglected to consider so far:

Using tuples everywhere (for loops and in comparisons) will make it easier for beginners to understand when and how to use them.
I haven't seen many, if any at all, tutorials using sets for this comparison. It is much more common to suggest lists which is what this change is mainly about.
It is not obvious that Python converts the in-place set to a frozenset from the code alone.
The in-place defined data structure most often only contains one to three elements. Everything more than that should gets it's one variables where frozensets are more efficient.
The performance difference might be 30% in the worst case, but that is a false assumption. Even / especially for small tuples / sets you are just as likely that the item you're looking for is inside the container. With that the difference might only be a few percent. I would argue you won't notice a difference.
Tuples are more memory efficient, by a lot! (Although again you probably won't notice a difference.)
Tuples are still better than lists.

--
As explained above, I don't see a reason to change this check.
The only thing might worth considering would be to add an additional check which would suggest to use a frozenset for
containers larger than x items in in comparisons.

--

from pympler import asizeof
t = ("aa", "bb", "cc", "dd")
s = {"aa", "bb", "cc", "dd"}
s2 = frozenset(t)
print(f"Tuple: {asizeof.asizeof(t)}")
print(f"Set: {asizeof.asizeof(s)}")
print(f"Frozenset: {asizeof.asizeof(s2)}")
--
Tuple: 296
Set: 440
Frozenset: 440

samueltardieu · 2021-07-30T09:45:48Z

The main goal is to improve code consistency, even if not always the most absolute most performant.

That's where our opinions differ I guess. I do not think it is ok to suggest bad practices to a user, especially a beginner, just for the sake of "code consistency". Someone using a set for a membership test is doing nothing wrong and is using the right tool, I would not understand why we would actively suggest using something inferior.

cdce8p · 2021-07-30T09:47:33Z

As I explained, it isn't inferior. You only considered the performance aspect, which isn't the whole picture in this case.

samueltardieu · 2021-07-30T09:48:18Z

Also you write

It is not obvious that Python converts the in-place set to a frozenset from the code alone.

This is easy to check:

>>> import dis
>>> def f():
...     if s in {"abc", "def", "ghi"}:
...             do_something(s)
...     elif s in {"jkl", "mno", "por"}:
...             do_something_else(s)
...
>>> dis.dis(f)
  2           0 LOAD_GLOBAL              0 (s)
              2 LOAD_CONST               1 (frozenset({'def', 'ghi', 'abc'}))
              4 CONTAINS_OP              0
              6 POP_JUMP_IF_FALSE       18

  3           8 LOAD_GLOBAL              1 (do_something)
             10 LOAD_GLOBAL              0 (s)
             12 CALL_FUNCTION            1
             14 POP_TOP
             16 JUMP_FORWARD            16 (to 34)

  4     >>   18 LOAD_GLOBAL              0 (s)
             20 LOAD_CONST               2 (frozenset({'mno', 'jkl', 'por'}))
             22 CONTAINS_OP              0
             24 POP_JUMP_IF_FALSE       34

  5          26 LOAD_GLOBAL              2 (do_something_else)
             28 LOAD_GLOBAL              0 (s)
             30 CALL_FUNCTION            1
             32 POP_TOP
        >>   34 LOAD_CONST               0 (None)
             36 RETURN_VALUE

cdce8p · 2021-07-30T09:51:11Z

I didn't question that behavior. But consider the following, not everyone knows about it.
Just from looking at s in {"a", "b", "c"} someone might not realize that it's converted to a frozenset.

samueltardieu · 2021-07-30T09:55:36Z

As I explained, it isn't inferior. You only considered the performance aspect, which isn't the whole picture in this case.

This is wrong, the performance aspects bother me less than the bad practices ones: one should not actively suggest using sequential scanning for membership tests. This is CS 101. Even the official Python tutorial for set (https://docs.python.org/3/tutorial/datastructures.html) presents the membership test as "fast membership testing", i.e. the right tool for this.

I'm not asking for the reverse suggestion to be added although I think this would be a good idea, I'm just asking not to suggest the user to change their code from set to tuple in the case of membership test.

samueltardieu · 2021-07-30T09:57:42Z

Just from looking at s in {"a", "b", "c"} someone might not realize that it's converted to a frozenset.

The fact that it is a frozenset rather than a set is an implementation detail, as lookups in a set would be as efficient. The important thing is that it is compiled as a set at compilation time, this is what makes the performances good.

If small sets were built at execution time each time a membership tests was done, I would totally agree with you that tuples would be better, but this is not the case.

samueltardieu · 2021-07-30T09:59:53Z

By the way, sorry if I sound passionate about the issue, but with my CS teacher hat on I would hate to see my students change their code from a static set membership test to a tuple one, and it would affect their grade negatively if they had done so without a good justification for using a sequentially scanning tool instead of a hashing one.

cdce8p · 2021-07-30T11:04:51Z

Thank you @samueltardieu for the discussion about this topic.

I took some time to think about it more and looked at some more code. As it turns out Python is better optimized than I was aware of (although I should probably have expected as much). That made me question if the whole consider-using-tuple check is at all worth it.

import dis
def f(s):
    if s in ["a", "b", "c"]:
        return True
# ---
>>> dis.dis(f)
 50           0 LOAD_FAST                0 (s)
              2 LOAD_CONST               1 (('a', 'b', 'c'))
              4 CONTAINS_OP              0
              6 POP_JUMP_IF_FALSE        6 (to 12)

 51           8 LOAD_CONST               2 (True)
             10 RETURN_VALUE

 50     >>   12 LOAD_CONST               0 (None)
             14 RETURN_VALUE

As you can see from the example above, even lists are internally converted to tuples 🤦🏻 Same is true for for loops, btw. As such there is no real value in suggesting either as better tuple or list.

--
Regarding the performance differences, if we just look at tuple / list vs. set / frozenset, the following is true:

As you mentioned (multiple times), sets are almost always faster then tuples for membership tests. (Except for the first element, but that should be replaced by an equality check.)
tuples are faster when iterating over them.

--
What do we make with that?
I'm not sure there is any value at all to keeping the consider-using-tuple check around. The only place for it would be for in-place sets in iterations, but I haven't seen any of them in practice.
With regards to sets in comparisons, they are slightly faster but not (yet?) common. Many will still prefer lists / tuples, just because that's what they are used to (I am, although because of a previous misconception). If it get's the job done, it's often times good enough in Python.

--
I'm considering removing the check completely. What do you two think we should do moving forward @samueltardieu @Pierre-Sassoulas ?

--
Some alternatives to consider (mix and match)

Leave the check for sets in iterations. (not really used)
Add new check for tuple / list in comparisons.
Add new check to suggest the use of list because they look better often times?
Have checks (disabled by default) to suggest tuple or list respectively in iterations. Project could choose which style they like. ~~(caveat starred expressions aren't allowed in tuples: [*args])~~

--
Update
starred expressions are possible in tuples

samueltardieu · 2021-07-30T11:30:54Z

Indeed, Python prefers to build read-only versions (frozenset for a set and tuple for a list) when it can determine at compilation time that the object will not be modified. This gives a slight performance advantage in cpython in the case list vs tuple as the tuple is read-only and the interpreter does not need to temporarily increment the reference count of the element it is considering in the tuple case while it does in the list case (as the element could be dropped by another thread during the comparison, or even by the comparison function itself).

Concerning lists vs tuples, as a fan of immutable data structures, I would say that using tuples instead of lists where possible is best because it encourages people at thinking about immutable structures; however, as a long-time Pythonista myself, I tend to use lists because they look nicer than tuples, especially when they have only one element as they don't require the extra comma, and as you mention [*args] is useful as well.

I would keep the check warning about the use of in-place sets in iterations and suggest using tuples or lists, as I cannot think of any good reason for using a set there. Moreover, switching to lists or tuples will not only be more efficient but also give a consistent traversal order across Python interpreters and versions, while the set guarantees nothing in this domain.

scop · 2021-07-30T13:05:55Z

Just a quick note, if data structures get compiled to other, optimized ones (such as lists to tuples and sets to frozensets like they are shown to be here), but nothing documents such behavior and that it's going to stay, then relying on it would be relying on internal implementation/optimization detail which might or might not hold between Python versions and Python implementations (cpython, pypy, ...). Which isn't good.

There's nothing wrong with recommending a semantically correct way to do things if there is one, and in that sense I think there's still value in this check, if the "correct" way can be agreed on, even if it would not bring any performance benefits.

Pierre-Sassoulas · 2021-07-30T18:40:07Z

The way I understand, it sets are better performance wise for membership and worst for iterating, while list/tuple are equivalent. Let's not bother pylint's user for something that is equivalent and focus on the set vs list/tuple.

How about we remove the current consider-using-tuple as @cdce8p suggested because it seem really opinionated considering what python does under the hood, and replace it by use-sets-for-membership when a list or tuple is used in a membership and use-list-or-tuple-for-iterating when a set is used for iterating (admittedly I've never seen any code doing that) ?

Multi choice vote here:

Remove consider-using-tuple entirely 👀
Add use-sets-for-membership later 👍
Add use-list-or-tuple-for-iterating or do-not-use-sets-for-iterating later 🎉

Planning wise we could remove consider-using-tuple from 2.10, and add the two new checks later ?

scop · 2021-07-31T06:40:37Z

I've seen literal sets used for iterating a bunch of times. IIRC all of them have been inadvertent, and some of them sources of actual bugs due to the unstable order, such as pre-commit/pre-commit#1117. So in my experience something flagging those is most welcome. Not sure which of the choices represent this wish of having some way to check it best, but I'm placing it on 🎉 above.

Re use-list-or-tuple-for-iterating, I'd personally try to nudge towards using tuples there too, as to me it's the most idiomatic way and does not rely on undocumented (I think?) optimization/implementation details. Perhaps by making the identifier use-tuple-or-list-for-iterating, or something like do-not-use-sets-for-iterating and indicate preference towards tuples over lists also in the help message in both cases.

Is the plan BTW to limit the check against sets for iteration for literal sets only, or all of them? There's a difference to me there; non-literal is more likely to be intentional/harmless, maybe that shouldn't cause a warning at all, dunno.

cdce8p · 2021-08-01T11:00:38Z

I've seen literal sets used for iterating a bunch of times.

Certainly an argument in favor of adding / keeping the check for iterations.

use-list-or-tuple-for-iterating

Maybe we should split this into two separate checks. One use-sequence-for-iterating, and another consider-using-tuple if a list is used for in-place iterations. That way projects could choose if they want to enforce tuples but independently still keep a check to make sure no in-place sets are used there.

starred expressions in tuples

I mentioned earlier that starred expressions [*args] aren't possible in tuples. That statement was incorrect. After looking at it again, I found that it does work! You just need to remember the trailing comma (which I forget the first time). That means that in-place lists can be fully replaced by in-place tuples.

--
@scop The current consider-using-tuple check is limited to in-place / literal uses of list and set. I agree with your assumption that non-literal uses are more likely intentional.

Pierre-Sassoulas · 2021-08-01T18:33:49Z

Maybe we should split this into two separate checks. One use-sequence-for-iterating, and another consider-using-tuple if a list is used for in-place iterations.

That make sense, I like that.

Pierre-Sassoulas · 2021-08-15T19:01:36Z

Moving that issue to 2.11 as use-set-for-membership raise problem with unashable element see #4841 (comment) .

Pierre-Sassoulas · 2021-08-17T18:03:47Z

We realized in #4841 that some object not being hashable means that suggesting to use set is impossible for something like this:

     letters= ["a", "b"] 
     numbers = [1, 42] 
 if "c" in (letters, numbers,):
     return

What would be the preferred way for you going forward ?

Do not add use-set-for-membership in pylint at all 👀
Add use-set-for-membership and keep it like it is, the user will choose between making the object hashable or silencing pylint 👍
Add use-set-for-membership but check that all objects in the original container implements __hash__ and __cmp__ or __eq__ (ie they're hashable) 🚀

cdce8p · 2021-08-17T19:32:40Z

I'd prefer not to add use-set-for-membership. As this is a refactoring suggestion, I believe it would only be worth it if it has a clear real world performance benefit and / or no negative side effects. Neither is the case.

We have to consider it might not be obvious for some that sets require hashable items. Someone might start with a list, get the suggestion to use a set and later adds an item that isn't hashable without realizing it. An error that wasn't necessary to begin with.

Furthermore, there is more maintenance required if we add some kind of heuristic. It won't be perfect so we would need to fix corner cases constantly. In the end we might even need a new check to recognize the error I described earlier.

IMO it's just not worth it.

--
Instead of a direct replacement for all, we could consider something like consider-set-for-membership for lists / tuples with a lot of items (e.g., > 5). There the performance difference might be more noticeable.

Pierre-Sassoulas · 2021-08-30T06:36:48Z

This will be fixed by the extension SetMembershipChecker in 2.11.

samueltardieu added Bug 🪲 Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling labels Jul 30, 2021

Pierre-Sassoulas added Discussion 🤔 performance False Positive 🦟 A message is emitted but nothing is wrong with the code and removed Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling Bug 🪲 labels Jul 30, 2021

Pierre-Sassoulas added this to the 2.10.0 milestone Jul 30, 2021

Pierre-Sassoulas removed the Discussion 🤔 label Jul 30, 2021

Pierre-Sassoulas changed the title ~~Unwarranted new suggestion~~ False positive consider-using-tuple for x in set`, because frozen sets are faster Jul 30, 2021

Pierre-Sassoulas changed the title ~~False positive consider-using-tuple for x in set`, because frozen sets are faster~~ False positive consider-using-tuple for x in set, because frozen sets are faster Jul 30, 2021

samueltardieu mentioned this issue Jul 30, 2021

Replace in-place sets with tuples home-assistant/core#53686

Closed

21 tasks

cdce8p mentioned this issue Jul 30, 2021

Replace in-place lists with tuples (1) home-assistant/core#53684

Closed

21 tasks

This was referenced Aug 12, 2021

Revert "Extended consider-using-tuple check to cover 'in' comparisons (#4768)" #4832

Merged

Add use-sequence-for-iteration checker #4835

Merged

Add use-set-for-membership check #4841

Merged

Pierre-Sassoulas modified the milestones: 2.10.0, 2.11.0 Aug 15, 2021

cdce8p mentioned this issue Aug 16, 2021

Emit consider-using-tuple for in comparisons #4853

Closed

Pierre-Sassoulas mentioned this issue Aug 17, 2021

Add possible-forgotten-f-prefix checker #4787

Closed

4 tasks

Pierre-Sassoulas closed this as completed Aug 30, 2021

DanielNoord mentioned this issue Feb 10, 2022

Make pyright recognise narrowing down a type using a in-place set microsoft/pyright#3022

Closed

jacobtylerwalls mentioned this issue Feb 10, 2022

Possible false positive with use-sequence-for-iteration on in-place sets with dynamic content? #5788

Closed

Pierre-Sassoulas mentioned this issue Apr 1, 2022

use-tuple-over-list already exists in pylint in the "code style" extensions as consider-using-tuple tonybaloney/perflint#23

Open

Pierre-Sassoulas mentioned this issue Aug 12, 2023

False negatives for consider-using-tuple #8948

Closed

Yay295 mentioned this issue Nov 6, 2023

ruff: Minor optimizations of list comprehensions, x in set, etc. python-pillow/Pillow#7524

Merged

tdulcet mentioned this issue Nov 20, 2023

Consider conditionally applying rule PLR6201 astral-sh/ruff#8758

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

False positive `consider-using-tuple` for `x in set`, because frozen sets are faster #4776

False positive `consider-using-tuple` for `x in set`, because frozen sets are faster #4776

samueltardieu commented Jul 30, 2021 •

edited

Pierre-Sassoulas commented Jul 30, 2021

samueltardieu commented Jul 30, 2021 •

edited

samueltardieu commented Jul 30, 2021 •

edited

cdce8p commented Jul 30, 2021

samueltardieu commented Jul 30, 2021

cdce8p commented Jul 30, 2021

samueltardieu commented Jul 30, 2021

cdce8p commented Jul 30, 2021

samueltardieu commented Jul 30, 2021 •

edited

samueltardieu commented Jul 30, 2021

samueltardieu commented Jul 30, 2021

cdce8p commented Jul 30, 2021 •

edited

samueltardieu commented Jul 30, 2021 •

edited

scop commented Jul 30, 2021

Pierre-Sassoulas commented Jul 30, 2021 •

edited

scop commented Jul 31, 2021

cdce8p commented Aug 1, 2021

Pierre-Sassoulas commented Aug 1, 2021

Pierre-Sassoulas commented Aug 15, 2021

Pierre-Sassoulas commented Aug 17, 2021

cdce8p commented Aug 17, 2021

Pierre-Sassoulas commented Aug 30, 2021

False positive consider-using-tuple for x in set, because frozen sets are faster #4776

False positive consider-using-tuple for x in set, because frozen sets are faster #4776

Comments

samueltardieu commented Jul 30, 2021 • edited

Bug description

Configuration

Command used

Pylint output

Expected behavior

Pylint version

OS / Environment

Additional dependencies

Pierre-Sassoulas commented Jul 30, 2021

samueltardieu commented Jul 30, 2021 • edited

samueltardieu commented Jul 30, 2021 • edited

cdce8p commented Jul 30, 2021

samueltardieu commented Jul 30, 2021

cdce8p commented Jul 30, 2021

samueltardieu commented Jul 30, 2021

cdce8p commented Jul 30, 2021

samueltardieu commented Jul 30, 2021 • edited

samueltardieu commented Jul 30, 2021

samueltardieu commented Jul 30, 2021

cdce8p commented Jul 30, 2021 • edited

samueltardieu commented Jul 30, 2021 • edited

scop commented Jul 30, 2021

Pierre-Sassoulas commented Jul 30, 2021 • edited

scop commented Jul 31, 2021

cdce8p commented Aug 1, 2021

Pierre-Sassoulas commented Aug 1, 2021

Pierre-Sassoulas commented Aug 15, 2021

Pierre-Sassoulas commented Aug 17, 2021

cdce8p commented Aug 17, 2021

Pierre-Sassoulas commented Aug 30, 2021

False positive `consider-using-tuple` for `x in set`, because frozen sets are faster #4776

False positive `consider-using-tuple` for `x in set`, because frozen sets are faster #4776

samueltardieu commented Jul 30, 2021 •

edited

samueltardieu commented Jul 30, 2021 •

edited

samueltardieu commented Jul 30, 2021 •

edited

samueltardieu commented Jul 30, 2021 •

edited

cdce8p commented Jul 30, 2021 •

edited

samueltardieu commented Jul 30, 2021 •

edited

Pierre-Sassoulas commented Jul 30, 2021 •

edited