Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add use-set-for-membership check #4841

Merged
merged 4 commits into from Aug 30, 2021

Conversation

cdce8p
Copy link
Member

@cdce8p cdce8p commented Aug 13, 2021

Type of Changes

Type
✨ New feature

Description

Add new use-set-for-membership check following the discussion in #4776.

# old
x in (1, 2, 3)  # [use-set-for-membership]
x in [1, 2, 3]  # [use-set-for-membership]

# new
x in {1, 2, 3}


# only in-place defined lists and tuples will be marked
# I.e. the following would NOT result in an error
var = [1, 2, 3]
x in var

For the moment I've chosen to add it to the CodeStyleChecker, thus not enabling it by default.
However this is certainly something that might be worth having a discussion about.

Closes #4776

--
/CC: @Pierre-Sassoulas, @samueltardieu

@cdce8p cdce8p added the Enhancement ✨ Improvement to a component label Aug 13, 2021
@cdce8p cdce8p added this to the 2.10.0 milestone Aug 13, 2021
@coveralls
Copy link

coveralls commented Aug 13, 2021

Pull Request Test Coverage Report for Build 1181150804

  • 24 of 24 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.01%) to 92.783%

Totals Coverage Status
Change from base Build 1181148707: 0.01%
Covered Lines: 13563
Relevant Lines: 14618

💛 - Coveralls

Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make this a default checker I think. I'm not merging immediately so it can be discussed. It's going in before we release 2.10.0 either way.

pylint/extensions/code_style.py Outdated Show resolved Hide resolved
@Pierre-Sassoulas
Copy link
Member

The numerous warning in pylint's code are an opportunity to check the performance improvement that this checker really bring :D

@cdce8p cdce8p force-pushed the cs_set-membership branch 2 times, most recently from d674e60 to 6e0dd4a Compare August 14, 2021 12:03
@cdce8p
Copy link
Member Author

cdce8p commented Aug 14, 2021

While fixing the existing code, I came across something we haven't yet considered.

https://github.com/PyCQA/pylint/blob/f2cd986da07c12e81dd411e4b1190386a6cad750/pylint/checkers/refactoring/refactoring_checker.py#L1417-L1420

Here, pylint would currently print a warning for L1419. However, replacing the tuple with a set results in a TypeError:

TypeError: unhashable type: 'list'

--
It's more or less obvious that sets require all items to be hashable but that does also mean we can't just suggest use-set-for-membership for everything. If we don't account for that, we risk introducing difficult to find errors.

Suggestions going forward

We can only be certain if all items in a list / tuple are in-place defined constants (str, int, bool, ...). What about if a new item would be added later that isn't hashable?

As the performance benefit from this change would be minimal, I would suggest not to move forward with it.
Instead, we could consider extending consider-using-tuple once again. In contrast to #4768 however, this time only for lists.

Furthermore, we could add a new refactoring check consider-using-set to suggest the use of set / frozenset if the in-place defined list is quite long. Not sure though if this would truly be worth it.

@Pierre-Sassoulas
Copy link
Member

Maybe we can check that the __hash__ and __cmp__ or __eq__ function is defined for each object and warn only if that's the case ? Adding object that are not of the same type in a set or in any container for that matter do not make a lot of sense. So I think there would be very few error caused by adding another type of object after pylint suggested to change to set (and in that case the user will go back to a container that does not need a hash and there would not be any false positive because this time not all object would be hashable).

@cdce8p
Copy link
Member Author

cdce8p commented Aug 15, 2021

Adding object that are not of the same type in a set or in any container for that matter do not make a lot of sense.

True, but it is possible. Python is a dynamic language after all.

So I think there would be very few error caused by adding another type of object after pylint suggested to change to set (and in that case the user will go back to a container that does not need a hash and there would not be any false positive because this time not all object would be hashable).

I don't think even a very few false positives are worth it considering that the change provides only a minimal performance benefit in practice (if any at all).

@samueltardieu
Copy link

samueltardieu commented Aug 15, 2021 via email

@cdce8p
Copy link
Member Author

cdce8p commented Aug 15, 2021

Implementing it for collections containing string literals (probably a very common case) would bring a notable benefit as string hashes are cached in the string structure, especially in successive if in/elif in/... constructs.

As pylint doesn't support control flow inference, we can't reliably use infer for it. Thus this would be limited to in-place strings, ints, and so on.

Still believe the effort wouldn't be worth it here, unfortunately.

--
Just to clarify, I'm not suggesting we should recommend tuples instead of sets. (That change was reverted in #4832.) Only in cases that currently use lists (for consistency).

@Pierre-Sassoulas Pierre-Sassoulas modified the milestones: 2.10.0, 2.11.0 Aug 15, 2021
@Pierre-Sassoulas
Copy link
Member

I moved that to 2.11 so we do not delay 2.10 while we discuss this.

@cdce8p
Copy link
Member Author

cdce8p commented Aug 16, 2021

I opened #4853 which extends consider-using-tuple for lists in in comparisons like I mentioned earlier. This would be my preferred way going forward.

@DanielNoord
Copy link
Collaborator

What about if a new item would be added later that isn't hashable?

Perhaps I am mistaken but isn't this impossible?
With x in set() if set() is not a variable but defined in place it is impossible to add something later, right? The check should therefore consider:

  1. Is the comparable on the right defined-in-place?
  2. Is the comparable comprised of only hashable objects?
  3. Is the comparable a set?

Although the order might need to be switched, as I think performance-wise it will be better to perform check 3 as the first step.

This would mean that checking for pre-defined variables such as on line 4 of the tests (var = frozenset({1, 2, 3})) will become impossible, but I think adding the check is still useful; When I started learning Python checks such as these triggered me to search for the rationale behind these checkers and would eventually lead me to write more optimised code. Even if there is no real performance benefit for small sets (for example, x in (1, 2) vs. x in {1, 2}) enforcing such rules might lead to positive side-effects in other places.

@cdce8p
Copy link
Member Author

cdce8p commented Aug 20, 2021

What about if a new item would be added later that isn't hashable?

Perhaps I am mistaken but isn't this impossible?
With x in set() if set() is not a variable but defined in place it is impossible to add something later, right?

Of course at runtime no new items will be added. That doesn't however consider if you later change the code itself.

@DanielNoord
Copy link
Collaborator

Hmm, I would argue that that is no reason not to add this check.
We could even create a unhashable-in-set checker to accompany it (if nothing of the sort already exists) to catch these cases before runtime.
Pylint wouldn't suggest any incorrect code and if you later change it to a tuple/list again with an unhashable it will no longer complain. Warnings becoming irrelevant/false after code has been changed should in my opinion not be a reason not to add them. However, this is also something the maintainers should weigh in on.

@cdce8p
Copy link
Member Author

cdce8p commented Aug 20, 2021

We could even create a unhashable-in-set checker to accompany it (if nothing of the sort already exists) to catch these cases before runtime.

Of course that would be an option. I believe I even suggested it somewhere already.
Keep in mind though, and this is what my argument is all about, that you then end up with two separate checks that both use heuristics which will contain errors. I simply believe the performance improvement isn't worth the added maintenance cost.

@cdce8p cdce8p force-pushed the cs_set-membership branch 2 times, most recently from fce9190 to a504e1a Compare August 20, 2021 23:06
@cdce8p
Copy link
Member Author

cdce8p commented Aug 21, 2021

I've moved the code to a new extension: SetMembershipChecker.
For those who really what to use it, they could. I still wouldn't recommend it though.

Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👌 Just waiting for the hot fix for 2.10 to dry before merging.

@Pierre-Sassoulas Pierre-Sassoulas marked this pull request as ready for review August 21, 2021 18:19
@cdce8p
Copy link
Member Author

cdce8p commented Aug 21, 2021

Nice 👌 Just waiting for the hot fix for 2.10 to dry before merging.

@Pierre-Sassoulas Should we create a new label for it? Something like blocked? To indicate PRs that have been accepted but we are holding off merging.

@Pierre-Sassoulas Pierre-Sassoulas added the Blocked 🚧 Blocked by a particular issue label Aug 21, 2021
@Pierre-Sassoulas Pierre-Sassoulas removed the Blocked 🚧 Blocked by a particular issue label Aug 30, 2021
Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should handle the following ?

class Hashable:
    pass


class NotHashable:
    def __eq__(self, other):
        return self == other


class HashableAgain:
    def __eq__(self, other):
        return self == other

    def __hash__(self):
        return id(self)

x in (1, Hashable, HashableAgain)  # [use-set-for-membership]
x in (1, NotHashable)  # List is not hashable

@cdce8p
Copy link
Member Author

cdce8p commented Aug 30, 2021

Do you think we should handle the following ?

If someone likes to contribute it, that would certainly be an option. I think this should go in another PR though. This one only adds a basic check for constants.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement ✨ Improvement to a component
Projects
None yet
Development

Successfully merging this pull request may close these issues.

False positive consider-using-tuple for x in set, because frozen sets are faster
5 participants