
Add B033: Duplicate items in sets #373

Merged: 3 commits, May 9, 2023
Changes from 2 commits
3 changes: 3 additions & 0 deletions README.rst
@@ -186,6 +186,8 @@ second usage. Save the result to a list if the result is needed multiple times.

**B032**: Possible unintentional type annotation (using ``:``). Did you mean to assign (using ``=``)?

**B033**: Sets should not contain duplicate items. Duplicate items will be replaced with a single item at runtime.

Opinionated warnings
~~~~~~~~~~~~~~~~~~~~

@@ -329,6 +331,7 @@ Unreleased
~~~~~~~~~~

* B030: Fix crash on certain unusual except handlers (e.g. ``except a[0].b:``)
* Add B033: Check for duplicate items in sets.

23.3.12
~~~~~~~~
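For context, the runtime behavior the new rule guards against can be shown directly in an interpreter (illustrative snippet, not part of the diff):

```python
# Duplicate literals in a set display are silently collapsed at runtime:
# only one member of each group of equal items survives.
dupes = {1, 2, 3, 3, 5}
print(dupes)  # {1, 2, 3, 5}

# Equality, not type, decides what counts as a duplicate.
mixed = {3, 3.0}
print(len(mixed))  # 1: 3 == 3.0, so only one survives
```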
19 changes: 19 additions & 0 deletions bugbear.py
@@ -518,6 +518,10 @@ def visit_Import(self, node):
        self.check_for_b005(node)
        self.generic_visit(node)

    def visit_Set(self, node):
Collaborator commented:
Can we do this for dict keys too? It should be fairly easy and it's also a reasonably common bug.

@FozzieHi (Contributor, author) commented on Mar 23, 2023:
Pyflakes (which is included in flake8) already has a rule to detect duplicate keys with different values: F601, "dictionary key name repeated with different values".

It doesn't flag duplicate keys with the same values, but as that's unlikely to cause a bug per se, should we check for that?

Edit: Although I guess you could say the same about duplicate entries in sets. If we wanted to add it, it's fairly simple, and I have a demo working.

Collaborator replied:

Thanks for the extra context! Yes, I think dicts with both keys and values the same are pretty much equivalent to duplicate set members. Honestly, I don't have a great sense of which rules belong where, but maybe, to avoid duplication, we should only alert in cases where pyflakes doesn't?

@FozzieHi (Contributor, author) replied:
I agree, we would only want to flag when both the dictionary keys and values are the same, to avoid conflicts with Pyflakes.

I think a new rule would be good for this (in case people want to disable each rule separately), and we could leave B033 as it is. As we'd be checking both the keys and the values, it also wouldn't easily fit into this rule anyway, so I don't think we'd gain much by combining them into a single rule.
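The contributor's dict demo is not shown in the thread; a hypothetical sketch of such a check, flagging only fully duplicated (key, value) constant pairs so as not to overlap with pyflakes' F601 (the `has_duplicate_pairs` name and the code are illustrative, not the merged implementation):

```python
import ast

def has_duplicate_pairs(dict_node: ast.Dict) -> bool:
    """Return True if a dict literal repeats a (key, value) pair of constants.

    Pairs with identical keys but different values are left to pyflakes' F601.
    """
    pairs = [
        (key.value, value.value)
        for key, value in zip(dict_node.keys, dict_node.values)
        if isinstance(key, ast.Constant) and isinstance(value, ast.Constant)
    ]
    return len(pairs) != len(set(pairs))

# Usage: extract the ast.Dict node from a parsed expression.
node = ast.parse('{"a": 1, "a": 1, "b": 2}').body[0].value
print(has_duplicate_pairs(node))  # True: ("a", 1) appears twice
```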

        self.check_for_b033(node)
        self.generic_visit(node)

    def check_for_b005(self, node):
        if isinstance(node, ast.Import):
            for name in node.names:

@@ -1308,6 +1312,14 @@ def check_for_b032(self, node):
        ):
            self.errors.append(B032(node.lineno, node.col_offset))

    def check_for_b033(self, node):
        constants = [
            item.value
            for item in filter(lambda x: isinstance(x, ast.Constant), node.elts)
        ]
        if len(constants) != len(set(constants)):
            self.errors.append(B033(node.lineno, node.col_offset))
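The same constant-duplicate test can be exercised outside the plugin's visitor machinery; a minimal standalone sketch (the `set_has_duplicates` helper is made up for illustration and uses only the stdlib `ast` module):

```python
import ast

def set_has_duplicates(source: str) -> bool:
    """Apply the same constant-duplicate test as check_for_b033
    to `source`, which must be a single set literal."""
    set_node = ast.parse(source).body[0].value
    assert isinstance(set_node, ast.Set)
    constants = [
        item.value for item in set_node.elts if isinstance(item, ast.Constant)
    ]
    return len(constants) != len(set(constants))

print(set_has_duplicates("{1, 2, 3, 3, 5}"))  # True
print(set_has_duplicates("{1, 2, 3}"))        # False
print(set_has_duplicates("{1, True}"))        # True: True == 1
```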


def compose_call_path(node):
    if isinstance(node, ast.Attribute):

@@ -1705,6 +1717,13 @@ def visit_Lambda(self, node):
    )
)

B033 = Error(
    message=(
        "B033 Sets should not contain duplicate items. Duplicate items will be replaced"
        " with a single item at runtime."
    )
)

# Warnings disabled by default.
B901 = Error(
    message=(
17 changes: 17 additions & 0 deletions tests/b033.py
@@ -0,0 +1,17 @@
"""
Should emit:
B033 - on lines 6-12
"""

test = {1, 2, 3, 3, 5}
test = {"a", "b", "c", "c", "e"}
test = {True, False, True}
test = {None, True, None}
test = {3, 3.0}
test = {1, True}
test = {0, False}
Comment on lines +11 to +12:

Collaborator commented:
Do the booleans True and False hash to the same values as 1 and 0? TIL if so. Guess I've never thought about it.

@FozzieHi (Contributor, author) replied on Mar 25, 2023:
Yup, I also discovered this when creating the check.

For example, this:

test = {1, True}
print(test)

will print:

{1}
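The reason behind this, for the record: `bool` is a subclass of `int`, so `True`/`False` compare and hash equal to `1`/`0`, and set construction deduplicates by equality. A quick check:

```python
# bool is a subclass of int, so True/False compare equal to 1/0
# and hash equal as well; sets therefore collapse them together,
# keeping whichever equal member was inserted first.
print(issubclass(bool, int))               # True
print(True == 1, hash(True) == hash(1))    # True True
print(False == 0, hash(False) == hash(0))  # True True
print({1, True}, {0, False})               # {1} {0}
```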


test = {1, 2, 3, 3.5, 5}
test = {"a", "b", "c", "d", "e"}
test = {True, False}
test = {None}
16 changes: 16 additions & 0 deletions tests/test_bugbear.py
@@ -44,6 +44,7 @@
    B030,
    B031,
    B032,
    B033,
    B901,
    B902,
    B903,
@@ -489,6 +490,21 @@ def test_b032(self):
        )
        self.assertEqual(errors, expected)

    def test_b033(self):
        filename = Path(__file__).absolute().parent / "b033.py"
        bbc = BugBearChecker(filename=str(filename))
        errors = list(bbc.run())
        expected = self.errors(
            B033(6, 7),
            B033(7, 7),
            B033(8, 7),
            B033(9, 7),
            B033(10, 7),
            B033(11, 7),
            B033(12, 7),
        )
        self.assertEqual(errors, expected)

    @unittest.skipIf(sys.version_info < (3, 8), "not implemented for <3.8")
    def test_b907(self):
        filename = Path(__file__).absolute().parent / "b907.py"