Replace in-place sets with tuples #53686

cdce8p · 2021-07-29T11:45:48Z

Proposed change

Continuation of #53684

This time only for sets.

Type of change

Dependency upgrade
Bugfix (non-breaking change which fixes an issue)
New integration (thank you!)
New feature (which adds functionality to an existing integration)
Breaking change (fix/feature causing existing functionality to break)
Code quality improvements to existing code or addition of tests

Additional information

This PR fixes or closes issue: fixes #
This PR is related to issue:
Link to documentation pull request:

Checklist

The code change is tested and works locally.
Local tests pass. Your PR cannot be merged unless tests pass
There is no commented out code in this PR.
I have followed the development checklist
The code has been formatted using Black (black --fast homeassistant tests)
Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

Documentation added/updated for www.home-assistant.io

If the code communicates with devices, web services, or third-party tools:

The manifest file has all fields filled out correctly.
Updated and included derived files by running: python3 -m script.hassfest.
New or updated dependencies have been added to requirements_all.txt.
Updated by running python3 -m script.gen_requirements_all.
Untested files have been added to .coveragerc.

The integration reached or maintains the following Integration Quality Scale:

No score or internal
🥈 Silver
🥇 Gold
🏆 Platinum

To help with the load of incoming pull requests:

I have reviewed two other open pull requests in this repository.

samueltardieu · 2021-07-29T18:37:34Z

Is this change warranted? While in general it is more efficient to lookup elements in a set rather than in a list or a tuple since those are scanned sequentially, one could object that for small sets the cost of the hash would be greater than the cost of scanning the tuple, and that this will probably be the case in HomeAssistant code.

However, Python builds frozenset objects at compile time while encountering a in { … }expression, and even on small collections it looks like those are more efficient than the tuples (+30% on an unsuccessful lookup with 4 elements):

>>> import timeit
>>> def in_set(x): return x in {"aa", "bb", "cc", "dd"}
...
>>> def in_tuple(x): return x in ("aa", "bb", "cc", "dd")
...
>>> timeit.timeit(lambda: in_set("ee"))
0.09102824400179088
>>> timeit.timeit(lambda: in_tuple("ee"))
0.12370169399946462

Of course the larger the collection is the more efficient sets are compared to tuples, especially when the element is not found since the tuple must be scanned in its entirety.

cdce8p · 2021-07-29T19:51:33Z

@samueltardieu You are right that sets are more efficient in this example. Although the difference isn't as big, since you can't only look at the worst possible time. If the variable is found within the first one / two / three elements there isn't much difference if at all.

This PR only modifies in-place sets. Any collection larger than a few elements should probably get its one constant and be defined as frozenset as you mentioned.

An area you didn't look at is the memory consumption. Here the tuple clearly wins

from pympler import asizeof
t = ("aa", "bb", "cc", "dd")
s = {"aa", "bb", "cc", "dd"}
s2 = frozenset(t)
print(f"Tuple: {asizeof.asizeof(t)}")
print(f"Set: {asizeof.asizeof(s)}")
print(f"Frozenset: {asizeof.asizeof(s2)}")
--
Tuple: 296
Set: 440
Frozenset: 440

--
All in all, I don't believe there to be any real world performance difference from it. Not at the scale we are taking about here.
The intention here was simply to unify the style on how to write in ... statements and for that tuples are the best choice IMO (compared to lists or sets).

uvjustin · 2021-07-29T23:19:09Z

Quick testing with timeit shows that the breakeven for speed comes somewhere between 1 and 2 elements. ie if you expect a large proportion of hits on the first element or you only have like 2 members, tuples will be faster, but generally sets will be faster.
tuples do have a size advantage, but it's on the order of a few hundred bytes for the set/tuple sizes we are talking about.
I don't really mind either way, but I don't think it's right to say that tuples are necessarily better than sets here.

samueltardieu · 2021-07-30T06:56:41Z

Also, contrary to a dict where you expect to get a value back most of the time when looking up the key, it is common to expect no hit when checking for membership in a set. For example if you look at the proposed change for netatmo/sensor.py, an update of attribute gustangle will require 7 comparisons instead of 3 hash lookups (the looked-up string hash is cached in the str structure and is not recomputed) + 1 comparison at most.

I agree that in practice the performance penalty may be limited, but I frown upon not using the right tool for the job (sets) for the sake of "unifying the style" as a lookup is not the same thing as using an enumeration in a for loop. It would be better IMO to make the reverse change and ensure that all such lookups use sets.

samueltardieu · 2021-07-30T08:06:29Z

I have opened pylint-dev/pylint#4776 to discuss the soundness of this future pylint suggestion. I had not realized that @cdce8p was also the author of this lint introduced yesterday in pylint. I would suggest to at least wait until the situation on pylint side is cleared before proceeding.

cdce8p · 2021-08-12T20:17:13Z

Closing this after discussion in pylint-dev/pylint#4776

Replace sets with tuples

7eac3b6

cdce8p requested review from allenporter, bdraco, cgtobi, Danielhiversen, hunterjm, Jc2k, scop and uvjustin as code owners July 29, 2021 11:45

homeassistant added the cla-signed label Jul 29, 2021

probot-home-assistant bot added code-quality core small-pr PRs with less than 30 lines. labels Jul 29, 2021

project-bot bot added this to Needs review in Dev Jul 29, 2021

samueltardieu mentioned this pull request Jul 30, 2021

Replace in-place lists with tuples (1) #53684

Closed

21 tasks

cdce8p mentioned this pull request Jul 30, 2021

False positive consider-using-tuple for x in set, because frozen sets are faster pylint-dev/pylint#4776

Closed

cdce8p marked this pull request as draft July 30, 2021 11:05

cdce8p closed this Aug 12, 2021

Dev automation moved this from Needs review to Cancelled Aug 12, 2021

cdce8p deleted the cs_tuple-3 branch August 12, 2021 20:17

github-actions bot locked and limited conversation to collaborators Aug 13, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace in-place sets with tuples #53686

Replace in-place sets with tuples #53686

cdce8p commented Jul 29, 2021

samueltardieu commented Jul 29, 2021

cdce8p commented Jul 29, 2021

uvjustin commented Jul 29, 2021

samueltardieu commented Jul 30, 2021

samueltardieu commented Jul 30, 2021

cdce8p commented Aug 12, 2021

Replace in-place sets with tuples #53686

Replace in-place sets with tuples #53686

Conversation

cdce8p commented Jul 29, 2021

Proposed change

Type of change

Additional information

Checklist

samueltardieu commented Jul 29, 2021

cdce8p commented Jul 29, 2021

uvjustin commented Jul 29, 2021

samueltardieu commented Jul 30, 2021

samueltardieu commented Jul 30, 2021

cdce8p commented Aug 12, 2021