Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace in-place sets with tuples #53686

Closed
wants to merge 1 commit into from
Closed

Conversation

cdce8p
Copy link
Member

@cdce8p cdce8p commented Jul 29, 2021

Proposed change

Continuation of #53684

This time only for sets.

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • The code has been formatted using Black (black --fast homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • Untested files have been added to .coveragerc.

The integration reached or maintains the following Integration Quality Scale:

  • No score or internal
  • 馃 Silver
  • 馃 Gold
  • 馃弳 Platinum

To help with the load of incoming pull requests:

@samueltardieu
Copy link
Contributor

Is this change warranted? While in general it is more efficient to lookup elements in a set rather than in a list or a tuple since those are scanned sequentially, one could object that for small sets the cost of the hash would be greater than the cost of scanning the tuple, and that this will probably be the case in HomeAssistant code.

However, Python builds frozenset objects at compile time while encountering a in { 鈥β爙expression, and even on small collections it looks like those are more efficient than the tuples (+30% on an unsuccessful lookup with 4 elements):

>>> import timeit
>>> def in_set(x): return x in {"aa", "bb", "cc", "dd"}
...
>>> def in_tuple(x): return x in ("aa", "bb", "cc", "dd")
...
>>> timeit.timeit(lambda: in_set("ee"))
0.09102824400179088
>>> timeit.timeit(lambda: in_tuple("ee"))
0.12370169399946462

Of course the larger the collection is the more efficient sets are compared to tuples, especially when the element is not found since the tuple must be scanned in its entirety.

@cdce8p
Copy link
Member Author

cdce8p commented Jul 29, 2021

@samueltardieu You are right that sets are more efficient in this example. Although the difference isn't as big, since you can't only look at the worst possible time. If the variable is found within the first one / two / three elements there isn't much difference if at all.

This PR only modifies in-place sets. Any collection larger than a few elements should probably get its one constant and be defined as frozenset as you mentioned.

An area you didn't look at is the memory consumption. Here the tuple clearly wins

from pympler import asizeof
t = ("aa", "bb", "cc", "dd")
s = {"aa", "bb", "cc", "dd"}
s2 = frozenset(t)
print(f"Tuple: {asizeof.asizeof(t)}")
print(f"Set: {asizeof.asizeof(s)}")
print(f"Frozenset: {asizeof.asizeof(s2)}")
--
Tuple: 296
Set: 440
Frozenset: 440

--
All in all, I don't believe there to be any real world performance difference from it. Not at the scale we are taking about here.
The intention here was simply to unify the style on how to write in ... statements and for that tuples are the best choice IMO (compared to lists or sets).

@uvjustin
Copy link
Contributor

Quick testing with timeit shows that the breakeven for speed comes somewhere between 1 and 2 elements. ie if you expect a large proportion of hits on the first element or you only have like 2 members, tuples will be faster, but generally sets will be faster.
tuples do have a size advantage, but it's on the order of a few hundred bytes for the set/tuple sizes we are talking about.
I don't really mind either way, but I don't think it's right to say that tuples are necessarily better than sets here.

@samueltardieu
Copy link
Contributor

Also, contrary to a dict where you expect to get a value back most of the time when looking up the key, it is common to expect no hit when checking for membership in a set. For example if you look at the proposed change for netatmo/sensor.py, an update of attribute gustangle will require 7 comparisons instead of 3 hash lookups (the looked-up string hash is cached in the str structure and is not recomputed) + 1 comparison at most.

I agree that in practice the performance penalty may be limited, but I frown upon not using the right tool for the job (sets) for the sake of "unifying the style" as a lookup is not the same thing as using an enumeration in a for loop. It would be better IMO to make the reverse change and ensure that all such lookups use sets.

@samueltardieu
Copy link
Contributor

I have opened pylint-dev/pylint#4776 to discuss the soundness of this future pylint suggestion. I had not realized that @cdce8p was also the author of this lint introduced yesterday in pylint. I would suggest to at least wait until the situation on pylint side is cleared before proceeding.

@cdce8p
Copy link
Member Author

cdce8p commented Aug 12, 2021

Closing this after discussion in pylint-dev/pylint#4776

@cdce8p cdce8p closed this Aug 12, 2021
Dev automation moved this from Needs review to Cancelled Aug 12, 2021
@cdce8p cdce8p deleted the cs_tuple-3 branch August 12, 2021 20:17
@github-actions github-actions bot locked and limited conversation to collaborators Aug 13, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
Dev
  
Cancelled
Development

Successfully merging this pull request may close these issues.

None yet

4 participants