Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Undesired fallback to brute force container uniqueness check on certain input types #893

Merged
merged 1 commit into from Dec 15, 2021

Conversation

Stranger6667
Copy link
Contributor

@Stranger6667 Stranger6667 commented Dec 15, 2021

Hi!

I noticed, that the uniq implementation uses _sequence_equal if the input is sortable. Then, if e.g. [1, 2, 3, 4, 5] is passed to uniq, then it falls back to brute force because on the first loop iteration _sequence_equal fails with TypeError: object of type 'int' has no len(), but it should not as a list of ints is sortable & straightforward to compare.

Code sample:

CONTAINER = list(range(100))
assert uniq(CONTAINER)
assert not uniq([1, 1])
start = time.perf_counter()
for _ in range(1000):
    uniq(CONTAINER)
print(time.perf_counter() - start)

Before: 5.4033913369985385s
After: 0.12076822199924209s

I didn't check whether this was the direct cause of #853, but hopefully, it can improve the status quo :)

Update: The change doesn't affect #853 much (measured by the code sample from the first comment there):
Before: 27.133657693862915 sec
After: 26.14994215965271 sec

@codecov
Copy link

codecov bot commented Dec 15, 2021

Codecov Report

Merging #893 (f9f8995) into main (c8059bc) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #893   +/-   ##
=======================================
  Coverage   98.33%   98.33%           
=======================================
  Files          19       19           
  Lines        3125     3125           
  Branches      422      422           
=======================================
  Hits         3073     3073           
  Misses         41       41           
  Partials       11       11           
Impacted Files Coverage Δ
jsonschema/_utils.py 98.20% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c8059bc...f9f8995. Read the comment docs.

@Julian
Copy link
Member

Julian commented Dec 15, 2021

Fantastic, thanks!

I knew at least there was some improvement to be made after #817 and #820 changed things, but yeah I think the real issue for #853 will have to do with _finditem.

@Julian Julian merged commit ac7b838 into python-jsonschema:main Dec 15, 2021
@Stranger6667 Stranger6667 deleted the dd/uniq-bug branch December 15, 2021 13:01
@Stranger6667
Copy link
Contributor Author

Thank you! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants