Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not use an identity function when no key is given in the unique_everseen recipe #516

Merged
merged 1 commit into from
May 4, 2021

Conversation

Numerlor
Copy link
Contributor

@Numerlor Numerlor commented May 3, 2021

The #488 PR simplified the function, but introduced an additional function call that occurred for every element, even when the key function was not supplied by the user.

Changes

The call now occurs only if the key was supplied by the user.

timeit tuple(unique_everseen_opt(range(10_000)))
1.04 ms ± 12.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit tuple(unique_everseen_opt(range(1_000_000)))
138 ms ± 442 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


timeit tuple(unique_everseen(range(10_000)))
1.52 ms ± 39.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit tuple(unique_everseen(range(1_000_000)))
185 ms ± 4.95 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

(_opt is the new function)
the runtime difference here is not huge, but the complexity of the code hasn't increased much. The runtime when a key is supplied will also be a tiny bit bigger

This gets rid of a redundant function call overhead when no key is supplied
@bbayles
Copy link
Collaborator

bbayles commented May 3, 2021

Thanks for the PR. Just for completeness, could you provide a timeit run when using a key?

@Numerlor
Copy link
Contributor Author

Numerlor commented May 3, 2021

Thanks for the PR. Just for completeness, could you provide a timeit run when using a key?

Sure; as I mentioned the new one is a tiny bit slower because it has to check the condition for every iteration, but it's negligible compared to the speedup gained for when the key is not there

timeit tuple(unique_everseen_opt(range(10_000), key=lambda x: x))
1.53 ms ± 16.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit tuple(unique_everseen_opt(range(1_000_000), key=lambda x: x))
186 ms ± 5.77 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

(The original function takes the same time regardless of the key being supplied as it's used unconditionally so nothing changed there.)

@bbayles bbayles merged commit e899af2 into more-itertools:master May 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants