Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specialize len in ExactSizeIterator implementations #91998

Closed
wants to merge 1 commit into from

Conversation

ssomers
Copy link
Contributor

@ssomers ssomers commented Dec 16, 2021

Two commits (tell me if you prefer separate PRs):

  1. Override the default implementation of Iterator::count in ExactSizeIterator iterators without side effects, much like When possible without changing semantics, implement Iterator::last in terms of DoubleEndedIterator::next_back for types in liballoc and libcore. #62316, making them O(1) instead of O(n). Though I doubt count is used often or at all.

    • Most of these count simply delegate to an inner implementation, which means the ones in HashMap/HashSet don't actually shortcut until hashbrown's would do so.
    • library/alloc/src/boxed.rs: is only conditionally ExactSizeIterator.
  2. Override the default implementation of ExactSizeIterator::len, where size_hint clearly returns equal lower and upper limits. This contradicts ExactSizeIterator's advice "The len method has a default implementation, so you usually shouldn’t implement it", but I think it makes code easier to understand, might improve performance, and most of the library iterators already do it: I counted 50 iterators that do, plus an unexplored number of slice iterators through a macro, against 31 iterators that didn't and are in this PR, and these that can't easily be:

    • library/core/src/iter/adapters/peekable.rs: is only conditionally ExactSizeIterator
    • library/core/src/iter/adapters/skip.rs: ditto
    • library/core/src/iter/adapters/step_by.rs: ditto
    • library/core/src/iter/adapters/take.rs: ditto
    • library/core/src/iter/adapters/zip.rs: ditto
    • library/core/src/iter/range.rs: not touching that without a radiation suit
    • library/core/src/slice/ascii.rs: FlatMap is only conditionally TrustedLen and not even ExactSizeIterator

@rust-highfive
Copy link
Collaborator

r? @m-ou-se

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 16, 2021
@c410-f3r
Copy link
Contributor

Hum... Many of these one line functions are good #[inline] candidates

@ssomers
Copy link
Contributor Author

ssomers commented Dec 16, 2021

Hum... Many of these one line functions are good #[inline] candidates

In other PRs I got the feedback that we shouldn't generally #[inline] stuff willy nilly and let the compiler decide. It seems ridiculous to me to slap #[inline] on a (generic) function that simply returns self.length - surely the compiler knows there's no point in creating a function for a single opcode. But I also noticed that splitting off part of a function that is #[inline] to a new function that is not #[inline], often leads to a performance backlash. Which makes some sense because the new function used to be literally inlined. So I only put #[inline] where size_hint now calls the new function.

@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 30, 2022
@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 6, 2022
@bors
Copy link
Contributor

bors commented Mar 30, 2022

☔ The latest upstream changes (presumably #95241) made this pull request unmergeable. Please resolve the merge conflicts.

@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 24, 2022
@bors
Copy link
Contributor

bors commented May 12, 2022

☔ The latest upstream changes (presumably #95837) made this pull request unmergeable. Please resolve the merge conflicts.

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label May 13, 2022
@JohnCSimon
Copy link
Member

This is still waiting on review.

@m-ou-se
Copy link
Member

m-ou-se commented Jun 20, 2022

r? @the8472 - Do you have time to review this? I think you're our Iterator expert. :)

@rust-highfive rust-highfive assigned the8472 and unassigned m-ou-se Jun 20, 2022
Copy link
Member

@the8472 the8472 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you done any measurements or looked at assembly and seen improvements? Or what motivated this change? Not that I am opposed to it, it looks quite reasonable.

In other PRs I got the feedback that we shouldn't generally #[inline] stuff willy nilly and let the compiler decide. It seems ridiculous to me to slap #[inline] on a (generic) function that simply returns self.length - surely the compiler knows there's no point in creating a function for a single opcode. But I also noticed that splitting off part of a function that is #[inline] to a new function that is not #[inline], often leads to a performance backlash.

The impact varies a lot. It depends on whether the function is part of a loop body our outside, whether inlining it unlocks other optimizations and also on the number of codegen units. Afaik the CGU splitting algorithm takes inlining annotations into account and tries to group related methods together so they can be optimized together.

count() consumes the iterator, so it's less likely that the count will be used in a bounds check of a following loop. len() on the other hand may feature as condition on some loops, so inlining it could be help, especially when compiling with multiple CGUs and LTO=off. That's presumably why the default impl of len() and many size_hint are inline.

So I only put #[inline] where size_hint now calls the new function.

You might want to consider the case where size_hint already is #[inline]. Since the default impl also is inline that means it previously was inlined all the way.

Since some of those iterators are very hot (although I don't know how much code relies on len()) I think it makes sense to put this through perf.

@bors try @rust-timer queue

@the8472
Copy link
Member

the8472 commented Jun 20, 2022

looks like the bots don't listen to review comments.

@bors try @rust-timer queue

@bors
Copy link
Contributor

bors commented Jun 20, 2022

⌛ Trying commit 5afc44d28a758bb1559ea8ec0d34434e00030357 with merge 6354a6adb63febaea3c269075326e12e45008c86...

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 20, 2022
@bors
Copy link
Contributor

bors commented Jun 20, 2022

☀️ Try build successful - checks-actions
Build commit: 6354a6adb63febaea3c269075326e12e45008c86 (6354a6adb63febaea3c269075326e12e45008c86)

@rust-timer
Copy link
Collaborator

Queued 6354a6adb63febaea3c269075326e12e45008c86 with parent b12708f, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (6354a6adb63febaea3c269075326e12e45008c86): comparison url.

Instruction count

  • Primary benchmarks: mixed results
  • Secondary benchmarks: 😿 relevant regressions found
mean1 max count2
Regressions 😿
(primary)
1.7% 2.4% 2
Regressions 😿
(secondary)
0.5% 0.9% 6
Improvements 🎉
(primary)
-1.1% -1.1% 1
Improvements 🎉
(secondary)
-0.3% -0.3% 1
All 😿🎉 (primary) 0.8% 2.4% 3

Max RSS (memory usage)

Results
  • Primary benchmarks: 😿 relevant regressions found
  • Secondary benchmarks: no relevant changes found
mean1 max count2
Regressions 😿
(primary)
4.9% 10.2% 5
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) 4.9% 10.2% 5

Cycles

Results
  • Primary benchmarks: 😿 relevant regressions found
  • Secondary benchmarks: no relevant changes found
mean1 max count2
Regressions 😿
(primary)
3.7% 4.2% 3
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) 3.7% 4.2% 3

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@rustbot rustbot added the perf-regression Performance regression. label Jun 21, 2022
@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 21, 2022
@ssomers
Copy link
Contributor Author

ssomers commented Jun 21, 2022

Have you done any measurements or looked at assembly and seen improvements?

No. I just saw the inconsistency where last got a general treatment and count didn't though some count did, and len varies too but I find the explicit form easier to understand.

In other PRs I got the feedback that we shouldn't generally #[inline] stuff willy nilly

The impact varies a lot.

I've recently seen it matter for “correctness” too: #[inline] avoids stack overflow by avoiding that moved values get copied on the stack.

count() consumes the iterator, so it's less likely that the count will be used in a bounds check of a following loop.

There is one way to speculate less about it… make them abort and see how far we get with the test suite. I haven't completed this but I haven't seen the first time I see count being used where len would be smarter is in chalk-engine, but multiple unit tests try to provoke weirdness from an iterator this way, sometimes next to a unit test explicitly testing the iterator's len. So the first commit here is not good for testing.

So I only put #[inline] where size_hint now calls the new function.

You might want to consider the case where size_hint already is #[inline]. Since the default impl also is inline that means it previously was inlined all the way.

Well, what I pursued is a more general case - I don't even care whether size_hint was inline? Except that looking back now, I missed at least one spot - VecDeque::IterMut.

@ssomers ssomers changed the title Specialize count and len in ExactSizeIterator implementations Specialize len in ExactSizeIterator implementations Jun 22, 2022
@ssomers
Copy link
Contributor Author

ssomers commented Jun 22, 2022

So I yanked count and added more #[inline]'s - beyond the rule "only put #[inline] where size_hint now calls the new function" on to "if functions next to me got #[inline], I'm getting some for myself". Whereas the always basic ExactSizeIterator::is_empty implementations usually didn't get #[inline] - except the default implementation, Bytes and the ones for slices.

@the8472
Copy link
Member

the8472 commented Jun 22, 2022

Then I'm not quite seeing the motivation here. consistency-of-impls isn't that important since the API surface doesn't change and the impls are non-uniform for other reasons anyway. As long as the behavior is correct and fast the increase in lines of code doesn't seem to be worth it.

@ssomers ssomers closed this Jun 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants