Specialize len in ExactSizeIterator implementations #91998

ssomers · 2021-12-16T10:38:04Z

~~Two commits (tell me if you prefer separate PRs):~~

Override the default implementation of Iterator::count in ExactSizeIterator iterators without side effects, much like When possible without changing semantics, implement Iterator::last in terms of DoubleEndedIterator::next_back for types in liballoc and libcore. #62316, making them O(1) instead of O(n). Though I doubt count is used often or at all.
- Most of these count simply delegate to an inner implementation, which means the ones in HashMap/HashSet don't actually shortcut until hashbrown's would do so.
- library/alloc/src/boxed.rs: is only conditionally ExactSizeIterator.
Override the default implementation of ExactSizeIterator::len, where size_hint clearly returns equal lower and upper limits. This contradicts ExactSizeIterator's advice "The len method has a default implementation, so you usually shouldn’t implement it", but I think it makes code easier to understand, might improve performance, and most of the library iterators already do it: I counted 50 iterators that do, plus an unexplored number of slice iterators through a macro, against 31 iterators that didn't and are in this PR, and these that can't easily be:
- library/core/src/iter/adapters/peekable.rs: is only conditionally ExactSizeIterator
- library/core/src/iter/adapters/skip.rs: ditto
- library/core/src/iter/adapters/step_by.rs: ditto
- library/core/src/iter/adapters/take.rs: ditto
- library/core/src/iter/adapters/zip.rs: ditto
- library/core/src/iter/range.rs: not touching that without a radiation suit
- library/core/src/slice/ascii.rs: FlatMap is only conditionally TrustedLen and not even ExactSizeIterator

rust-highfive · 2021-12-16T10:38:08Z

r? @m-ou-se

(rust-highfive has picked a reviewer for you, use r? to override)

c410-f3r · 2021-12-16T13:13:33Z

Hum... Many of these one line functions are good #[inline] candidates

ssomers · 2021-12-16T22:50:23Z

Hum... Many of these one line functions are good #[inline] candidates

In other PRs I got the feedback that we shouldn't generally #[inline] stuff willy nilly and let the compiler decide. It seems ridiculous to me to slap #[inline] on a (generic) function that simply returns self.length - surely the compiler knows there's no point in creating a function for a single opcode. But I also noticed that splitting off part of a function that is #[inline] to a new function that is not #[inline], often leads to a performance backlash. Which makes some sense because the new function used to be literally inlined. So I only put #[inline] where size_hint now calls the new function.

bors · 2022-03-30T12:54:18Z

☔ The latest upstream changes (presumably #95241) made this pull request unmergeable. Please resolve the merge conflicts.

bors · 2022-05-12T05:29:12Z

☔ The latest upstream changes (presumably #95837) made this pull request unmergeable. Please resolve the merge conflicts.

JohnCSimon · 2022-06-20T04:51:24Z

This is still waiting on review.

m-ou-se · 2022-06-20T08:04:08Z

r? @the8472 - Do you have time to review this? I think you're our Iterator expert. :)

the8472

Have you done any measurements or looked at assembly and seen improvements? Or what motivated this change? Not that I am opposed to it, it looks quite reasonable.

In other PRs I got the feedback that we shouldn't generally #[inline] stuff willy nilly and let the compiler decide. It seems ridiculous to me to slap #[inline] on a (generic) function that simply returns self.length - surely the compiler knows there's no point in creating a function for a single opcode. But I also noticed that splitting off part of a function that is #[inline] to a new function that is not #[inline], often leads to a performance backlash.

The impact varies a lot. It depends on whether the function is part of a loop body our outside, whether inlining it unlocks other optimizations and also on the number of codegen units. Afaik the CGU splitting algorithm takes inlining annotations into account and tries to group related methods together so they can be optimized together.

count() consumes the iterator, so it's less likely that the count will be used in a bounds check of a following loop. len() on the other hand may feature as condition on some loops, so inlining it could be help, especially when compiling with multiple CGUs and LTO=off. That's presumably why the default impl of len() and many size_hint are inline.

So I only put #[inline] where size_hint now calls the new function.

You might want to consider the case where size_hint already is #[inline]. Since the default impl also is inline that means it previously was inlined all the way.

Since some of those iterators are very hot (although I don't know how much code relies on len()) I think it makes sense to put this through perf.

@bors try @rust-timer queue

the8472 · 2022-06-20T20:42:16Z

looks like the bots don't listen to review comments.

@bors try @rust-timer queue

bors · 2022-06-20T22:04:59Z

⌛ Trying commit 5afc44d28a758bb1559ea8ec0d34434e00030357 with merge 6354a6adb63febaea3c269075326e12e45008c86...

rust-timer · 2022-06-20T22:06:56Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-06-20T23:55:23Z

☀️ Try build successful - checks-actions
Build commit: 6354a6adb63febaea3c269075326e12e45008c86 (6354a6adb63febaea3c269075326e12e45008c86)

rust-timer · 2022-06-20T23:55:24Z

Queued 6354a6adb63febaea3c269075326e12e45008c86 with parent b12708f, future comparison URL.

rust-timer · 2022-06-21T02:26:22Z

Finished benchmarking commit (6354a6adb63febaea3c269075326e12e45008c86): comparison url.

Instruction count

Primary benchmarks: mixed results
Secondary benchmarks: 😿 relevant regressions found

	mean¹	max	count²
Regressions 😿 (primary)	1.7%	2.4%	2
Regressions 😿 (secondary)	0.5%	0.9%	6
Improvements 🎉 (primary)	-1.1%	-1.1%	1
Improvements 🎉 (secondary)	-0.3%	-0.3%	1
All 😿🎉 (primary)	0.8%	2.4%	3

Max RSS (memory usage)

Results

Primary benchmarks: 😿 relevant regressions found
Secondary benchmarks: no relevant changes found

	mean¹	max	count²
Regressions 😿 (primary)	4.9%	10.2%	5
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	4.9%	10.2%	5

Cycles

Results

Primary benchmarks: 😿 relevant regressions found
Secondary benchmarks: no relevant changes found

	mean¹	max	count²
Regressions 😿 (primary)	3.7%	4.2%	3
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	3.7%	4.2%	3

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

the arithmetic mean of the percent change ↩ ↩² ↩³
number of relevant changes ↩ ↩² ↩³

ssomers · 2022-06-21T11:44:18Z

Have you done any measurements or looked at assembly and seen improvements?

No. I just saw the inconsistency where last got a general treatment and count didn't though some count did, and len varies too but I find the explicit form easier to understand.

In other PRs I got the feedback that we shouldn't generally #[inline] stuff willy nilly

The impact varies a lot.

I've recently seen it matter for “correctness” too: #[inline] avoids stack overflow by avoiding that moved values get copied on the stack.

count() consumes the iterator, so it's less likely that the count will be used in a bounds check of a following loop.

There is one way to speculate less about it… make them abort and see how far we get with the test suite. I haven't completed this ~~but I haven't seen~~ the first time I see count being used where len would be smarter is in chalk-engine, but multiple unit tests try to provoke weirdness from an iterator this way, sometimes next to a unit test explicitly testing the iterator's len. So the first commit here is not good for testing.

So I only put #[inline] where size_hint now calls the new function.

You might want to consider the case where size_hint already is #[inline]. Since the default impl also is inline that means it previously was inlined all the way.

Well, what I pursued is a more general case - I don't even care whether size_hint was inline? Except that looking back now, I missed at least one spot - VecDeque::IterMut.

ssomers · 2022-06-22T08:23:20Z

So I yanked count and added more #[inline]'s - beyond the rule "only put #[inline] where size_hint now calls the new function" on to "if functions next to me got #[inline], I'm getting some for myself". Whereas the always basic ExactSizeIterator::is_empty implementations usually didn't get #[inline] - except the default implementation, Bytes and the ones for slices.

the8472 · 2022-06-22T21:48:52Z

Then I'm not quite seeing the motivation here. consistency-of-impls isn't that important since the API surface doesn't change and the impls are non-uniform for other reasons anyway. As long as the behavior is correct and fast the increase in lines of code doesn't seem to be worth it.

rust-highfive assigned m-ou-se Dec 16, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 16, 2021

JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 30, 2022

JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 6, 2022

ssomers force-pushed the iterator_count_len branch from 7b2d47d to ce4ec10 Compare April 1, 2022 09:22

JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 24, 2022

ssomers force-pushed the iterator_count_len branch from ce4ec10 to 5afc44d Compare May 13, 2022 08:19

rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label May 13, 2022

rust-highfive assigned the8472 and unassigned m-ou-se Jun 20, 2022

the8472 reviewed Jun 20, 2022

View reviewed changes

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 20, 2022

rustbot added the perf-regression Performance regression. label Jun 21, 2022

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 21, 2022

Specialize len for ExactSizeIterator with unmistakable size_hint

84125b2

ssomers force-pushed the iterator_count_len branch from 5afc44d to 84125b2 Compare June 21, 2022 23:03

ssomers changed the title ~~Specialize count and len in ExactSizeIterator implementations~~ Specialize len in ExactSizeIterator implementations Jun 22, 2022

ssomers closed this Jun 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Specialize len in ExactSizeIterator implementations #91998

Specialize len in ExactSizeIterator implementations #91998

ssomers commented Dec 16, 2021 •

edited

rust-highfive commented Dec 16, 2021

c410-f3r commented Dec 16, 2021

ssomers commented Dec 16, 2021 •

edited

bors commented Mar 30, 2022

bors commented May 12, 2022

JohnCSimon commented Jun 20, 2022

m-ou-se commented Jun 20, 2022

the8472 left a comment

the8472 commented Jun 20, 2022

bors commented Jun 20, 2022

rust-timer commented Jun 20, 2022

bors commented Jun 20, 2022

rust-timer commented Jun 20, 2022

rust-timer commented Jun 21, 2022

ssomers commented Jun 21, 2022 •

edited

ssomers commented Jun 22, 2022

the8472 commented Jun 22, 2022

Specialize len in ExactSizeIterator implementations #91998

Specialize len in ExactSizeIterator implementations #91998

Conversation

ssomers commented Dec 16, 2021 • edited

rust-highfive commented Dec 16, 2021

c410-f3r commented Dec 16, 2021

ssomers commented Dec 16, 2021 • edited

bors commented Mar 30, 2022

bors commented May 12, 2022

JohnCSimon commented Jun 20, 2022

m-ou-se commented Jun 20, 2022

the8472 left a comment

Choose a reason for hiding this comment

the8472 commented Jun 20, 2022

bors commented Jun 20, 2022

rust-timer commented Jun 20, 2022

bors commented Jun 20, 2022

rust-timer commented Jun 20, 2022

rust-timer commented Jun 21, 2022

Instruction count

Max RSS (memory usage)

Cycles

Footnotes

ssomers commented Jun 21, 2022 • edited

ssomers commented Jun 22, 2022

the8472 commented Jun 22, 2022

ssomers commented Dec 16, 2021 •

edited

ssomers commented Dec 16, 2021 •

edited

ssomers commented Jun 21, 2022 •

edited