Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Combine for all nil scenarios #55

Merged
merged 9 commits into from Feb 28, 2022

Conversation

jstroem
Copy link
Contributor

@jstroem jstroem commented Feb 28, 2022

During the work on yarpc/yarpc-go#2126 I found that multierr.Combine always allocates the error slice on the heap because of the escape analysis.

Assuming that most cases where multierr.Combine is called all arguments are nil it make sense to optimize it with that in mind:

Benchmark results for this optimization:

name                              old time/op    new time/op    delta
Combine/inline_1                    29.3ns ±15%     2.0ns ± 7%   -93.28%  (p=0.000 n=10+9)
Combine/inline_2                    40.0ns ± 6%     3.4ns ±11%   -91.55%  (p=0.000 n=10+10)
Combine/inline_3_no_error           41.9ns ± 2%     4.8ns ±15%   -88.41%  (p=0.000 n=8+10)
Combine/inline_3_one_error          41.6ns ± 3%     5.3ns ± 5%   -87.18%  (p=0.000 n=9+10)
Combine/inline_3_multiple_errors    81.0ns ± 9%   115.8ns ±16%   +42.96%  (p=0.000 n=10+10)
Combine/slice_100_no_errors          432ns ±12%      99ns ± 8%   -77.20%  (p=0.000 n=10+10)
Combine/slice_100_one_error          555ns ±12%     182ns ± 6%   -67.15%  (p=0.000 n=10+9)
Combine/slice_100_multi_error        832ns ± 6%     919ns ± 7%   +10.38%  (p=0.000 n=10+10)

name                              old alloc/op   new alloc/op   delta
Combine/inline_1                     16.0B ± 0%      0.0B       -100.00%  (p=0.000 n=10+10)
Combine/inline_2                     32.0B ± 0%      0.0B       -100.00%  (p=0.000 n=10+10)
Combine/inline_3_no_error            48.0B ± 0%      0.0B       -100.00%  (p=0.000 n=10+10)
Combine/inline_3_one_error           48.0B ± 0%      0.0B       -100.00%  (p=0.000 n=10+10)
Combine/inline_3_multiple_errors     80.0B ± 0%     80.0B ± 0%      ~     (all equal)
Combine/slice_100_no_errors         1.79kB ± 0%    0.00kB       -100.00%  (p=0.000 n=10+10)
Combine/slice_100_one_error         1.82kB ± 0%    0.02kB ± 0%   -98.68%  (p=0.000 n=10+10)
Combine/slice_100_multi_error       1.90kB ± 0%    1.90kB ± 0%      ~     (all equal)

name                              old allocs/op  new allocs/op  delta
Combine/inline_1                      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
Combine/inline_2                      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
Combine/inline_3_no_error             1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
Combine/inline_3_one_error            1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
Combine/inline_3_multiple_errors      2.00 ± 0%      2.00 ± 0%      ~     (all equal)
Combine/slice_100_no_errors           1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
Combine/slice_100_one_error           3.00 ± 0%      2.00 ± 0%   -33.33%  (p=0.000 n=10+10)
Combine/slice_100_multi_error         7.00 ± 0%      7.00 ± 0%      ~     (all equal)

@CLAassistant
Copy link

CLAassistant commented Feb 28, 2022

CLA assistant check
All committers have signed the CLA.

@codecov
Copy link

codecov bot commented Feb 28, 2022

Codecov Report

Merging #55 (c5a74ae) into master (d49c2ba) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##            master       #55   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            1         1           
  Lines          105       112    +7     
=========================================
+ Hits           105       112    +7     
Impacted Files Coverage Δ
error.go 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d49c2ba...c5a74ae. Read the comment docs.

Comment on lines 105 to 124
for i := 0; i < b.N; i++ {
errs := make([]error, 100)
Combine(errs...)
}
})

b.Run("slice 100 one error", func(b *testing.B) {
for i := 0; i < b.N; i++ {
errs := make([]error, 100)
errs[len(errs)-1] = fmt.Errorf("failed")
Combine(errs...)
}
})

b.Run("slice 100 multi error", func(b *testing.B) {
for i := 0; i < b.N; i++ {
errs := make([]error, 100)
errs[0] = fmt.Errorf("failed1")
errs[len(errs)-1] = fmt.Errorf("failed2")
Combine(errs...)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These benchmarks probably shouldn't allocate the test slices inside the loop?
Fixing.

@abhinav
Copy link
Collaborator

abhinav commented Feb 28, 2022

Nice one @jstroem, but I don't think it needs to be this complicated.
The issue is that returning the input errors inside the multierr puts it on the heap.
So we can simplify this change significantly for the same gains by making a copy of errors for that case only.
I tried making the change and here are the results with the updated benchmarks:

name \ time/op                      master.txt   original_pr.txt  simplified_pr.txt
Combine/inline_1-8                  17.7ns ± 0%       2.0ns ± 0%         3.0ns ± 0%
Combine/inline_2-8                  21.0ns ± 0%       3.6ns ± 1%         4.1ns ± 1%
Combine/inline_3_no_error-8         24.4ns ± 0%       4.4ns ± 1%         4.7ns ± 1%
Combine/inline_3_one_error-8        24.8ns ± 0%       4.1ns ± 0%         5.1ns ± 1%
Combine/inline_3_multiple_errors-8  44.3ns ± 0%      60.8ns ± 0%        55.1ns ± 1%
Combine/slice_100_no_errors-8       72.9ns ± 0%      71.3ns ± 0%        72.9ns ± 0%
Combine/slice_100_one_error-8       74.5ns ± 0%      73.7ns ± 0%        74.4ns ± 0%
Combine/slice_100_multi_error-8      193ns ± 0%       461ns ± 1%         194ns ± 1%

name \ alloc/op                     master.txt   original_pr.txt  simplified_pr.txt
Combine/inline_1-8                   16.0B ± 0%        0.0B               0.0B
Combine/inline_2-8                   32.0B ± 0%        0.0B               0.0B
Combine/inline_3_no_error-8          48.0B ± 0%        0.0B               0.0B
Combine/inline_3_one_error-8         48.0B ± 0%        0.0B               0.0B
Combine/inline_3_multiple_errors-8   80.0B ± 0%       80.0B ± 0%         80.0B ± 0%
Combine/slice_100_no_errors-8        0.00B            0.00B              0.00B
Combine/slice_100_one_error-8        0.00B            0.00B              0.00B
Combine/slice_100_multi_error-8      64.0B ± 0%     1856.0B ± 0%         64.0B ± 0%

name \ allocs/op                    master.txt   original_pr.txt  simplified_pr.txt
Combine/inline_1-8                    1.00 ± 0%        0.00               0.00
Combine/inline_2-8                    1.00 ± 0%        0.00               0.00
Combine/inline_3_no_error-8           1.00 ± 0%        0.00               0.00
Combine/inline_3_one_error-8          1.00 ± 0%        0.00               0.00
Combine/inline_3_multiple_errors-8    2.00 ± 0%        2.00 ± 0%          2.00 ± 0%
Combine/slice_100_no_errors-8         0.00             0.00               0.00
Combine/slice_100_one_error-8         0.00             0.00               0.00
Combine/slice_100_multi_error-8       2.00 ± 0%        3.00 ± 0%          2.00 ± 0%

The simplified version performs a nanosecond slower (because it inspects unconditionally—fixable), does the same number of allocations or fewer in the case of "slice of 100 with multiple errors" (because it optimizes for the "no nested multierrors case").

@abhinav
Copy link
Collaborator

abhinav commented Feb 28, 2022

because it inspects unconditionally—fixable

Updated to not inspect for 0 or 1 items. We can special case 2 items as well, if we'd like but it's off by a nanosecond or less so I'm not sure how much worth it is to do that.

Copy link
Contributor

@sywhang sywhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@abhinav abhinav merged commit cea7d2e into uber-go:master Feb 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants