Optimize Combine for all nil scenarios #55

jstroem · 2022-02-28T13:28:57Z

While working on yarpc/yarpc-go#2126 I found that multierr.Combine always allocates the error slice on the heap, because escape analysis concludes that the slice escapes.

Assuming that in most calls to multierr.Combine all arguments are nil, it makes sense to optimize with that case in mind.

Benchmark results for this optimization:

name                              old time/op    new time/op    delta
Combine/inline_1                    29.3ns ±15%     2.0ns ± 7%   -93.28%  (p=0.000 n=10+9)
Combine/inline_2                    40.0ns ± 6%     3.4ns ±11%   -91.55%  (p=0.000 n=10+10)
Combine/inline_3_no_error           41.9ns ± 2%     4.8ns ±15%   -88.41%  (p=0.000 n=8+10)
Combine/inline_3_one_error          41.6ns ± 3%     5.3ns ± 5%   -87.18%  (p=0.000 n=9+10)
Combine/inline_3_multiple_errors    81.0ns ± 9%   115.8ns ±16%   +42.96%  (p=0.000 n=10+10)
Combine/slice_100_no_errors          432ns ±12%      99ns ± 8%   -77.20%  (p=0.000 n=10+10)
Combine/slice_100_one_error          555ns ±12%     182ns ± 6%   -67.15%  (p=0.000 n=10+9)
Combine/slice_100_multi_error        832ns ± 6%     919ns ± 7%   +10.38%  (p=0.000 n=10+10)

name                              old alloc/op   new alloc/op   delta
Combine/inline_1                     16.0B ± 0%      0.0B       -100.00%  (p=0.000 n=10+10)
Combine/inline_2                     32.0B ± 0%      0.0B       -100.00%  (p=0.000 n=10+10)
Combine/inline_3_no_error            48.0B ± 0%      0.0B       -100.00%  (p=0.000 n=10+10)
Combine/inline_3_one_error           48.0B ± 0%      0.0B       -100.00%  (p=0.000 n=10+10)
Combine/inline_3_multiple_errors     80.0B ± 0%     80.0B ± 0%      ~     (all equal)
Combine/slice_100_no_errors         1.79kB ± 0%    0.00kB       -100.00%  (p=0.000 n=10+10)
Combine/slice_100_one_error         1.82kB ± 0%    0.02kB ± 0%   -98.68%  (p=0.000 n=10+10)
Combine/slice_100_multi_error       1.90kB ± 0%    1.90kB ± 0%      ~     (all equal)

name                              old allocs/op  new allocs/op  delta
Combine/inline_1                      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
Combine/inline_2                      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
Combine/inline_3_no_error             1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
Combine/inline_3_one_error            1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
Combine/inline_3_multiple_errors      2.00 ± 0%      2.00 ± 0%      ~     (all equal)
Combine/slice_100_no_errors           1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
Combine/slice_100_one_error           3.00 ± 0%      2.00 ± 0%   -33.33%  (p=0.000 n=10+10)
Combine/slice_100_multi_error         7.00 ± 0%      7.00 ± 0%      ~     (all equal)

CLAassistant · 2022-02-28T13:29:05Z

All committers have signed the CLA.

codecov · 2022-02-28T15:34:15Z

Codecov Report

Merging #55 (c5a74ae) into master (d49c2ba) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master       #55   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            1         1           
  Lines          105       112    +7     
=========================================
+ Hits           105       112    +7

Impacted Files	Coverage Δ
error.go	`100.00% <100.00%> (ø)`

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d49c2ba...c5a74ae. Read the comment docs.

abhinav · 2022-02-28T15:49:02Z

benchmarks_test.go

+		for i := 0; i < b.N; i++ {
+			errs := make([]error, 100)
+			Combine(errs...)
+		}
+	})
+
+	b.Run("slice 100 one error", func(b *testing.B) {
+		for i := 0; i < b.N; i++ {
+			errs := make([]error, 100)
+			errs[len(errs)-1] = fmt.Errorf("failed")
+			Combine(errs...)
+		}
+	})
+
+	b.Run("slice 100 multi error", func(b *testing.B) {
+		for i := 0; i < b.N; i++ {
+			errs := make([]error, 100)
+			errs[0] = fmt.Errorf("failed1")
+			errs[len(errs)-1] = fmt.Errorf("failed2")
+			Combine(errs...)


These benchmarks probably shouldn't allocate the test slices inside the loop?
Fixing.

abhinav · 2022-02-28T16:12:24Z

Nice one @jstroem, but I don't think it needs to be this complicated.
The issue is that returning the input errors slice inside the multierr puts that slice on the heap.
So we can simplify this change significantly, for the same gains, by making a copy of the errors in that case only.
I tried making the change and here are the results with the updated benchmarks:

name \ time/op                      master.txt   original_pr.txt  simplified_pr.txt
Combine/inline_1-8                  17.7ns ± 0%       2.0ns ± 0%         3.0ns ± 0%
Combine/inline_2-8                  21.0ns ± 0%       3.6ns ± 1%         4.1ns ± 1%
Combine/inline_3_no_error-8         24.4ns ± 0%       4.4ns ± 1%         4.7ns ± 1%
Combine/inline_3_one_error-8        24.8ns ± 0%       4.1ns ± 0%         5.1ns ± 1%
Combine/inline_3_multiple_errors-8  44.3ns ± 0%      60.8ns ± 0%        55.1ns ± 1%
Combine/slice_100_no_errors-8       72.9ns ± 0%      71.3ns ± 0%        72.9ns ± 0%
Combine/slice_100_one_error-8       74.5ns ± 0%      73.7ns ± 0%        74.4ns ± 0%
Combine/slice_100_multi_error-8      193ns ± 0%       461ns ± 1%         194ns ± 1%

name \ alloc/op                     master.txt   original_pr.txt  simplified_pr.txt
Combine/inline_1-8                   16.0B ± 0%        0.0B               0.0B
Combine/inline_2-8                   32.0B ± 0%        0.0B               0.0B
Combine/inline_3_no_error-8          48.0B ± 0%        0.0B               0.0B
Combine/inline_3_one_error-8         48.0B ± 0%        0.0B               0.0B
Combine/inline_3_multiple_errors-8   80.0B ± 0%       80.0B ± 0%         80.0B ± 0%
Combine/slice_100_no_errors-8        0.00B            0.00B              0.00B
Combine/slice_100_one_error-8        0.00B            0.00B              0.00B
Combine/slice_100_multi_error-8      64.0B ± 0%     1856.0B ± 0%         64.0B ± 0%

name \ allocs/op                    master.txt   original_pr.txt  simplified_pr.txt
Combine/inline_1-8                    1.00 ± 0%        0.00               0.00
Combine/inline_2-8                    1.00 ± 0%        0.00               0.00
Combine/inline_3_no_error-8           1.00 ± 0%        0.00               0.00
Combine/inline_3_one_error-8          1.00 ± 0%        0.00               0.00
Combine/inline_3_multiple_errors-8    2.00 ± 0%        2.00 ± 0%          2.00 ± 0%
Combine/slice_100_no_errors-8         0.00             0.00               0.00
Combine/slice_100_one_error-8         0.00             0.00               0.00
Combine/slice_100_multi_error-8       2.00 ± 0%        3.00 ± 0%          2.00 ± 0%

The simplified version is about a nanosecond slower on the inline cases (because it inspects unconditionally, which is fixable) and does the same number of allocations as the original PR, or fewer in the "slice of 100 with multiple errors" case (because it optimizes for the "no nested multierrors" case).

abhinav · 2022-02-28T16:20:47Z

because it inspects unconditionally—fixable

Updated to not inspect slices of 0 or 1 items. We can special-case 2 items as well if we'd like, but it's off by a nanosecond or less, so I'm not sure it's worth doing.

sywhang

jstroem added 2 commits February 28, 2022 13:59

Optimize Combine for all nil scenarios

ad162f4

optimize for one error

3d2b5ea

jstroem added 2 commits February 28, 2022 14:34

Removed unused code

6acb82d

cleanup allocation tests

22c7808

abhinav added 2 commits February 28, 2022 07:49

bench: Don't allocate slices inside the loop

9a9809b

Simplify solution

2ca2005

abhinav reviewed Feb 28, 2022

Don't inspect slices of 0 or 1

b63bdaa

abhinav added 2 commits February 28, 2022 08:28

test/cover: Slice with a single non-nil error

5842d36

better explain the comment about copying the slice

c5a74ae

sywhang approved these changes Feb 28, 2022

abhinav merged commit cea7d2e into uber-go:master Feb 28, 2022
