Slice: use `reflect.Append` instead of pre-allocating whole slice. #7

gagliardetto · 2022-08-20T13:59:44Z

No description provided.

…scard method

gagliardetto · 2022-08-20T14:01:47Z

@terorie pls take a look next week

riptl · 2022-08-22T09:08:42Z

decoder.go

+
+func reflect_readArrayOfBytes(append bool, d *Decoder, l int, rv reflect.Value) error {
+	for i := 0; i < l; i++ {
+		n, err := d.ReadByte()


We can save a lot of bounds checking here by

doing a bounds check for the entire array

allocating the entire slice

then filling it

Same for all the other primitive arrays and slices

reflect.Append is worse than pre-allocation when the slice size is known because the end result is the same (in terms of memory usage), but the former is less efficient

Two more optimization ideas:

On little endian machines without alignment requirements, no conversion is necessary for integer primitive types like uint16`. We could simply unsafely copy into the backing memory buffer of a slice.

In all simple copy cases, allocating a slice isn't even necessary – The decoder could unsafely construct a slice that points to a place in the serialized buffer

gagliardetto · 2022-08-22T12:14:50Z

Thanks!

Before:

Benchmark_uintSlices_Decode-8     	    6078	    179633 ns/op	   38688 B/op	    1297 allocs/op
Benchmark_uintSlices_append-8     	    3704	    302197 ns/op	   64144 B/op	    2829 allocs/op
Benchmark_uintSlices_allocate-8   	   10000	    103067 ns/op	   16408 B/op	    1026 allocs/op

After:

Benchmark_uintSlices_Decode-8        	   95216	     12386 ns/op	    4256 B/op	       6 allocs/op
Benchmark_uintSlices_append-8        	   13005	     92601 ns/op	   16432 B/op	    1027 allocs/op
Benchmark_uintSlices_preallocate-8   	   12608	     92314 ns/op	   24624 B/op	    1028 allocs/op

(NOTE: the name poorly reflects what happens inside in the AFTER)

gagliardetto · 2022-08-22T12:33:15Z

Okay, so now for slices of uints:

checking if remaining bytes are enough for the desired slice size.
slice of that specific type is pre-allocated.
no reflect is involved in the creation of the slice or the writing to it.
when the slice is ready, it's set upstream using reflect.

gagliardetto · 2022-08-22T12:44:03Z

Ooops, wrong type in the benchmark.

gagliardetto · 2022-08-22T13:00:06Z

Final benchmark:

BEFORE optimizing uint slices (underlying use of reflect.Append)

goos: linux
goarch: amd64
pkg: github.com/gagliardetto/binary
cpu: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
Benchmark_uintSlice64_Decode_noMake-8         	    3636	    326426 ns/op	  131864 B/op	    5137 allocs/op
Benchmark_uintSlice64_Decode_make-8           	    3724	    308653 ns/op	  131864 B/op	    5137 allocs/op
Benchmark_uintSlice64_Decode_field_noMake-8   	    3643	    302873 ns/op	  132008 B/op	    5143 allocs/op
Benchmark_uintSlice64_Decode_field_make-8     	    3644	    303124 ns/op	  132008 B/op	    5143 allocs/op
Benchmark_uintSlice64_reflect_noMake-8        	   24592	     50456 ns/op	   16480 B/op	    1028 allocs/op
Benchmark_uintSlice64_reflect_make-8          	   24235	     49264 ns/op	   16480 B/op	    1028 allocs/op
Benchmark_uintSlice32_Decode_noMake-8         	    4402	    246662 ns/op	  107296 B/op	    4113 allocs/op
Benchmark_uintSlice32_Decode_make-8           	    4700	    251511 ns/op	  107296 B/op	    4113 allocs/op
Benchmark_uintSlice32_Decode_field_noMake-8   	    4592	    253552 ns/op	  107440 B/op	    4119 allocs/op
Benchmark_uintSlice32_Decode_field_make-8     	    4332	    256575 ns/op	  107440 B/op	    4119 allocs/op
Benchmark_uintSlice32_reflect_noMake-8        	  173953	      6631 ns/op	    4192 B/op	       4 allocs/op
Benchmark_uintSlice32_reflect_make-8          	  165073	      6599 ns/op	    4192 B/op	       4 allocs/op
PASS
ok  	github.com/gagliardetto/binary	15.178s

BEFORE optimizing uint slices (underlying use of reflect.MakeSlice + rv.Index(i).Set)

goos: linux
goarch: amd64
pkg: github.com/gagliardetto/binary
cpu: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
Benchmark_uintSlice64_Decode_noMake-8         	    5041	    244716 ns/op	   90272 B/op	    4102 allocs/op
Benchmark_uintSlice64_Decode_make-8           	    5509	    216908 ns/op	   90272 B/op	    4102 allocs/op
Benchmark_uintSlice64_Decode_field_noMake-8   	    5554	    217488 ns/op	   90416 B/op	    4108 allocs/op
Benchmark_uintSlice64_Decode_field_make-8     	    4992	    227863 ns/op	   90416 B/op	    4108 allocs/op
Benchmark_uintSlice64_reflect_noMake-8        	   22930	     53717 ns/op	   16480 B/op	    1028 allocs/op
Benchmark_uintSlice64_reflect_make-8          	   23380	     49189 ns/op	   16480 B/op	    1028 allocs/op
Benchmark_uintSlice32_Decode_noMake-8         	    6961	    166826 ns/op	   73888 B/op	    3078 allocs/op
Benchmark_uintSlice32_Decode_make-8           	    6466	    169598 ns/op	   73888 B/op	    3078 allocs/op
Benchmark_uintSlice32_Decode_field_noMake-8   	    6379	    166451 ns/op	   74032 B/op	    3084 allocs/op
Benchmark_uintSlice32_Decode_field_make-8     	    6475	    170186 ns/op	   74032 B/op	    3084 allocs/op
Benchmark_uintSlice32_reflect_noMake-8        	  179046	      6314 ns/op	    4192 B/op	       4 allocs/op
Benchmark_uintSlice32_reflect_make-8          	  169556	      6537 ns/op	    4192 B/op	       4 allocs/op
PASS
ok  	github.com/gagliardetto/binary	18.773s

AFTER optimizing for uint slices

$ go test  -bench "uintSlice" -benchmem

goos: linux
goarch: amd64
pkg: github.com/gagliardetto/binary
cpu: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
Benchmark_uintSlice64_Decode_noMake-8         	   22210	     52099 ns/op	   16544 B/op	    1030 allocs/op
Benchmark_uintSlice64_Decode_make-8           	   22062	     54300 ns/op	   16544 B/op	    1030 allocs/op
Benchmark_uintSlice64_Decode_field_noMake-8   	   21668	     54316 ns/op	   16688 B/op	    1036 allocs/op
Benchmark_uintSlice64_Decode_field_make-8     	   22428	     54018 ns/op	   16688 B/op	    1036 allocs/op
Benchmark_uintSlice64_reflect_noMake-8        	   22785	     53440 ns/op	   16480 B/op	    1028 allocs/op
Benchmark_uintSlice64_reflect_make-8          	   22976	     52906 ns/op	   16480 B/op	    1028 allocs/op
Benchmark_uintSlice32_Decode_noMake-8         	  149244	      7155 ns/op	    4256 B/op	       6 allocs/op
Benchmark_uintSlice32_Decode_make-8           	  160652	      7149 ns/op	    4256 B/op	       6 allocs/op
Benchmark_uintSlice32_Decode_field_noMake-8   	  151970	      7551 ns/op	    4400 B/op	      12 allocs/op
Benchmark_uintSlice32_Decode_field_make-8     	  151761	      7708 ns/op	    4400 B/op	      12 allocs/op
Benchmark_uintSlice32_reflect_noMake-8        	  160108	      7006 ns/op	    4192 B/op	       4 allocs/op
Benchmark_uintSlice32_reflect_make-8          	  154717	      7048 ns/op	    4192 B/op	       4 allocs/op
PASS
ok  	github.com/gagliardetto/binary	17.686s

gagliardetto · 2022-08-24T21:08:52Z

If this looks good, I'll proceed tomorrow to merge. @terorie

gagliardetto · 2022-08-28T13:13:55Z

After simplifying readNBytes:

goos: linux
goarch: amd64
pkg: github.com/gagliardetto/binary
cpu: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
Benchmark_uintSlice64_Decode_noMake-8                    	  109407	     10240 ns/op	    8352 B/op	       6 allocs/op
Benchmark_uintSlice64_Decode_make-8                      	  104725	     10590 ns/op	    8352 B/op	       6 allocs/op
Benchmark_uintSlice64_Decode_field_noMake-8              	  106695	     10673 ns/op	    8496 B/op	      12 allocs/op
Benchmark_uintSlice64_Decode_field_make-8                	  109975	     11053 ns/op	    8496 B/op	      12 allocs/op
Benchmark_uintSlice64_readArray_noMake-8                 	  109645	     10126 ns/op	    8288 B/op	       4 allocs/op
Benchmark_uintSlice64_readArray_make-8                   	  104791	     10162 ns/op	    8288 B/op	       4 allocs/op
Benchmark_uintSlice64_Decode_field_withCustomDecoder-8   	  110866	     10326 ns/op	    8240 B/op	       2 allocs/op
Benchmark_uintSlice32_Decode_noMake-8                    	  162374	      7324 ns/op	    4256 B/op	       6 allocs/op
Benchmark_uintSlice32_Decode_make-8                      	  140947	      7607 ns/op	    4256 B/op	       6 allocs/op
Benchmark_uintSlice32_Decode_field_noMake-8              	  130076	      7981 ns/op	    4400 B/op	      12 allocs/op
Benchmark_uintSlice32_Decode_field_make-8                	  127812	      8376 ns/op	    4400 B/op	      12 allocs/op
Benchmark_uintSlice32_readArray_noMake-8                 	  150466	      7251 ns/op	    4192 B/op	       4 allocs/op
Benchmark_uintSlice32_readArray_make-8                   	  160896	      6962 ns/op	    4192 B/op	       4 allocs/op
Benchmark_uintSlice32_Decode_field_withCustomDecoder-8   	  151675	      7046 ns/op	    4144 B/op	       2 allocs/op

gagliardetto added 4 commits August 20, 2022 15:56

Decoding slices: use reflect.Append instead of allocating slice.

5cb550a

Fix borsh test

d8654f8

readNBytes: add limit checks; ReadRustString: add limit check; add Di…

4dc4569

…scard method

Add tests for go 1.19.x

d94c910

gagliardetto added 3 commits August 20, 2022 17:04

More tests

c0cb6ac

Add specialized uint array/slice encoders

1ff28ef

Add optimized uint array/slice encoders

a8f0149

riptl reviewed Aug 22, 2022

View reviewed changes

Reduce usage of reflect

179e9d3

Make tests more uniform

90ac7a1

More tests

e6ae947

gagliardetto added 2 commits August 22, 2022 14:52

Improve benchmarks

06d42a1

Add benchmark for uint32 slices/arrays

df95dbc

gagliardetto added 3 commits August 22, 2022 15:00

Add missing checks to benchmarks

5788bc1

Add bench for custom decoder

76b2308

Cleanup

b4927dd

gagliardetto merged commit 2b26380 into master Aug 28, 2022

GoVulnBot mentioned this pull request Sep 2, 2022

x/vulndb: potential Go vuln in github.com/gagliardetto/binary: CVE-2022-36078 golang/vulndb#963

Closed

gagliardetto deleted the slice-append branch October 2, 2022 09:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Slice: use `reflect.Append` instead of pre-allocating whole slice. #7

Slice: use `reflect.Append` instead of pre-allocating whole slice. #7

gagliardetto commented Aug 20, 2022

gagliardetto commented Aug 20, 2022

riptl Aug 22, 2022

riptl Aug 22, 2022

gagliardetto commented Aug 22, 2022 •

edited

gagliardetto commented Aug 22, 2022 •

edited

gagliardetto commented Aug 22, 2022

gagliardetto commented Aug 22, 2022 •

edited

gagliardetto commented Aug 24, 2022

gagliardetto commented Aug 28, 2022

Slice: use reflect.Append instead of pre-allocating whole slice. #7

Slice: use reflect.Append instead of pre-allocating whole slice. #7

Conversation

gagliardetto commented Aug 20, 2022

gagliardetto commented Aug 20, 2022

riptl Aug 22, 2022

Choose a reason for hiding this comment

riptl Aug 22, 2022

Choose a reason for hiding this comment

gagliardetto commented Aug 22, 2022 • edited

gagliardetto commented Aug 22, 2022 • edited

gagliardetto commented Aug 22, 2022

gagliardetto commented Aug 22, 2022 • edited

BEFORE optimizing uint slices (underlying use of reflect.Append)

BEFORE optimizing uint slices (underlying use of reflect.MakeSlice + rv.Index(i).Set)

AFTER optimizing for uint slices

gagliardetto commented Aug 24, 2022

gagliardetto commented Aug 28, 2022

Slice: use `reflect.Append` instead of pre-allocating whole slice. #7

Slice: use `reflect.Append` instead of pre-allocating whole slice. #7

gagliardetto commented Aug 22, 2022 •

edited

gagliardetto commented Aug 22, 2022 •

edited

gagliardetto commented Aug 22, 2022 •

edited