Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cache size of structs when finding size of slice of structs #1356

Conversation

kwakubiney
Copy link
Member

@kwakubiney kwakubiney commented Feb 21, 2024

#1255

In the current Go binary.Size() implementation, the size of struct types are cached to prevent subsequent reflect based encoding for the same types over and over again but aren't cached when encoding slices of structs. There is an allocation everytime when finding the size of slice of structs. See golang/go#2320.

goos: darwin
goarch: arm64
pkg: github.com/cilium/ebpf/internal/sysenc

                                    │ unmarshal_old.txt │          unmarshal_new.txt           │
                                    │      sec/op       │    sec/op     vs base                │
Unmarshal/[]sysenc.explicitPad-8            59.04n ± 0%   39.86n ±  0%  -32.49% (p=0.000 n=10)
Unmarshal/[]sysenc.explicitPad#01-8         59.93n ± 1%   40.83n ±  8%  -31.88% (p=0.000 n=10)
Unmarshal/[]sysenc.explicitPad#02-8         59.70n ± 0%   40.25n ±  1%  -32.58% (p=0.000 n=10)
Unmarshal/[]sysenc.struc-8                  77.95n ± 0%   41.12n ±  0%  -47.25% (p=0.000 n=10)
Unmarshal/[]sysenc.struc#01-8              169.70n ± 0%   87.14n ±  1%  -48.65% (p=0.000 n=10)
Unmarshal/[]sysenc.struc#02-8               197.8n ± 0%   116.5n ±  0%  -41.08% (p=0.000 n=10)

                                    │ unmarshal_old.txt │            unmarshal_new.txt            │
                                    │       B/op        │    B/op     vs base                     │
Unmarshal/[]sysenc.explicitPad-8           8.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Unmarshal/[]sysenc.explicitPad#01-8        8.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Unmarshal/[]sysenc.explicitPad#02-8        8.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Unmarshal/[]sysenc.struc-8                 16.00 ± 0%      0.00 ± 0%  -100.00% (p=0.000 n=10)
Unmarshal/[]sysenc.struc#01-8              48.00 ± 0%     16.00 ± 0%   -66.67% (p=0.000 n=10)
Unmarshal/[]sysenc.struc#02-8              56.00 ± 0%     24.00 ± 0%   -57.14% (p=0.000 n=10)

                                    │ unmarshal_old.txt │            unmarshal_new.txt            │
                                    │     allocs/op     │ allocs/op   vs base                     │
Unmarshal/[]sysenc.explicitPad-8           1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Unmarshal/[]sysenc.explicitPad#01-8        1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Unmarshal/[]sysenc.explicitPad#02-8        1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Unmarshal/[]sysenc.struc-8                 2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Unmarshal/[]sysenc.struc#01-8              5.000 ± 0%     1.000 ± 0%   -80.00% (p=0.000 n=10)
Unmarshal/[]sysenc.struc#02-8              5.000 ± 0%     1.000 ± 0%   -80.00% (p=0.000 n=10)

goos: darwin
goarch: arm64
pkg: github.com/cilium/ebpf/internal/sysenc
                                  │ marshal_old.txt │           marshal_new.txt           │
                                  │     sec/op      │   sec/op     vs base                │
Marshal/[]sysenc.explicitPad-8         59.46n ±  0%   39.74n ± 0%  -33.16% (p=0.000 n=10)
Marshal/[]sysenc.explicitPad#01-8      59.85n ± 46%   40.15n ± 1%  -32.92% (p=0.000 n=10)
Marshal/[]sysenc.explicitPad#02-8      59.47n ±  2%   40.10n ± 1%  -32.57% (p=0.000 n=10)
Marshal/[]sysenc.struc-8               78.43n ±  1%   40.93n ± 0%  -47.81% (p=0.000 n=10)
Marshal/[]sysenc.struc#01-8           176.25n ±  1%   95.80n ± 0%  -45.64% (p=0.000 n=10)
Marshal/[]sysenc.struc#02-8            205.8n ±  0%   123.2n ± 2%  -40.14% (p=0.000 n=10)

                                  │ marshal_old.txt │             marshal_new.txt             │
                                  │      B/op       │    B/op     vs base                     │
Marshal/[]sysenc.explicitPad-8         8.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Marshal/[]sysenc.explicitPad#01-8      8.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Marshal/[]sysenc.explicitPad#02-8      8.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Marshal/[]sysenc.struc-8               16.00 ± 0%      0.00 ± 0%  -100.00% (p=0.000 n=10)
Marshal/[]sysenc.struc#01-8            64.00 ± 0%     32.00 ± 0%   -50.00% (p=0.000 n=10)
Marshal/[]sysenc.struc#02-8            80.00 ± 0%     48.00 ± 0%   -40.00% (p=0.000 n=10)

                                  │ marshal_old.txt │             marshal_new.txt             │
                                  │    allocs/op    │ allocs/op   vs base                     │
Marshal/[]sysenc.explicitPad-8         1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Marshal/[]sysenc.explicitPad#01-8      1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Marshal/[]sysenc.explicitPad#02-8      1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Marshal/[]sysenc.struc-8               2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Marshal/[]sysenc.struc#01-8            6.000 ± 0%     2.000 ± 0%   -66.67% (p=0.000 n=10)
Marshal/[]sysenc.struc#02-8            6.000 ± 0%     2.000 ± 0%   -66.67% (p=0.000 n=10)

Signed-off-by: kwakubiney <kebiney@hotmail.com>

@kwakubiney kwakubiney changed the title progress: cache size of structs when finding size of slice of structs WIP: cache size of structs when finding size of slice of structs Feb 21, 2024
@kwakubiney kwakubiney force-pushed the sysenc/reduce-allocations-for-slices-of-structs branch 2 times, most recently from 4260d20 to f8b0eba Compare February 21, 2024 12:30

This comment was marked as outdated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed now

@kwakubiney kwakubiney force-pushed the sysenc/reduce-allocations-for-slices-of-structs branch 2 times, most recently from ae1c0ac to 6793a00 Compare February 21, 2024 12:55
@kwakubiney kwakubiney changed the title WIP: cache size of structs when finding size of slice of structs cache size of structs when finding size of slice of structs Feb 21, 2024
@kwakubiney kwakubiney force-pushed the sysenc/reduce-allocations-for-slices-of-structs branch 5 times, most recently from 0419ac3 to 0261cb1 Compare February 24, 2024 20:07
@kwakubiney kwakubiney marked this pull request as ready for review February 24, 2024 20:11
@kwakubiney kwakubiney requested a review from a team as a code owner February 24, 2024 20:11
@kwakubiney kwakubiney force-pushed the sysenc/reduce-allocations-for-slices-of-structs branch 4 times, most recently from d8285e6 to 318dada Compare February 25, 2024 11:15
@kwakubiney
Copy link
Member Author

kwakubiney commented Feb 27, 2024

i think this optimization must affect batch lookups. Will run some benchmarks

@kwakubiney kwakubiney marked this pull request as draft February 27, 2024 13:41
// For compound structures, it sums the sizes of the elements. Thus, for instance, for a slice
// it returns the length of the slice times the element size and does not count the memory
// occupied by the header. If the type of v is not acceptable, dataSize returns -1.
func dataSize(v reflect.Value) int {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With some exceptions, some parts of this PR look very similar to implementations in encoding/binary, e.g.
https://github.com/golang/go/blob/a6a5c30d2b1338a8445de2499fbe7e9dda103efb/src/encoding/binary/binary.go#L480

So my main concerns are that bug fixes or optimizations in this part of the code, that might happen upstream in the future, might be missed here. Would it be possible to get the same results with a wrapper for encoding/binary or even refresh the discussions upstream with new insights?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, this package is essentially a fork of the upstream encoding/binary package with slight modifications. Your concerns are valid but I am looking at a makeshift addition for now until this lands upstream (that's if it ever will). I am looking to draft a proposal upstream soon. Thinking about it now, in case the proposal gets rejected, we will be stuck with a forked binary package but it should be relatively easy to switch between packages if need arises

@ti-mo
Copy link
Collaborator

ti-mo commented Feb 28, 2024

Hi @kwakubiney, do you have any statistics on practical memory and CPU time savings? The ones you posted only show the amount of allocs, but that's vague.

I'm not sure if we want to carry a fork of the binary package. I can't see what exactly you've changed to achieve this improvement since it's all in one commit, but @lmb made similar changes upstream to speed up binary.Write iirc. If it's a straightforward patch, we can help you get some eyes on the CL to get it merged faster, and it should ship in the next minor Go release. IMO this is not sufficiently critical to justify carrying a fork of the binary package. (it will end up in importers' vendor/ dirs for a while, etc..)

@kwakubiney
Copy link
Member Author

kwakubiney commented Feb 28, 2024

Hi @kwakubiney, do you have any statistics on practical memory and CPU time savings? The ones you posted only show the amount of allocs, but that's vague.

I'm not sure if we want to carry a fork of the binary package. I can't see what exactly you've changed to achieve this improvement since it's all in one commit, but @lmb made similar changes upstream to speed up binary.Write iirc. If it's a straightforward patch, we can help you get some eyes on the CL to get it merged faster, and it should ship in the next minor Go release. IMO this is not sufficiently critical to justify carrying a fork of the binary package. (it will end up in importers' vendor/ dirs for a while, etc..)

Appreciate the feedback! I have been thinking about it and the concerns you & @florianl raised are valid. This is not critical enough to do away with the upstream encoding/binary package. I'll come up with much more practical benchmarks soon. I will also draft a CL and a proposal and share with the team on Slack so any input can be made before I post it on the Go issue board.

@ti-mo
Copy link
Collaborator

ti-mo commented Mar 4, 2024

@kwakubiney That sounds great, thank you!

@kwakubiney kwakubiney force-pushed the sysenc/reduce-allocations-for-slices-of-structs branch 3 times, most recently from 47b9006 to 78d5e09 Compare March 6, 2024 16:51
@kwakubiney
Copy link
Member Author

kwakubiney commented Mar 12, 2024

Issue is upstream. golang/go#66253

Edit: Merged now

In the current Go binary.Size() implementation, the size of struct types
are cached to prevent subsequent reflect based encoding for the same types
over and over again but aren't cached when encoding slices of structs. There
is an allocation everytime when finding the size of structs. See golang/go#2320.

goos: darwin
goarch: arm64
pkg: github.com/cilium/ebpf/internal/sysenc

                                    │ unmarshal_old.txt │          unmarshal_new.txt           │
                                    │      sec/op       │    sec/op     vs base                │
Unmarshal/[]sysenc.explicitPad-8            59.04n ± 0%   39.86n ±  0%  -32.49% (p=0.000 n=10)
Unmarshal/[]sysenc.explicitPad#01-8         59.93n ± 1%   40.83n ±  8%  -31.88% (p=0.000 n=10)
Unmarshal/[]sysenc.explicitPad#02-8         59.70n ± 0%   40.25n ±  1%  -32.58% (p=0.000 n=10)
Unmarshal/[]sysenc.struc-8                  77.95n ± 0%   41.12n ±  0%  -47.25% (p=0.000 n=10)
Unmarshal/[]sysenc.struc#01-8              169.70n ± 0%   87.14n ±  1%  -48.65% (p=0.000 n=10)
Unmarshal/[]sysenc.struc#02-8               197.8n ± 0%   116.5n ±  0%  -41.08% (p=0.000 n=10)

                                    │ unmarshal_old.txt │            unmarshal_new.txt            │
                                    │       B/op        │    B/op     vs base                     │
Unmarshal/[]sysenc.explicitPad-8           8.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Unmarshal/[]sysenc.explicitPad#01-8        8.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Unmarshal/[]sysenc.explicitPad#02-8        8.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Unmarshal/[]sysenc.struc-8                 16.00 ± 0%      0.00 ± 0%  -100.00% (p=0.000 n=10)
Unmarshal/[]sysenc.struc#01-8              48.00 ± 0%     16.00 ± 0%   -66.67% (p=0.000 n=10)
Unmarshal/[]sysenc.struc#02-8              56.00 ± 0%     24.00 ± 0%   -57.14% (p=0.000 n=10)

                                    │ unmarshal_old.txt │            unmarshal_new.txt            │
                                    │     allocs/op     │ allocs/op   vs base                     │
Unmarshal/[]sysenc.explicitPad-8           1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Unmarshal/[]sysenc.explicitPad#01-8        1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Unmarshal/[]sysenc.explicitPad#02-8        1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Unmarshal/[]sysenc.struc-8                 2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Unmarshal/[]sysenc.struc#01-8              5.000 ± 0%     1.000 ± 0%   -80.00% (p=0.000 n=10)
Unmarshal/[]sysenc.struc#02-8              5.000 ± 0%     1.000 ± 0%   -80.00% (p=0.000 n=10)

goos: darwin
goarch: arm64
pkg: github.com/cilium/ebpf/internal/sysenc
                                  │ marshal_old.txt │           marshal_new.txt           │
                                  │     sec/op      │   sec/op     vs base                │
Marshal/[]sysenc.explicitPad-8         59.46n ±  0%   39.74n ± 0%  -33.16% (p=0.000 n=10)
Marshal/[]sysenc.explicitPad#01-8      59.85n ± 46%   40.15n ± 1%  -32.92% (p=0.000 n=10)
Marshal/[]sysenc.explicitPad#02-8      59.47n ±  2%   40.10n ± 1%  -32.57% (p=0.000 n=10)
Marshal/[]sysenc.struc-8               78.43n ±  1%   40.93n ± 0%  -47.81% (p=0.000 n=10)
Marshal/[]sysenc.struc#01-8           176.25n ±  1%   95.80n ± 0%  -45.64% (p=0.000 n=10)
Marshal/[]sysenc.struc#02-8            205.8n ±  0%   123.2n ± 2%  -40.14% (p=0.000 n=10)

                                  │ marshal_old.txt │             marshal_new.txt             │
                                  │      B/op       │    B/op     vs base                     │
Marshal/[]sysenc.explicitPad-8         8.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Marshal/[]sysenc.explicitPad#01-8      8.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Marshal/[]sysenc.explicitPad#02-8      8.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Marshal/[]sysenc.struc-8               16.00 ± 0%      0.00 ± 0%  -100.00% (p=0.000 n=10)
Marshal/[]sysenc.struc#01-8            64.00 ± 0%     32.00 ± 0%   -50.00% (p=0.000 n=10)
Marshal/[]sysenc.struc#02-8            80.00 ± 0%     48.00 ± 0%   -40.00% (p=0.000 n=10)

                                  │ marshal_old.txt │             marshal_new.txt             │
                                  │    allocs/op    │ allocs/op   vs base                     │
Marshal/[]sysenc.explicitPad-8         1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Marshal/[]sysenc.explicitPad#01-8      1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Marshal/[]sysenc.explicitPad#02-8      1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Marshal/[]sysenc.struc-8               2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Marshal/[]sysenc.struc#01-8            6.000 ± 0%     2.000 ± 0%   -66.67% (p=0.000 n=10)
Marshal/[]sysenc.struc#02-8            6.000 ± 0%     2.000 ± 0%   -66.67% (p=0.000 n=10)

Signed-off-by: kwakubiney <kebiney@hotmail.com>
Signed-off-by: kwakubiney <kebiney@hotmail.com>
@kwakubiney kwakubiney force-pushed the sysenc/reduce-allocations-for-slices-of-structs branch from 78d5e09 to befcd40 Compare March 13, 2024 17:37
Signed-off-by: kwakubiney <kebiney@hotmail.com>
@kwakubiney
Copy link
Member Author

kwakubiney commented Mar 16, 2024

Is it okay to close this now since the change has been merged upstream? @lmb

@lmb
Copy link
Collaborator

lmb commented Mar 20, 2024

Yes, I think that is fine. Thank you for the upstream contribution!

@lmb lmb closed this Mar 20, 2024
@kwakubiney kwakubiney deleted the sysenc/reduce-allocations-for-slices-of-structs branch March 20, 2024 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants