
Performance improvements for kustomize build #5084

Open · 1 task done
chlunde opened this issue Mar 8, 2023 · 8 comments

@chlunde (Contributor) commented Mar 8, 2023

We have a config repo that produces about 4000 Kubernetes resources, and we expect three times that number by September. kustomize build currently takes around 45 seconds on a developer laptop. In CI we run kustomize build against both master and the branch to find the impact of a change, and Flux also runs kustomize build. The CI run takes about 1.5 minutes, so in total we see 7.5 minutes of compute time to roll out one change.

Since this is just under 1.4 MB of YAML, I believe it should be possible to do this work much faster.

I have made four PRs with performance improvements. I have kept the PRs as small as possible, but I believe some changes might be nicer with larger refactorings that make the new code more resilient to change (the resWrangler id map, PR #5081). #5081 and #5082 might also need more tests; please provide feedback with suggested improvements in each PR, as I don't know the kustomize codebase well.
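
To illustrate the id-map idea only (hypothetical type and method names; the actual change is in #5081), the wrangler could keep an index from current ResId to list position and consult it instead of scanning:

    package sketch

    import (
        "sigs.k8s.io/kustomize/api/resource"
        "sigs.k8s.io/kustomize/kyaml/resid"
    )

    // Sketch, not the code in #5081: index resources by their current ResId
    // so lookups are O(1) instead of a linear scan over the resource list.
    type resourceIndex struct {
        list    []*resource.Resource
        byCurId map[resid.ResId]int // rebuilt whenever list membership or ids change
    }

    func (x *resourceIndex) reindex() {
        x.byCurId = make(map[resid.ResId]int, len(x.list))
        for i, r := range x.list {
            x.byCurId[r.CurId()] = i
        }
    }

    func (x *resourceIndex) getByCurrentId(id resid.ResId) *resource.Resource {
        if i, ok := x.byCurId[id]; ok {
            return x.list[i]
        }
        return nil
    }

Keeping such an index in sync when resources are renamed, replaced, or merged during the build is the tricky part, which is why a larger refactoring may be nicer than the minimal change.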

Here's a summary of the proposed changes:

pprof before the changes:

[pprof flame graph screenshot]

After the changes, the profile is mostly YAML parsing and GC.

Why is this needed?

Faster linting of PRs and quicker reconciliation in Flux.

Can you accomplish the motivating task without this feature, and if so, how?

Splitting into smaller repos might help, but it would not allow us to analyze the whole service mesh graph and the interactions between services and configurations.

What other solutions have you considered?

N/A

Anything else we should know?

No response

Feature ownership

  • I am interested in contributing this feature myself! 🎉
@chlunde added the kind/feature label on Mar 8, 2023
@k8s-ci-robot added the needs-triage label on Mar 8, 2023
@natasha41575 (Contributor):
We will happily accept performance improvements. Since you've already opened some PRs, we can discuss specific changes on those PRs.

/triage accepted

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label on Mar 22, 2023
@natasha41575 (Contributor):
Apologies for the delay - we will try to have people review your PRs soon!

chlunde added a commit to chlunde/kustomize that referenced this issue Oct 30, 2023
This change introduces a benchmarking test that constructs a
complete kustomization tree using various features of Kustomize.

This update aims to address several objectives:
* Demonstrating current performance challenges in Kustomize in a reproducible manner.
* Evaluating the effects of performance enhancements.
* Guarding against potential performance setbacks and inadvertent quadratic behavior in the future.
* Considering the possibility of incorporating profile-guided optimization (PGO) in future iterations.

Usage:

    go test -run=x -bench=BenchmarkBuild ./kustomize/commands/build

    # sigs.k8s.io/kustomize/kustomize/v5/commands/build.test
    pkg: sigs.k8s.io/kustomize/kustomize/v5/commands/build
    BenchmarkBuild-8   	       1	8523677542 ns/op
    PASS
    ok  	sigs.k8s.io/kustomize/kustomize/v5/commands/build	8.798s

*Currently*, this benchmark requires 3000 seconds to run on my machine.
In order to run it on master today, you need to add `-timeout=30m` to the
`go test` command.

The dataset size was chosen because I believe it represents a real workload
for which a runtime below 10 seconds should be achievable.

Updates kubernetes-sigs#5084

Notes on PGO:

Real-life profiles would be better, but creating one based on a benchmark should not hurt:

https://go.dev/doc/pgo#collecting-profiles

> Will PGO with an unrepresentative profile make my program slower than no PGO?
> It should not. While a profile that is not representative of production behavior will result in optimizations in cold parts of the application, it should not make hot parts of the application slower. If you encounter a program where PGO results in worse performance than disabling PGO, please file an issue at https://go.dev/issue/new.

Collecting a profile:

    go test -cpuprofile cpu1.pprof -run=^$ -bench ^BenchmarkBuild$ sigs.k8s.io/kustomize/kustomize/v5/commands/build

    go build -pgo=./cpu1.pprof  -o kust-pgo ./kustomize
    go build -o kust-nopgo ./kustomize

Compare PGO and non-PGO builds:

    ./kust-pgo build -o /dev/null testdata/  21.88s user 2.00s system 176% cpu 13.505 total
    ./kust-nopgo build -o /dev/null testdata/  22.76s user 1.98s system 174% cpu 14.170 total
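
For context, a build benchmark of roughly this shape can also be written directly against the krusty API (a hedged sketch with a placeholder testdata path; the benchmark in this commit drives the build command over its own generated tree):

    package build_test

    import (
        "testing"

        "sigs.k8s.io/kustomize/api/krusty"
        "sigs.k8s.io/kustomize/kyaml/filesys"
    )

    // BenchmarkBuildSketch runs a full kustomize build against an on-disk
    // kustomization tree. "testdata/benchmark" is a placeholder path.
    func BenchmarkBuildSketch(b *testing.B) {
        fSys := filesys.MakeFsOnDisk()
        for i := 0; i < b.N; i++ {
            k := krusty.MakeKustomizer(krusty.MakeDefaultOptions())
            resMap, err := k.Run(fSys, "testdata/benchmark")
            if err != nil {
                b.Fatal(err)
            }
            if _, err := resMap.AsYaml(); err != nil {
                b.Fatal(err)
            }
        }
    }

It can be run with the same go test flags shown above, optionally adding -benchmem to also track allocations.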
chlunde added a commit to chlunde/kustomize that referenced this issue Nov 2, 2023
@ephesused (Contributor):
Another thought here would be to join the lookups done by Resource.CurId(). It calls Resource.GetGvk (which calls RNode.GetApiVersion and RNode.GetKind), Resource.GetName, and Resource.GetNamespace.

That means there are four independent traversals at the top level (apiVersion, kind, and metadata twice). Then, in metadata, there are two independent traversals (name and namespace).

This flow could be optimized so that a single traversal finds apiVersion, kind, and metadata at the top level, followed by a single traversal of metadata to find name and namespace.
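
For illustration, a combined lookup might walk the top-level mapping once and the metadata mapping once. This is a rough sketch over the underlying yaml.Node (RNode exposes it via YNode()); the helper itself is hypothetical, not existing kyaml API:

    package sketch

    import "sigs.k8s.io/kustomize/kyaml/yaml"

    // curIdFields collects apiVersion, kind, metadata.name and
    // metadata.namespace in two passes total: one over the top-level
    // mapping and one over metadata. Error handling is omitted.
    func curIdFields(rn *yaml.RNode) (apiVersion, kind, name, namespace string) {
        root := rn.YNode()
        if root == nil || root.Kind != yaml.MappingNode {
            return
        }
        var metadata *yaml.Node
        for i := 0; i+1 < len(root.Content); i += 2 {
            key, val := root.Content[i], root.Content[i+1]
            switch key.Value {
            case "apiVersion":
                apiVersion = val.Value
            case "kind":
                kind = val.Value
            case "metadata":
                metadata = val
            }
        }
        if metadata == nil || metadata.Kind != yaml.MappingNode {
            return
        }
        for i := 0; i+1 < len(metadata.Content); i += 2 {
            key, val := metadata.Content[i], metadata.Content[i+1]
            switch key.Value {
            case "name":
                name = val.Value
            case "namespace":
                namespace = val.Value
            }
        }
        return
    }

Whether the saved traversals matter in practice depends on how hot CurId() is in the profile.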

chlunde added a commit to chlunde/kustomize that referenced this issue Nov 3, 2023
@ephesused (Contributor):
> Another thought here would be to join the lookups done by Resource.CurId().

I took a little time to investigate this option, and the improvement was minor at best; not really worth the effort.

I've started to investigate what might be possible by caching CurId() in the Resource. Given that caching isn't already in place, I'm a little worried about complexities I may find with cache invalidation. I had hoped to lean on the code path that updates the list of previous ids, but there appear to be gaps. For example, api/krusty/component_test.go's TestComponent/multiple-components fails; I don't yet know whether that indicates a flaw in the existing code (the previous id list should be updated but is not), or a flaw in my hope that any change in CurId() is accompanied by an update to the list of previous ids. I will continue investigating.

@ephesused (Contributor):
Initial results for caching the Resource.CurId() return value are very promising. I hooked cache invalidation into Resource.setPreviousId() and resWrangler.appendReplaceOrMerge()'s case 1 for replace and merge, and that appears to cover the unit test cases. Note that there are a small number of unit tests that cannot run cleanly on my system, so I may have gaps there.

@natasha41575 (and others), before I consider moving forward with this change, do you know of reasons why caching Resource.CurId() could be problematic? I feel like there may be hidden pitfalls here. Is this sort of caching in line with kustomize coding patterns? In addition to resWrangler.appendReplaceOrMerge, are there other spots that might adjust a resource in a way that could alter its ResId but do not call Resource.setPreviousId()? Any other aspect I might be missing?
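
As a rough illustration of the pattern being described (hypothetical wrapper and method names, not actual kustomize code), the id would be computed lazily and dropped at every point that can change it:

    package sketch

    import (
        "sigs.k8s.io/kustomize/api/resource"
        "sigs.k8s.io/kustomize/kyaml/resid"
    )

    // cachedIdResource wraps a Resource and memoizes CurId(). Every code path
    // that can change apiVersion, kind, name or namespace (e.g. wherever
    // setPreviousId is called, or where the wrangler replaces or merges a
    // resource) must call invalidateId.
    type cachedIdResource struct {
        resource.Resource
        curId   resid.ResId
        idValid bool
    }

    func (r *cachedIdResource) CurId() resid.ResId {
        if !r.idValid {
            r.curId = r.Resource.CurId()
            r.idValid = true
        }
        return r.curId
    }

    func (r *cachedIdResource) invalidateId() {
        r.idValid = false
    }

The correctness risk sits entirely in the invalidation calls, which is what the questions above are about.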

I did some testing using the benchmark from #5425. However, I didn't want to wait forever, so I adjusted the second-level resource count from 100 down to 20.

$ git log --oneline -1
e219b8864 (HEAD -> master, upstream/master, origin/master, origin/HEAD) Merge pull request #5421 from prashantrewar/swap-yaml-library

$ go test ./kustomize/commands/build -run nope -bench BenchmarkBuild -benchmem
goos: linux
goarch: amd64
pkg: sigs.k8s.io/kustomize/kustomize/v5/commands/build
cpu: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
BenchmarkBuild-2               1        149546683100 ns/op      2276899072 B/op 21421892 allocs/op
PASS
ok      sigs.k8s.io/kustomize/kustomize/v5/commands/build       149.598s

$ git checkout optimize-curid
M       go.work.sum
Switched to branch 'optimize-curid'
Your branch is ahead of 'master' by 1 commit.
  (use "git push" to publish your local commits)

$ go test ./kustomize/commands/build -run nope -bench BenchmarkBuild -benchmem
goos: linux
goarch: amd64
pkg: sigs.k8s.io/kustomize/kustomize/v5/commands/build
cpu: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
BenchmarkBuild-2               1        12183853800 ns/op       2280974424 B/op 21462373 allocs/op
PASS
ok      sigs.k8s.io/kustomize/kustomize/v5/commands/build       12.236s

$

@ephesused (Contributor):
> Initial results for caching the Resource.CurId() return value are very promising.

I went ahead and created #5481 for this effort. I still have some concerns about which other code paths might need to invalidate the ID caches, but after the discussion in #5422 (comment) I figured it was worth sharing the work. I don't know of any other spots in the code that would need the additions, so there's not much benefit in keeping the PR private.

@shapirus commented Dec 8, 2023
So far, no performance change against v5.3.0 (multiple-invocations scenario):

Starting kustomize benchmark on Linux x86_64
kustomize versions: 
  5.2.1
  5.3.0
  PR-5481
iterations per test: 200
tests: 
  1_no-patches
  2_patches-json6902
  3_patches-strategic-merge
  4_no-patches-unknown-kind
  5_component-no-base-no-patches
  6_component-json6902-over-base
  7_component-PSM-over-base
time unit: seconds

             test: 1   test: 2   test: 3   test: 4   test: 5   test: 6   test: 7
    v5.2.1      1.37      1.52     11.96      1.37      2.25      3.10     13.40
    v5.3.0      1.33      1.47     12.29      1.34      1.54      1.92     12.34
   PR-5481      1.37      1.46     12.19      1.30      1.55      1.81     12.29

@ephesused (Contributor):
@shapirus, I'm not surprised #5481 left the PSM performance (#5422) as-is. The optimization in #5481 was aimed at #5425, and in that context it brings dramatic improvements. If I can carve out some time in the next few weeks, I'll take a close look at #5422 and update there with any useful information I find.

chlunde added a commit to chlunde/kustomize that referenced this issue Apr 16, 2024
chlunde added a commit to chlunde/kustomize that referenced this issue Apr 16, 2024
This change introduces a benchmarking test that constructs a
complete kustomization tree using various features of Kustomize.

This update aims to address several objectives:
* Demonstrating current performance challenges in Kustomize in a reproducible manner.
* Evaluating the effects of performance enhancements.
* Guarding against potential performance setbacks and inadvertent quadratic behavior in the future.
* Considering the possibility of incorporating profile-guided optimization (PGO) in future iterations.

Usage:

    $ make run-benchmarks
    go test -run=x -bench=BenchmarkBuild ./kustomize/commands/build

    pkg: sigs.k8s.io/kustomize/kustomize/v5/commands/build
    BenchmarkBuild-8               1        48035946042 ns/op
    PASS
    ok      sigs.k8s.io/kustomize/kustomize/v5/commands/build       48.357s

*Currently*, this benchmark requires 48 seconds to run on my machine.

Updates kubernetes-sigs#5084

Notes on PGO:

Real-life profiles would be better, but creating one based on a benchmark should not hurt:

https://go.dev/doc/pgo#collecting-profiles

> Will PGO with an unrepresentative profile make my program slower than no PGO?
> It should not. While a profile that is not representative of production behavior will result in optimizations in cold parts of the application, it should not make hot parts of the application slower. If you encounter a program where PGO results in worse performance than disabling PGO, please file an issue at https://go.dev/issue/new.

Collecting a profile:

    go test -cpuprofile cpu1.pprof -run=^$ -bench ^BenchmarkBuild$ sigs.k8s.io/kustomize/kustomize/v5/commands/build

    go build -pgo=./cpu1.pprof  -o kust-pgo ./kustomize
    go build -o kust-nopgo ./kustomize

Compare PGO and non-PGO builds:

    ./kust-pgo build -o /dev/null testdata/  21.88s user 2.00s system 176% cpu 13.505 total
    ./kust-nopgo build -o /dev/null testdata/  22.76s user 1.98s system 174% cpu 14.170 total