
Openapi parsing performance improvements #4569

Open
natasha41575 opened this issue Apr 5, 2022 · 24 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@natasha41575
Contributor

natasha41575 commented Apr 5, 2022

OpenAPI parsing is a huge performance bottleneck for us. Related issues:

We discovered with these benchmarks that parsing protobuf is much faster than parsing JSON OpenAPI schemas. kube-openapi now also supports a way to convert from the Gnostic v2 types to spec.Swagger types.

This means that instead of parsing the JSON to spec.Swagger types, which is what we do today, we can instead parse a proto schema to Gnostic v2 types, and then convert the Gnostic v2 to spec.Swagger.

Proto->gnostic-swagger takes ~39 ms.
Parsing JSON takes ~600 ms.
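The gap between the two paths is mostly decode cost: a text format must be tokenized and every key interned, while a compact binary encoding decodes almost directly into typed structs. As a rough stdlib-only analogy (using encoding/gob as a stand-in for protobuf, and a toy schema type — the real change parses gnostic v2 types and converts them to spec.Swagger), the shape of the comparison looks like:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"encoding/json"
	"fmt"
	"time"
)

// schema is a toy stand-in for an OpenAPI document: a large map of
// definition names, which is where JSON tokenizing costs add up.
type schema struct {
	Definitions map[string]string
}

func buildSchema(n int) schema {
	s := schema{Definitions: make(map[string]string, n)}
	for i := 0; i < n; i++ {
		s.Definitions[fmt.Sprintf("io.k8s.api.core.v1.Type%d", i)] = "object"
	}
	return s
}

// roundTrip encodes the schema as JSON and as gob, decodes both, and
// returns the decode durations plus whether both decodes were lossless.
func roundTrip(s schema) (jsonDur, gobDur time.Duration, ok bool) {
	jsonBytes, _ := json.Marshal(s)
	var buf bytes.Buffer
	gob.NewEncoder(&buf).Encode(s)

	start := time.Now()
	var fromJSON schema
	json.Unmarshal(jsonBytes, &fromJSON)
	jsonDur = time.Since(start)

	start = time.Now()
	var fromGob schema
	gob.NewDecoder(&buf).Decode(&fromGob)
	gobDur = time.Since(start)

	ok = len(fromJSON.Definitions) == len(s.Definitions) &&
		len(fromGob.Definitions) == len(s.Definitions)
	return
}

func main() {
	j, g, ok := roundTrip(buildSchema(10000))
	fmt.Printf("json decode: %v, binary decode: %v, lossless: %v\n", j, g, ok)
}
```

This is only an illustration of the decode-cost difference, not the actual kustomize code path; the real numbers above come from benchmarking the gnostic conversion itself.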

Proposal

  1. We should use protobuf format instead of JSON for our builtin OpenAPI. openapi parsing performance improvement with protobuffer #4568 is a complete implementation of this.

  2. Right now, if users have their own custom schema files, they provide them via the openapi field. However, the custom schema replaces the builtin schema entirely, so the user has to provide the entire schema themselves. Since they will typically provide the custom schema in JSON or YAML, they will not get the performance improvements of the builtin proto schema. To fix this, we can make the custom schema provided via the openapi field supplementary to what is builtin, rather than a complete replacement. In this workflow, users get two benefits:

  • they will only have to provide schemas for their custom types
  • they will still get the huge performance boost of kustomize's builtin openapi in proto format even when they have custom resource schemas defined in JSON

  3. We should add a --proto option to kustomize openapi fetch and also allow proto-formatted schemas in the openapi field. This would look something like:
openapi:
  path: my-proto.pb
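Under the supplementary-schema proposal, a kustomization might then look something like the following (a hypothetical sketch — the file names and merge behavior are illustrative, not an existing feature):

```yaml
# Hypothetical kustomization.yaml under the proposed workflow: the custom
# schema only needs to cover the custom types (here, SealedSecret), while
# builtin types still resolve against the fast proto-encoded builtin schema.
resources:
  - sealed-secret.yaml
openapi:
  path: sealedsecret-schema.json  # supplements, rather than replaces, the builtin schema
```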
@natasha41575 natasha41575 added kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Apr 5, 2022
@natasha41575 natasha41575 self-assigned this Apr 5, 2022
@natasha41575
Contributor Author

cc @KnVerey

@natasha41575 natasha41575 changed the title Openapi parsing improvements Openapi parsing performance improvements Apr 5, 2022
@shapirus

shapirus commented Jun 10, 2022

Chiming in with my tests :)

Version 4.5.5 has much better performance than some older versions mentioned in the previous tickets on this matter (5-7 times the speed, I would say, compared to versions around 3.9.4 or so), but it's still two times slower than 3.5.4. I've run a quick comparison on our manifests directory, containing a total of 460 kustomization.yaml files:

$ time (find . -type f -name kustomization.yaml -execdir kustomize-v3.5.4 build \; &>/dev/null)

real    0m16.606s
user    0m24.730s
sys     0m3.704s

$ time (find . -type f -name kustomization.yaml -execdir kustomize-v4.5.5 build \; &>/dev/null)

real    0m31.513s
user    0m58.568s
sys     0m7.938s

Just like before, it slows down when it encounters "unknown" resource kinds, such as SealedSecret, or actually any random string.

Interestingly enough, however, per my tests, v4.5.5 is about 2.5 times faster than v3.5.4, if only known resource kinds are used: I performed a quick test on a simple set of manifests containing only a Deployment and a Service.

But once I add a SealedSecret there (still with v4.5.5), it becomes 5 times slower.

The 2.5x performance improvement over v3.5.4 is awesome, I wonder if it's possible to retain it for the case where arbitrary resource kinds are present as well.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 8, 2022
@KnVerey
Contributor

KnVerey commented Sep 12, 2022

/remove-lifecycle stale

We still need this part:

We should add a --proto option to kustomize openapi fetch and also allow proto-formatted schemas in the openapi field.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 12, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 11, 2022
@shapirus

/remove-lifecycle stale

this won't just go away if this ticket is closed automatically.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 12, 2022
@shapirus

A gentle reminder that the issue is still there. In our case, this is a definite blocker preventing us from upgrading from approximately v3.5.4 to anything more recent.

v3.5.4, 17.8s:

$ time (find . -type f -name kustomization.yaml -execdir kustomize-v3.5.4 build \; &>/dev/null)

real    0m17.773s
user    0m26.537s
sys     0m5.160s

v5.0.0, the latest at the time of writing this, 28.5s, which is 1.6 times slower than v3.5.4:

$ time (find . -type f -name kustomization.yaml -execdir kustomize-v5.0.0 build \; &>/dev/null)

real    0m28.547s
user    0m53.858s
sys     0m6.810s

There was a big performance improvement at some point after v3.5.4 in processing the resources of "known" Kinds (e.g., Deployment, Service, etc.), and it is a pity that it remains useless, because it gets totally ruined by the poor performance of processing "unknown" Kinds such as e.g. SealedSecret.

@KnVerey
Contributor

KnVerey commented Feb 15, 2023

@shapirus thank you for posting your results with the latest version
cc @ephesused -- perhaps this issue would interest you, as someone who has helped a lot with Kustomize performance recently

@ephesused
Contributor

Thanks, @KnVerey. I'm happy to take a look. I'll pick this up for investigation, and if I don't think I'm in a position to address it, I'll ask to hand it off.

@shapirus, if possible, will you please provide your test case or a simplified version that we can use for analysis? Thanks.

/assign

@shapirus

Sure, I will, tomorrow.

@shapirus

shapirus commented Feb 16, 2023

@ephesused here you go. Attached is a set of manifests that demonstrate the problem and a Dockerfile (written for linux/x86_64; for other OS/arch you'll need to adjust the package manager commands and the kustomize url template) to build and run the tests conveniently. The list of versions to test is set in Dockerfile:5. It runs 300 iterations per each combination of kustomize versions and "known" (Service) and "unknown" (SealedSecret) kinds and measures the time it takes.

In a single command:

docker run $(docker build -q .)

kustomize-performance-test.zip

@ephesused
Contributor

Thanks, @shapirus. I hope to be able to start investigating early next week. I'll post here as I go.

@ephesused
Contributor

TL;DR: I plan to submit a PR to adjust something simple (gvk.ApiVersion) that should help. However, the PR does not address the issue of openapi parsing, so this issue will remain open past the initial PR.


I dug around with some profiling to see what I could uncover.

As a start, it's worth noting that the test case is set up with only a single resource. That highlights the openapi parsing issue well, but (from what I can tell) the openapi parsing is a one-time expense. So the test case effectively is a worst-case scenario for the openapi side of things. That's fine, but I think it may put the focus on a misleading source of the issue.

To get execution in a range where I could profile with a little bit of confidence, I adjusted the test cases so they have 2000 resources (each identical to the original but with a different name).

The known full run uses a total of 570MB. The unknown full run uses a total of 1219MB, a difference of 649MB. The samples for memory were over 94% of the total in both cases, so the sampling is a good representation.

The known gvk is a v1 Secret, the unknown one is a bitnami.com/v1alpha1 SealedSecret.

The known run uses 121MB for resid.NewGVK. The unknown one uses 740MB for resid.NewGVK. That's a difference of 619MB, accounting for the vast majority of the difference for the full run.

For the known, all of the 121MB comes through strings.WriteString building the apiVersion ("v1").

For the unknown, 699MB of the 739MB comes through strings.WriteString building the apiVersion ("bitnami.com/v1alpha1"). This amount expands and contracts based on the number of resources. The other 41MB comes through openapi.initSchema. The openapi.initSchema amount doesn't seem to be affected by the resource count.

resource.CurId (resource.go line 447) calls resource.GetGvk (resource.go line 56) which calls resid.GvkFromNode (gvk.go line 31) which calls...

  • resid.NewGVK (gvk.go line 24) which calls...
    • gvk.AsTypeMeta (gvk.go line 247) which calls...
      • gvk.ApiVersion (gvk.go line 124) which calls...
        • strings.WriteString
    • openapi.IsCertainlyClusterScoped (openapi.go line 406) which calls...
      • openapi.IsNamespaceScoped (openapi.go line 389) - the "v1" is recognized as a precomputed known type and returns; the "bitnami.com/v1alpha1" is not, which leads to a call into...
        • openapi.isNamespaceScopedFromSchema (openapi.go line 396) which calls...
          • openapi.initSchema (openapi.go line 625)

The known full run took 5.79s. The unknown full run took 7.05s, a difference of 1.26s. The sampling for cpu time was about 90% for both.

For what was sampled, the known run used 2.99s for resource.CurId. The unknown run used 3.85s for resource.CurId, a difference of 0.86s. The unknown run used 0.18s more for garbage collection (runtime.gcBgMarkWorker). The unknown run used 0.48s more for openapi.IsNamespaceScoped, 0.32s of which was openapi.initSchema.

Currently gvk.ApiVersion uses strings.Builder to create its output. Since the string construction is very simple, I don't think strings.Builder is the best approach. When I created a benchmark to test its current execution, here's what happens for bitnami.com/v1alpha1:

BenchmarkApiVersion/5-8         16658961                67.41 ns/op           48 B/op          2 allocs/op

With the following gvk.ApiVersion code, the execution runs in 1/3 the time and avoids allocations:

	if x.Group != "" {
		return x.Group + "/" + x.Version
	}
	return x.Version
BenchmarkApiVersion/5-8         57177163                21.71 ns/op            0 B/op          0 allocs/op

So my thinking is that the first step is to adjust gvk.ApiVersion. That should make its execution quicker and should reduce load on garbage collection - which should help overall runtime.
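For clarity, the two variants side by side, in a self-contained sketch (the Gvk struct here is a simplified stand-in for kustomize's resid.Gvk):

```go
package main

import (
	"fmt"
	"strings"
)

// Gvk is a simplified stand-in for kustomize's resid.Gvk.
type Gvk struct {
	Group, Version, Kind string
}

// apiVersionBuilder mirrors the original strings.Builder implementation:
// every call allocates a builder plus its backing buffer, even for the
// common no-group case ("v1").
func (x Gvk) apiVersionBuilder() string {
	var sb strings.Builder
	if x.Group != "" {
		sb.WriteString(x.Group)
		sb.WriteString("/")
	}
	sb.WriteString(x.Version)
	return sb.String()
}

// apiVersionConcat is the proposed replacement: plain concatenation lets
// the compiler produce the no-group result with zero allocations.
func (x Gvk) apiVersionConcat() string {
	if x.Group != "" {
		return x.Group + "/" + x.Version
	}
	return x.Version
}

func main() {
	example := Gvk{Group: "bitnami.com", Version: "v1alpha1", Kind: "SealedSecret"}
	fmt.Println(example.apiVersionBuilder(), example.apiVersionConcat())
}
```

Since resid.NewGVK calls through to ApiVersion for every resource, saving two allocations per call is what drives both the runtime and garbage-collection improvement described above.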

Profile results

Known use case, CPU
(profile image)

Unknown use case, CPU
(profile image)

Known use case, memory
(profile image)

Unknown use case, memory
(profile image)

@shapirus

A note regarding this:

it's worth noting that the test case is set up with only a single resource.

It is actually a close representation of our use case: hundreds of small sets of resources, each with its own kustomization.yaml (a microservice architecture where each microservice is additionally split into variants), and a separate kustomize invocation is made for each one. This is why the performance degradation became so noticeable in our situation.

@ephesused
Contributor

I follow, @shapirus. To be clear, I don't intend to stop with the initial PR. That memory use jumped out at me when looking at the profile results, and it's a quick thing to address. I figured I might as well get that in place, with the hopes that it might improve things on your end at least somewhat.

@shapirus

Yeah sure. It looks like it's going to be an iterative process either way. Good thing is that we know how to reproduce it, so we have a way of measuring exactly how much of a difference a particular change makes.

@ephesused
Contributor

FWIW... In my testing with @shapirus's test scenario, #5808 yields about a 10% runtime improvement over v5.0.0 for both known and unknown resources.

@ephesused
Contributor

ephesused commented Feb 28, 2023

Sorry, other commitments have claimed my time recently. I did do some further analysis, but haven't had time to dig and come up with ideas for improvement. I hope to have more time later this week.

I set up a test case that ran 100 executions of the one-resource known & unknown test cases (resetting the global "already initialized" flag each time, to mimic the user experience of running kustomize fresh each time). Below, kn stands for known and uk stands for unknown.

The cpu run-time results for unknown resources are more than three times that of the known resources. That's meaningfully worse than what @shapirus noted previously in this issue. The memory differences might look surprising at first, but they are what I expected based on earlier analysis (each openapi initialization requires a little more than 40MB).

CPU
kn-cpu-master-976193ce7.svg: Duration: 2.22s, Total samples = 1.51s (67.88%)
uk-cpu-master-976193ce7.svg: Duration: 7.25s, Total samples = 6.76s (93.26%)

Memory
kn-mem-master-976193ce7.svg: Showing nodes accounting for 96.96MB, 65.62% of 147.76MB total
uk-mem-master-976193ce7.svg: Showing nodes accounting for 3.92GB, 88.04% of 4.45GB total

Known resource cpu usage
(profile image)

Unknown resource cpu usage
(profile image)

Known resource memory usage
(profile image)

Unknown resource memory usage
(profile image)

@ephesused
Contributor

ephesused commented Mar 2, 2023

After digging around, I have a possible approach in mind. The work done in #4152 ensures that the precomputed namespace-scope information aligns with the default schema. The implication is that anything not found in the precomputed information is not in the default schema either. Knowing that, there's no point in loading the default schema when trying to determine namespace scoping: if a type isn't precomputed, we know the typeMeta won't be found and that openapi.isNamespaceScopedFromSchema must return false, false.

So the idea is to put a short-circuit in openapi.IsNamespaceScoped for the case where the schema is the default.
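A minimal sketch of that control flow, with hypothetical names (kustomize's real precomputed table and type-meta plumbing differ; only the shape of the short-circuit is the point):

```go
package main

import "fmt"

// precomputedScopes stands in for kustomize's generated scope table,
// which by construction covers everything in the default schema.
var precomputedScopes = map[string]bool{
	"v1/Secret":  true,
	"v1/Service": true,
}

// usingDefaultSchema would be true until a user supplies a custom schema.
var usingDefaultSchema = true

func isNamespaceScoped(apiVersion, kind string) (scoped, found bool) {
	scoped, found = precomputedScopes[apiVersion+"/"+kind]
	if found {
		return scoped, true
	}
	if usingDefaultSchema {
		// Short-circuit: the precomputed table covers the entire default
		// schema, so a miss here means the schema lookup (and the
		// expensive initSchema it triggers) cannot succeed either.
		return false, false
	}
	// Only custom schemas need the full schema-backed lookup.
	return isNamespaceScopedFromSchema(apiVersion, kind)
}

// isNamespaceScopedFromSchema is a placeholder for the initSchema-backed path.
func isNamespaceScopedFromSchema(apiVersion, kind string) (bool, bool) {
	return false, false
}

func main() {
	scoped, found := isNamespaceScoped("bitnami.com/v1alpha1", "SealedSecret")
	fmt.Println(scoped, found)
}
```

For an unknown kind like SealedSecret, the miss is answered from the table alone, avoiding the ~40MB openapi initialization per invocation noted earlier.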

@ephesused
Contributor

Just a quick note to say that while I expect this PR to be small, I haven't been able to carve the time to do the coding. I intend to do so within the next few days.

@shapirus

...lots of failing checks on that PR :)

@ephesused
Contributor

...lots of failing checks on that PR :)

Indeed. It turned out to be more complex than I'd expected, and I haven't had a chance to come back around and see if/how I can make it work properly.

(I try to ensure things are in solid shape before submission, but a challenge for me is that my dev environment doesn't align well with the suite of kustomize tests. I'm unable to run the full test suite cleanly. That can make it hard to distinguish test failures that are due to my mistakes from those that are expected due to my environment.)

@shapirus

shapirus commented Oct 14, 2023

@ephesused since PR #5076 has finally been merged, I decided to run my tests again. Unfortunately, there has been no release with that PR yet, so I had to build current master branch (2f99707) to test it.

It looks awesome. Not only is the slow parsing of unknown kinds fixed, it is now 4.7 times faster than kustomize v3.5.4 (the release that I'm still stuck with because of those performance issues) for both known and unknown kinds!

(this is all x86_64, haven't tested it on arm64 yet)

################# STARTING TESTS
# kustomize versions: v3.5.4 v5.1.1 2f99707792228c921e2d477dbad7fc4898691365

## testing known resource kinds ##
kustomize v3.5.4, 1000 iterations
real    0m21.163s

kustomize v5.1.1, 1000 iterations
real    0m4.641s

kustomize 2f99707792228c921e2d477dbad7fc4898691365, 1000 iterations
real    0m4.468s


## testing unknown resource kinds ##
kustomize v3.5.4, 1000 iterations
real    0m21.101s

kustomize v5.1.1, 1000 iterations
real    0m57.450s

kustomize 2f99707792228c921e2d477dbad7fc4898691365, 1000 iterations
real    0m4.455s

As far as my use case is concerned, I consider the issue resolved at this point, thank you very much @ephesused.

p.s. any idea when 5.1.2 may be expected to be released? It's been a while since 5.1.1 came out.

@shapirus

5.2.0 has just been released, and I can confirm that it is still as fast as the revision I tested in #4569 (comment).
