Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of openapi schema parsing in kyaml/openapi #2651

Closed
mengqiy opened this issue Jan 19, 2022 · 4 comments
Closed

Improve performance of openapi schema parsing in kyaml/openapi #2651

mengqiy opened this issue Jan 19, 2022 · 4 comments
Assignees
Labels
area/hydrate enhancement New feature or request p1

Comments

@mengqiy
Copy link
Contributor

mengqiy commented Jan 19, 2022

kpt CLI and some functions requires openapi to work. But it's very slow when parsing the openapi api in json format (kubernetes-sigs/kustomize#3670). There are a few options (kubernetes-sigs/kustomize#4396) we can make it faster.

@mengqiy mengqiy added this to ToDo in kpt kanban board Jan 19, 2022
@mengqiy mengqiy added the enhancement New feature or request label Jan 19, 2022
@natasha41575 natasha41575 moved this from ToDo to In progress in kpt kanban board Jan 28, 2022
@natasha41575
Copy link
Contributor

natasha41575 commented Feb 12, 2022

Because it seems that we can get a significant performance improvement from this work, I believe that we should prioritize this, but we should be aware that of the scope.

The performance improvement mainly revolves around switching the library we use from "k8s.io/kube-openapi/pkg/validation/spec" to the "github.com/googleapis/gnostic/openapiv2", and for this there be a lot of changes. The underlying schema representation of the new library is different than the current library in subtle but important ways, so everywhere that the old library is currently being used will need to be rewritten. Some of these changes may be small where there are similarities in structure, while other changes will be larger. After quickly looking over where the old library is used, it seems to me that most of these tasks are doable, but that there are (1) a lot of them, (2) insufficient unit tests where they are used, and (3) most of the changes are dependent on each other and would be challenging to tackle independently. Another challenge is that the new library is not very well documented and there are few examples. This may make it difficult to develop the changes incrementally and to understand the current state of progress at any point.

I think one thing we can do while developing is start by creating a separate openapi library in kyaml using the new library that supports some of the current features and build upon it. Then we can remove the old library when this is done. kubernetes-sigs/kustomize#4474 attempts to start this work.

Additionally, there are some areas where the openapi schema is used that are not covered by the tests in the kustomize repository, but will break behavior in kpt. I can remove large portions of code and still successfully run all the tests in the kustomize repo, but some of that code is essential for kpt. Due to the insufficient coverage, part of this work should be to ensure that there are unit tests that provide enough coverage that we can be reasonably confident that any outside kyaml consumers, kpt included, will not see breaking changes.

It seems that there is a lot of places where the usage of the kyaml/openapi library in setters has some parts of it that are difficult to rewrite with the new library; so perhaps for setters in the beginning we can use the old library for now, while we use the new library for everything else.

I am going to list out most of the places that I can see we are using the old library, so that we have a better understanding of what is required. This list does not cover what would be required for the next step, which is to use proto instead of JSON. If we switch over to the new library for our JSON/proto parsing (the main performance improvement that we want), I believe that most, if not all, of these places will need to be rewritten to use the new library.

Namespaceability

Kyaml uses OpenAPI to determine the namespaceability of resources. Moving this over to the new library was fairly straightforward, and has been implemented in this PR.

FieldMeta

Kyaml defines this type FieldMeta here.

This type is used to support the kpt use cases of 3-way merge in the following locations: map.go walk.go. For the usage in map.go, it is being directly passed into ResourceSchema so this work and the ResourceSchema work cannot be isolated.

It is also used to support kpt setters, in the following files: setters2/add.go setters2/delete.go

ResourceSchema

There is a type, ResourceSchema that is used throughout kustomize and embeds the type spec.Schema. This type is used in many, many places in kyaml, including in the merge2 and merge3 code, some filters, setters code. A quick search via Goland shows:

  • 5 methods that return type ResourceSchema.
  • 10 methods that initialize a struct of type ResourceSchema
  • 6 other unclassified usages.
  • 6 methods on the type ResourceSchema that need to be rewritten, including one called Elements. Elements is a notable method because it takes in a ResourceSchema, and returns a ResourceSchema that represents a list of elements. AFAICT, the new library represents lists as a different type (an array of some object) than other objects. (Though maybe I'm wrong, there are not many examples to make it clear.) This conversion may require some additional work besides just rewriting Elements. Maybe we will have to split up Elements into multiple methods or define a new wrapper type or some alternative, each of these cases will ripple into every place where Elements is used (which is not trivial).

The latter challenge with Elements is probably not unique to Elements; my guess is we will run into similar challenges when trying to rewrite other parts of the code, due to the fact that the underlying schema representation is subtly different between the two libraries. I think these will be the most difficult to tackle in a clean way; it may be better to rewrite larger chunks of the code structure with the new library in mind.

These various methods that involve ResourceSchema are sprinkled all throughout kyaml. Changing ResourceSchema and rewriting all the code where it is used will be most likely be the biggest part of moving to the new library.

openapiData

There is a global variable in openapi.go called openapiData that is used to store the parsed openapi state. There are two fields that we should be considered with:

schema

This is an embedding of the type spec.Schema from the current openapi library and contains the entire parsed openapi schema. This is surfaced in kyaml through the function openapi.Schema().

This is used in a few places, including the starlark runtime. I will have to take some time to understand what the starlark runtime is doing, right now I am not sure what starlark is doing with it. My hope is that this will be a very simple change.

There are three places it is used is to resolve references, e.g. walk.go.

schemaByResourceType

This maps resource typemeta to their schemas. This is surfaced by the method SchemaForResourceType which is used in a few places throughout kyaml, including setters, the function framework, and the underlying implementation of the merge code. If we are able to adapt ResourceSchema to use the new library, this task will happen naturally as it looks like the two are tightly intertwined.

Resolving references

The new library has some functions it provides to resolve references, but it seems that these are used mainly to resolve references within a single file. The way it is used is setters is very different - we pass in a reference string and have the current library resolve it against the current schema. Digging into the new library and I can't find a simple equivalent, we may have to be a bit creative in tackling this, e.g. instead of trying to use the library to resolve the reference, find the field specified by the openapi ref with our own code logic.

---

This list may not be exhaustive; this is just from my first look at the state of it. It is possible that there will be more we need to do even after tackling all of these tasks.

@droot
Copy link
Contributor

droot commented Feb 15, 2022

Thanks for the detailed analysis @natasha41575 . I was under the impression that we just need to switch the serialization format in which we consume the schema,. Switching to new library has ripple effect across both the projects and changes the scope significantly.

Before brainstorming alternatives, I have a quick question:
why k8s.io/kube-openapi/pkg/validation/spec can not use proto format for openapi schema (keeping the data model same) ? [If it doesn't provide pluggable parser support, we can fork or upstream the change ]

/cc @mengqiy

@natasha41575
Copy link
Contributor

natasha41575 commented Feb 15, 2022

@droot Not sure if k8s.io/kube-openapi/pkg/validation/spec can use proto but I have not looked very deeply. Looking at this file here https://github.com/kubernetes/kube-openapi/blob/master/pkg/validation/spec/swagger.go and the library seems to revolve around JSON schemas. Maybe if SwaggerProps can read from a proto file, or if we can somehow leverage https://pkg.go.dev/k8s.io/kube-openapi/pkg/util/proto, this might be possible.

FWIW, I think changing to the new library is possible if there are no alternatives; the significant performance improvement is probably worth it. But it will be a lot of work.

@natasha41575
Copy link
Contributor

This work is done.

kpt kanban board automation moved this from In progress to Done May 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/hydrate enhancement New feature or request p1
Projects
Development

No branches or pull requests

3 participants