Improve performance of openapi schema parsing in kyaml/openapi #2651
Comments
Because it seems that we can get a significant performance improvement from this work, I believe that we should prioritize it, but we should be aware of the scope. The performance improvement mainly comes from switching the library we use from "k8s.io/kube-openapi/pkg/validation/spec" to "github.com/googleapis/gnostic/openapiv2", and this will require a lot of changes. The underlying schema representation of the new library differs from that of the current library in subtle but important ways, so every place where the old library is currently used will need to be rewritten. Some of these changes may be small where the structures are similar, while others will be larger.

After quickly looking over where the old library is used, it seems to me that most of these tasks are doable, but that (1) there are a lot of them, (2) there are insufficient unit tests where the library is used, and (3) most of the changes depend on each other and would be challenging to tackle independently. Another challenge is that the new library is not very well documented and there are few examples. This may make it difficult to develop the changes incrementally and to understand the current state of progress at any point.

I think one thing we can do while developing is start by creating a separate openapi library in kyaml, built on the new library, that supports some of the current features, and then build upon it. We can remove the old library when this is done. kubernetes-sigs/kustomize#4474 attempts to start this work.

Additionally, there are some areas where the openapi schema is used that are not covered by the tests in the kustomize repository, but where changes will break behavior in kpt. I can remove large portions of code and still successfully run all the tests in the kustomize repo, but some of that code is essential for kpt.
Due to the insufficient coverage, part of this work should be to ensure that there are unit tests that provide enough coverage that we can be reasonably confident that outside kyaml consumers, kpt included, will not see breaking changes.

There seem to be a lot of places where the usage of the kyaml/openapi library in setters is difficult to rewrite with the new library, so perhaps setters can keep using the old library in the beginning, while we use the new library for everything else.

I am going to list out most of the places where I can see we are using the old library, so that we have a better understanding of what is required. This list does not cover what would be required for the next step, which is to use proto instead of JSON. If we switch over to the new library for our JSON/proto parsing (the main performance improvement that we want), I believe that most, if not all, of these places will need to be rewritten to use the new library.

Namespaceability
Kyaml uses OpenAPI to determine the namespaceability of resources. Moving this over to the new library was fairly straightforward, and has been implemented in this PR.

FieldMeta
Kyaml defines this type. This type is used to support the kpt use cases of 3-way merge in the following locations: map.go and walk.go. For the usage in map.go, it is being directly passed into
It is also used to support kpt setters, in the following files: setters2/add.go and setters2/delete.go.

ResourceSchema
There is a type,
The latter challenge with
These various methods that involve

openapiData
There is a global variable in openapi.go called openapiData that is used to store the parsed openapi state. There are two fields that we should be concerned with:
Thanks for the detailed analysis @natasha41575. I was under the impression that we just needed to switch the serialization format in which we consume the schema. Switching to a new library has a ripple effect across both projects and changes the scope significantly. Before brainstorming alternatives, I have a quick question: /cc @mengqiy
@droot Not sure if
FWIW, I think changing to the new library is possible if there are no alternatives; the significant performance improvement is probably worth it. But it will be a lot of work.
This work is done.
The kpt CLI and some functions require openapi to work, but parsing the openapi schema in JSON format is very slow (kubernetes-sigs/kustomize#3670). There are a few options (kubernetes-sigs/kustomize#4396) for making it faster.