Design doc for offline processing #7142


@nrfox (Contributor) commented Feb 20, 2024

Describe the change

Adds a design doc for "offline processing" of Kiali data.

Relates to #7076

POC PR: #7136

@jshaughn (Collaborator) left a comment:

@nrfox Looks good, no fundamental issues for me, just a bunch of suggested edits for clarity/grammar.


# Motivation

Kiali currently has acceptable performance below a certain threshold of scale. That "scale" could be the number of pods, services, Istio resources, namespaces, etc., but in general Kiali reaches a certain threshold of one or more of these factors and becomes noticeably slow. Typically this manifests as very long page load times (30s+) and very slow (30s+) API responses. Kiali should remain performant even at larger scale.

Notably, Kiali does most processing within the lifecycle of a request. This means that before the API responds to a request, it fetches data from external dependencies (e.g. Prometheus, Jaeger), performs some processing (graph generation, validations), and then transforms the results into an API response. If any one of those tasks performs poorly, or fails entirely, the response can be extremely slow or fail altogether.

# Solution

A "Kiali model" has been [previously discussed](https://github.com/kiali/kiali/discussions/4080): an in-memory, pre-computed cache of data that Kiali derives, such as Validations, Health, and TLS status. Kiali would compute this data outside of a request and then cache it for some period of time. This KEP expands on that idea by providing a specific framework for how to compute and cache that data. This framework is henceforth called the "controller model" and it follows the same pattern as most Kube controllers.

A typical Kubernetes controller continually watches some objects for changes, and when an object does change, it reads the current state, does some work to get things to the desired state, then updates the status of the object. For example, a deployment controller might watch for deployments to be created, and on creation the deployment controller: reads the deployment spec --> creates pods according to the spec --> updates the deployment status with the pods it created. When someone reads that deployment from the Kube API server, the API server does not compute the deployment status; it simply serves up what is saved in etcd.

This KEP proposes that Kiali follow a similar pattern and have different controllers compute and cache in memory the data that the Kiali API returns. These controllers will run in the same binary as Kiali and there won't be any additional deployment requirements. Each controller can read/watch one or more sources, such as `VirtualService` objects from the kube API, or data outside of Kube, like proxy status for workload `Health`. After gathering the inputs, the controllers would compute something like Validations and then update the Kiali Cache.

![Validations Controller](Validations_Controller.png "Validations Controller")

The controller watches each source and, when one changes, it validates the object and then updates the Kiali Cache with the validation. When the frontend asks for validations, the API reads what is in the Kiali Cache. Because validations are served directly from memory rather than computed on the fly, API response times are very fast and remain that way even as the number of objects grows.
> Review comment (Collaborator): The example is fine, although since validations can involve multiple objects (configs) it may make sense to perform all validations on any change?

An advantage of the controller model is being able to reuse Kubernetes libraries and patterns for building controllers that handle setting up watches, parallel processing, retries on failure, etc. Most of the sources will come from Kubernetes. Non-Kubernetes sources can be implemented with polling if they do not have some kind of "watch" mechanism. Kubernetes sources will be updated almost instantaneously, making this a near "real time" solution. Non-Kubernetes sources will be limited by how often they poll the source, but probably not more than every 15-30s. This amount of lag is acceptable for Kiali's use cases and is a reasonable tradeoff for better performance.

There are a few downsides to this approach:

1. Caching more objects in memory will require greater memory usage. The Kiali cache is an in-memory cache, and storing more objects in memory will lead to an increase in memory consumption. This can be mitigated somewhat by only storing the results of computations in the Kiali cache, for example storing a graph traffic-map rather than all of the individual metrics used in its generation. There are also some optimizations to be made by reducing the amount of memory consumed by the Kubernetes cache that Kiali uses: https://github.com/kiali/kiali/issues/7017. This could offset increased memory consumption by the Kiali cache. Ultimately, though, there's no free lunch: storing more objects in memory will require more memory. Kiali will need to keep the size of this cache reasonably small.