Performance / custom dashboards: new configs #3668

jotak · 2021-02-03T08:19:13Z

- discovery_enabled (true/false/auto): switches the discovery mode
- discovery_auto_threshold: pod-count threshold above which discovery is skipped in auto mode

Part of #3660

jotak · 2021-02-03T08:20:46Z

cc @primeroz
If you want to test, I pushed this image: quay.io/jotak/kiali:dev

Note, I'll address the prometheus API improvement in a separate PR

jotak · 2021-02-03T08:26:12Z

Testers: a simple way to test is to run a demo that exposes runtime metrics and scale up its pods. Example:

1. Run mesh-arena with metrics:

       kubectl label namespace default istio-injection=enabled
       kubectl apply -f <(curl -L https://raw.githubusercontent.com/jotak/demo-mesh-arena/zizou/quickstart-metrics.yml) -n default

2. Set the new pods threshold in Kiali to something lower, e.g. `discovery_auto_threshold: 5` (the default is 10). This setting lives under `external_services.custom_dashboards` in the CR/CM.
3. Check Kiali, application "ai": runtime dashboards are visible.
4. Scale up: `kubectl scale deployment ai-visitors --replicas=5`
5. Check Kiali, application "ai": runtime dashboards are now hidden.

primeroz · 2021-02-03T08:53:08Z

@jotak I'll test this afternoon when I can get the change merged.

So, to understand this correctly: with this change I should get custom dashboards for Kiali in the istio-system namespace (since there are only a few pods there), but not in my busy namespaces. Right?

thanks for the quick turn around

jotak · 2021-02-03T08:56:53Z

@primeroz yes that's the idea, though it's not per namespace decision, but per app/workload

hhovsepy

Verified:

- discovery_enabled=true: custom dashboards are discovered in every case (discovery_auto_threshold is ignored)
- discovery_enabled=false: custom dashboards are not discovered (discovery_auto_threshold is ignored)
- discovery_enabled=auto: custom dashboards are discovered when the pod count is less than the discovery_auto_threshold value
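
The three verified modes can be sketched in Go as below. This is only an illustration of the behaviour described in this thread, not the code from this PR: the function and constant names are hypothetical, and the handling of a pod count exactly equal to the threshold is an assumption (the PR description says discovery is skipped *above* the threshold, so the sketch still discovers at equality).

```go
package main

import "fmt"

// Hypothetical constants mirroring the accepted discovery_enabled values.
const (
	DiscoveryTrue  = "true"
	DiscoveryFalse = "false"
	DiscoveryAuto  = "auto"
)

// shouldDiscover reports whether dashboard discovery would run for an
// app or workload with podCount pods. The threshold only matters in
// auto mode. Treating podCount == threshold as "discover" is an
// assumption; the thread does not settle the boundary case.
func shouldDiscover(mode string, threshold, podCount int) bool {
	switch mode {
	case DiscoveryTrue:
		return true
	case DiscoveryFalse:
		return false
	default: // DiscoveryAuto
		return podCount <= threshold
	}
}

func main() {
	for _, pods := range []int{1, 10, 20} {
		fmt.Printf("pods=%d auto=%v\n", pods, shouldDiscover(DiscoveryAuto, 10, pods))
	}
}
```

With `true` or `false` the pod count never influences the result, which matches the "threshold is ignored" observations above.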

aljesusg

LGFM

primeroz · 2021-02-03T12:06:13Z

@jotak It is working for me as well, sort of, but there are still a couple of issues:

> yes that's the idea, though it's not per namespace decision, but per app/workload

I don't know exactly how this bit works, but I have a situation like this:

The app is composed of 8 workloads, each with a different number of pods:

 app-workload1: 20 / 20
 app-workload2: 20 / 20
 app-workload3: 30 / 30
 app-workload4: 20 / 20
 app-workload5: 10 / 10
 app-workload6: 10 / 10
 app-workload7: 1 / 1
 app-workload8: 1 / 1

- When loading the `Application page`, the change in this PR works and the autodiscovery is **disabled**.
- When I open the `Workload page` for any workload with > 10 pods, the PR works: the autodiscovery is **disabled** and the page loads super quickly.
- When I open the `Workload page` for any workload with <= 10 pods, though, the autodiscovery is **enabled**, but it must still be doing the `series` query that returns a huge cardinality, **causing Kiali to OOM like before**.
  - Is it something like using the whole `app` rather than the `workload`?
  - In that case, should the autodiscovery consider the whole `app` when deciding whether it is above the triggering threshold?
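
To make the numbers above concrete, here is a small Go sketch that applies a threshold of 10 to each workload and to the app as a whole. The helper names are hypothetical and the "pods <= threshold means discovery runs" rule is an assumption taken from the behaviour observed in this comment, not from the Kiali source.

```go
package main

import "fmt"

// workload pairs a workload name with its pod count, as listed above.
type workload struct {
	name string
	pods int
}

// workloads reproduces the eight workloads from the comment.
var workloads = []workload{
	{"app-workload1", 20}, {"app-workload2", 20},
	{"app-workload3", 30}, {"app-workload4", 20},
	{"app-workload5", 10}, {"app-workload6", 10},
	{"app-workload7", 1}, {"app-workload8", 1},
}

// autoDiscovers is a hypothetical stand-in for the auto-mode check:
// discovery runs unless the pod count is above the threshold.
func autoDiscovers(pods, threshold int) bool {
	return pods <= threshold
}

// appPods sums the pods of all workloads belonging to the app.
func appPods(ws []workload) int {
	total := 0
	for _, w := range ws {
		total += w.pods
	}
	return total
}

func main() {
	const threshold = 10
	for _, w := range workloads {
		fmt.Printf("%s: pods=%d discovery=%v\n", w.name, w.pods, autoDiscovers(w.pods, threshold))
	}
	total := appPods(workloads)
	fmt.Printf("whole app: pods=%d discovery=%v\n", total, autoDiscovers(total, threshold))
}
```

Under these assumptions, discovery stays enabled for workloads 5 to 8, while the app as a whole (112 pods) is well above the threshold.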

Also, for some reason, probably unrelated to your change but maybe part of the 1.30 branch, in the graph overview panel on the workload page I now get:

No namespace is selected
There is currently no namespace selected, please select one using the Namespace selector.

even though the path is correct and it includes the namespace

jotak · 2021-02-03T14:56:31Z

> * When I open the `Workload page` for any workload with <= 10 pods, though, the autodiscovery is **enabled**, but it must still be doing the `series` query that returns a huge cardinality, **causing Kiali to OOM like before**
>
>   * something like using the whole `app` rather than the `workload`?

It's expected that, when autodiscovery runs, it still performs that "Series" query.
About the OOM: I think it really loads the metrics for a single workload (not the app), but the cardinality is still too high, so it still OOMs. The number of pods is not the only thing that impacts cardinality; it's actually the whole mesh topology (number of interconnections), plus the variety of protocols used, the variety of error codes and so on.

I think the quickest thing to do for you is just to turn off the discovery (you'll still be able to have dashboards from annotations).

But there's also an alternative, because even if you get decent performance with this simple flag turned off, you may still run into trouble later at higher scale. When you hit the metrics cardinality problem, the recommendation is to change your Prometheus setup: decrease the retention time on Istio's Prometheus, limit that Prometheus strictly to scraping Istio metrics (nothing more), and add a second Prometheus instance that can scrape more targets and also rewrites the Istio metrics, discarding some of the data (essentially pod information). The two Prometheus instances are connected through federation. This is described in more detail here: https://istio.io/latest/docs/ops/best-practices/observability/#using-prometheus-for-production-scale-monitoring
How it impacts Kiali is described here: https://medium.com/kialiproject/kiali-with-production-scale-prometheus-c53ddfa20570. Long story short, in the Kiali config you can point to Istio's Prometheus as the main URL, and to the second Prometheus as the URL for custom dashboards.

>   * In that case, should the autodiscovery consider the whole `app` when deciding whether it is above the triggering threshold?
>
> Also, for some reason, probably unrelated to your change but maybe part of the 1.30 branch, in the graph overview panel on the workload page I now get:
> No namespace is selected
> There is currently no namespace selected, please select one using the Namespace selector.
> even though the path is correct and it includes the namespace

My bad, I messed this up while building the image. Don't worry, it's not part of this pull request (it's what I did yesterday to disable the mini-graph when we weren't sure yet what was causing your issue).

primeroz · 2021-02-03T14:58:57Z

> I think the quickest thing to do for you is just to turn off the discovery (you'll still be able to have dashboards from annotations).

Perfect, that is good enough for me. I will wait for this feature to be released and will turn off just the autodiscovery while keeping the custom dashboards on.

thanks!

jotak · 2021-02-03T15:07:53Z

Thanks for tests and reviews, I'll merge and backport to 1.29

Performance / custom dashboards: new configs

c223b71

jotak requested review from lucasponce and aljesusg February 3, 2021 08:19

jotak self-assigned this Feb 3, 2021

jotak added this to In Review in Sprint 52 via automation Feb 3, 2021

jotak requested a review from hhovsepy February 3, 2021 08:19

lucasponce added the backport/v1.29 label Feb 3, 2021

hhovsepy approved these changes Feb 3, 2021

aljesusg approved these changes Feb 3, 2021

jotak merged commit 3aee1b1 into kiali:master Feb 3, 2021

Sprint 52 automation moved this from In Review to Done Feb 3, 2021

jotak mentioned this pull request Feb 3, 2021

Performance / custom dashboards: new configs kiali/kiali-operator#243

Merged

ghost added this to the v1.30.0 milestone Feb 3, 2021

jotak mentioned this pull request Feb 3, 2021

[v1.29] Performance / custom dashboards: new configs (#3668) #3670

Merged
