Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance / custom dashboards: new configs #3668

Merged
merged 1 commit into from Feb 3, 2021

Conversation

jotak
Copy link
Contributor

@jotak jotak commented Feb 3, 2021

  • discovery_enabled (true/false/auto) to switch discovery mode
  • discovery_auto_threshold: pods threshold above which discovery is
    skipped in auto mode

Part of #3660

Operator PR: kiali/kiali-operator#243

- discovery_enabled (true/false/auto) to switch discovery mode
- discovery_auto_threshold: pods threshold above which discovery is
  skipped in auto mode

Part of kiali#3660
@jotak jotak self-assigned this Feb 3, 2021
@jotak jotak added this to In Review in Sprint 52 via automation Feb 3, 2021
@jotak jotak requested a review from hhovsepy February 3, 2021 08:19
@jotak
Copy link
Contributor Author

jotak commented Feb 3, 2021

cc @primeroz
If you want to test, I pushed this image: quay.io/jotak/kiali:dev

Note, I'll address the prometheus API improvement in a separate PR

@jotak
Copy link
Contributor Author

jotak commented Feb 3, 2021

Testers: a simple way to test is to have some runtime metrics demo and scale up pods. Example:

  • Run mesh-arena with metrics
kubectl label namespace default istio-injection=enabled
kubectl apply -f <(curl -L https://raw.githubusercontent.com/jotak/demo-mesh-arena/zizou/quickstart-metrics.yml) -n default
  • Configure new pods threshold in Kiali to something lower, e.g. discovery_auto_threshold: 5 (the default is 10) (this is in external_services.custom_dashboards in CR/CM)
  • Check Kiali, application "ai" => runtime dashboards are visible
  • Scale kubectl scale deployment ai-visitors --replicas=5
  • Check Kiali, application "ai" => runtime dashboards are now hidden

@primeroz
Copy link

primeroz commented Feb 3, 2021

@jotak i ll test this afternoon when i can get the change merged.

So, to understand this correctly, with this change i should get custom dashboards in istio-system namespace for kiali ( since there are only a few pods there ) but not in my busy namespaces. right ?

thanks for the quick turn around

@jotak
Copy link
Contributor Author

jotak commented Feb 3, 2021

@primeroz yes that's the idea, though it's not per namespace decision, but per app/workload

Copy link
Contributor

@hhovsepy hhovsepy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified:

  1. discovery_enabled=true it discovers custom dashboards in any case (discovery_auto_threshold is ignored)
  2. discovery_enabled=false it does not discover custom dashboards (discovery_auto_threshold is ignored)
  3. discovery_enabled=auto custom dashboards are discovered when pods count are less than discovery_auto_threshold value

Copy link
Collaborator

@aljesusg aljesusg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGFM

@primeroz
Copy link

primeroz commented Feb 3, 2021

@jotak Is working for me as well sort of but there are still a couple of issues:

yes that's the idea, though it's not per namespace decision, but per app/workload

I don't know exactly how this bit works but i have a situation like this:

  • App is composed of many workloads , 8, with different number of pods in each
 app-workload1: 20 / 20
 app-workload2: 20 / 20
 app-workload3: 30 / 30
 app-workload4: 20 / 20
 app-workload5: 10 / 10
 app-workload6: 10 / 10
 app-workload7: 1 / 1
 app-workload8: 1 / 1
  • When loading the Application page the change in this PR works and the autodiscovery is disabled
  • When i open the Workload page for any of the workload > 10 pods the PR works and the autodiscovery is disabled and the page loads up super quick
  • When i open the Workload page for any of the workload with <= 10 pods though the autodiscovery is enabled but it must still be doing the series query that returns a huge cardinality causign Kiali to OOM like before
    • something like using the whole app rather than the workload ?
    • In that case than should the autodiscovery consider the whole app when deciding if is above the triggering threshold ?

Also for some reasons, probably unrelated to your change but maybe part of 1.30 branch, in the workload page in the graph overivew panel i now get

No namespace is selected
There is currently no namespace selected, please select one using the Namespace selector.

even though the path is correct and it includes the namespace

@jotak
Copy link
Contributor Author

jotak commented Feb 3, 2021

* When i open the `Workload page` for any of the workload with <= 10 pods though the autodiscovery is **enabled** but it must still be doing the `series` query that returns a huge cardinality **causign Kiali to OOM like before**
  
  * something like using the whole `app` rather than the `workload` ?

It's expected that, when autodiscovery is run, it continues to run that "Series" query.
About the OOM, I think it really loads the metrics for a single workload (not app) but that's still too high and it still OOM. Pods is not the only thing that impacts cardinality, actually it's the whole mesh topology (number of interconnections), plus the variety of protocols used, variety of error codes and so on.

I think the quickest thing to do for you is just to turn off the discovery (you'll still be able to have dashboards from annotations).

But there's also an alternative, because even if with this simple flag turned off you get decent performances, you may still run into troubles later at higher scale. What's recommended when you hit the metrics cardinality problem, is to change your prometheus setup to decrease the retention time on istio's prometheus, have that prometheus strictly limited to scraping istio metrics (no more), and have a second prometheus instance that can scrape more targets and also rewrites istio metrics by discarding a couple of data (essentially pod information). The two prometheus are connected through federation. This is described more in details here: https://istio.io/latest/docs/ops/best-practices/observability/#using-prometheus-for-production-scale-monitoring
And how it impacts Kiali is described here: https://medium.com/kialiproject/kiali-with-production-scale-prometheus-c53ddfa20570, long story short, you can point in Kiali config to the istio's prometheus as the main URL, and to the second prometheus as the URL for custom dashboards.

  * In that case than should the autodiscovery consider the whole `app` when deciding if is above the triggering threshold ?

Also for some reasons, probably unrelated to your change but maybe part of 1.30 branch, in the workload page in the graph overivew panel i now get

No namespace is selected
There is currently no namespace selected, please select one using the Namespace selector.

even though the path is correct and it includes the namespace

My bad, I messed up this while building the image, don't worry it's not part of this pull request (it's what I did yesterday to disable the mini-graph when we weren't sure yet what was causing your issue)

@primeroz
Copy link

primeroz commented Feb 3, 2021

I think the quickest thing to do for you is just to turn off the discovery (you'll still be able to have dashboards from annotations).

perfect, that is good enough for me. I will wait for this feature to release and will set just the autodiscovery off while keeping the custom dashboards on

thanks!

@jotak
Copy link
Contributor Author

jotak commented Feb 3, 2021

Thanks for tests and reviews, I'll merge and backport to 1.29

@jotak jotak merged commit 3aee1b1 into kiali:master Feb 3, 2021
Sprint 52 automation moved this from In Review to Done Feb 3, 2021
@ghost ghost added this to the v1.30.0 milestone Feb 3, 2021
jotak added a commit to jotak/swscore that referenced this pull request Feb 3, 2021
- discovery_enabled (true/false/auto) to switch discovery mode
- discovery_auto_threshold: pods threshold above which discovery is
  skipped in auto mode

Part of kiali#3660
jotak added a commit that referenced this pull request Feb 4, 2021
- discovery_enabled (true/false/auto) to switch discovery mode
- discovery_auto_threshold: pods threshold above which discovery is
  skipped in auto mode

Part of #3660
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Sprint 52
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

5 participants