custom dashboard discovery - only look for discoverOn metrics #4367

jmazzitelli · 2021-09-17T21:14:54Z

This works around the problem where api.Series is called during dashboard discovery, which can return a huge amount of data.

We want to only look for specific metric names - those metrics listed as "discoverOn" metrics in the dashboards. Those are the only ones we care about.

jmazzitelli · 2021-09-20T17:44:04Z

Here is one implementation: it uses a set of "OR" conditions in the Prometheus query. I am documenting it here because I might switch to a different implementation, and I want a record of this approach in case we later revert to it (we might find that it is more efficient):

	// you can use "count" here instead of "sum" for possibly even more goodness
	queryString := fmt.Sprintf("sum(%v%v) by (__name__)", metricNames[0], labelQueryString)
	for i := 1; i < len(metricNames); i++ {
		queryString = fmt.Sprintf("%v OR sum(%v%v) by (__name__)", queryString, metricNames[i], labelQueryString)
	}
	results, warnings, err := in.api.Query(in.ctx, queryString, time.Now())
	if len(warnings) > 0 {
		log.Warningf("GetMetricsForLabels. Prometheus Warnings: [%s]", strings.Join(warnings, ","))
	}
	if err != nil {
		return nil, errors.NewServiceUnavailable(err.Error())
	}

	// We should only get one timeseries for each metric family name. However, just in case
	// we get duplicates, store the metric name in a map and convert to an array to remove duplicates.

	namesMap := make(map[string]bool)
	for _, item := range results.(model.Vector) {
		namesMap[string(item.Metric["__name__"])] = true
	}
	names := make([]string, 0, len(namesMap))
	for n := range namesMap {
		names = append(names, n)
	}

	return names, nil
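
The query-building step above can be isolated for testing. This is only a minimal sketch: `buildDiscoveryQuery` is a hypothetical helper name that is not in the PR, and the metric names and label selector in `main` are made up for illustration. It joins the per-metric terms with `strings.Join`, which also avoids indexing `metricNames[0]` when the list is empty.

```go
package main

import (
	"fmt"
	"strings"
)

// buildDiscoveryQuery is a hypothetical helper (not part of the PR).
// It emits one "sum(<metric><labels>) by (__name__)" term per metric name
// and joins the terms with OR. An empty list yields an empty string.
func buildDiscoveryQuery(metricNames []string, labelQueryString string) string {
	terms := make([]string, 0, len(metricNames))
	for _, name := range metricNames {
		terms = append(terms, fmt.Sprintf("sum(%s%s) by (__name__)", name, labelQueryString))
	}
	return strings.Join(terms, " OR ")
}

func main() {
	// Illustrative inputs only; real names come from the dashboards' discoverOn fields.
	fmt.Println(buildDiscoveryQuery([]string{"jvm_threads_live", "go_goroutines"}, `{namespace="demo"}`))
}
```

The caller would still need to skip the Prometheus call when the returned string is empty.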


jmazzitelli · 2021-09-21T21:10:21Z

This is ready enough for people to review.

jshaughn

LGTM with a few non-blocking comments

jshaughn · 2021-09-22T14:09:21Z

prometheus/client.go

+	for i := 0; i < len(metricNames); i++ {
+		metricsWeAreLookingFor[metricNames[i]] = true
+	}


Fine but maybe nicer as:

for _, m := range metricNames { metricsWeAreLookingFor[m] = true }
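
For context, here is a minimal sketch of the lookup this loop feeds, as described in the "do the lookup the other way" commit: store the metrics we are looking for, then loop over the results. It assumes the Prometheus results have already been reduced to a plain slice of metric names. `filterDiscovered` and the sample names are hypothetical, and the final sort is only there to make the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// filterDiscovered is a hypothetical helper (not part of the PR).
// It returns the names in found that also appear in wanted, without duplicates.
func filterDiscovered(wanted, found []string) []string {
	// The map is sized by the (small) list of metrics we are looking for,
	// not by the (potentially large) result set.
	metricsWeAreLookingFor := make(map[string]bool, len(wanted))
	for _, m := range wanted {
		metricsWeAreLookingFor[m] = true
	}

	names := make([]string, 0, len(wanted))
	for _, f := range found {
		if metricsWeAreLookingFor[f] {
			names = append(names, f)
			// Remove the entry so a repeated result is not reported twice.
			delete(metricsWeAreLookingFor, f)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	wanted := []string{"go_goroutines", "jvm_threads_live"}
	found := []string{"up", "go_goroutines", "go_goroutines", "process_cpu_seconds_total"}
	fmt.Println(filterDiscovered(wanted, found))
}
```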


jmazzitelli · 2021-09-22T19:02:24Z

I built and published a test image based on this PR - if someone wants to test it - use this image: quay.io/jmazzitelli/kiali:pr4367

lucasponce · 2021-09-23T07:19:31Z

I see comments on the code, but I don't see screenshots testing the feature.

jmazzitelli · 2021-09-23T13:08:55Z

> but I don't see screenshots testing the feature.

There are no screenshots to show because nothing will look different than how things look today. This just performs a different query that I hope makes things faster. But the UI will look identical to how it looks today.

lucasponce · 2021-09-23T13:40:38Z

> There are no screenshots to show because nothing will look different than how things look today. This just performs a different query that I hope makes things faster. But the UI will look identical to how it looks today.

Has this work been tested against the https://github.com/kiali/demos/tree/master/runtimes-demo ?

jmazzitelli · 2021-09-23T15:11:59Z

> Has this work been tested against the https://github.com/kiali/demos/tree/master/runtimes-demo ?

Built and deployed the server with this PR:

commit 0f39e20 (HEAD -> dashboard-discovery-3704, jmazzitelli/dashboard-discovery-3704)

and installed the runtimes-demo:

Here are all the applications and workloads - you can see all the dashboards are discovered correctly:

jmazzitelli self-assigned this Sep 17, 2021

jmazzitelli added this to In Review in Sprint 63 (v1.41) via automation Sep 17, 2021

jmazzitelli force-pushed the dashboard-discovery-3704 branch from 916d850 to 63d90b3 Compare September 17, 2021 21:21

jmazzitelli force-pushed the dashboard-discovery-3704 branch from 75ea80e to 42ba2dc Compare September 20, 2021 19:47

jmazzitelli added 6 commits September 21, 2021 10:45

probably not needed

3f393aa

this seems to return the correct things... still testing

d28663d

eliminate duplicate metric names

98e7988

use sum(m{l}) by (__name__) to get back a response that has a single row per metric family name that exists

0d9e4ab

change prom query to a small, fixed "count({labels}) by __name__" (no dynamic generation of OR conditions).

8826b11

another way - do the lookup the other way (store the metrics we are looking for, and loop over the results) - this will allow us to create a smaller map (most likely the results are going to be much larger than the list of metrics we are looking for)

e0f3d7a

jmazzitelli force-pushed the dashboard-discovery-3704 branch from 42ba2dc to e0f3d7a Compare September 21, 2021 14:45

add trace message that tells us how fast this is. will be useful when we released this out in the wild and we need to know if this is a bottleneck

0f39e20