
Old metrics still visible #48

Closed
komljen opened this issue Jan 18, 2022 · 11 comments

Comments

@komljen
Contributor

komljen commented Jan 18, 2022

I'm using the starboard feature described here https://github.com/giantswarm/starboard-exporter#one-vulnerabilityreport-per-deployment, and even though I no longer see the old reports with the kubectl CLI:

kubectl get vulnerabilityreport -n gradle-enterprise
NAME                                                      REPOSITORY                                               TAG        SCANNER   AGE
replicaset-5c8b5d8449                                     gradleenterprise/gradle-enterprise-operator-image        2021.4.1   Trivy     82m
replicaset-5cf45f8fd7                                     gradleenterprise/gradle-build-cache-node-image           2021.4.1   Trivy     82m
replicaset-764c4bd49c                                     gradleenterprise/gradle-test-distribution-broker-image   2021.4.1   Trivy     82m
replicaset-gradle-database-5b89d7b595-database            gradleenterprise/gradle-database-image                   2021.4.1   Trivy     82m
replicaset-gradle-database-5b89d7b595-database-tasks      gradleenterprise/gradle-database-image                   2021.4.1   Trivy     82m
replicaset-gradle-metrics-64c7565799-gradle-metrics       gradleenterprise/gradle-metrics-image                    2021.4.1   Trivy     82m
statefulset-gradle-enterprise-app-gradle-enterprise-app   gradleenterprise/gradle-enterprise-app-image             2021.4.1   Trivy     148m
statefulset-gradle-keycloak-gradle-keycloak               gradleenterprise/gradle-keycloak-image                   2021.4.1   Trivy     144m
statefulset-gradle-proxy-gradle-proxy                     gradleenterprise/gradle-proxy-image                      2021.4.1   Trivy     150m

If I go to the metrics endpoint on the starboard exporter, I still see metrics like this (notice the image tag version):

starboard_exporter_vulnerabilityreport_image_vulnerability{image_namespace="gradle-enterprise",image_repository="gradleenterprise/gradle-keycloak-image",image_tag="2021.4",report_name="statefulset-gradle-keycloak-gradle-keycloak",vulnerability_id="CVE-2021-30129"} 6.5

I guess this is because the report name is not unique in this case, unlike with replica sets?

@stone-z
Contributor

stone-z commented Jan 19, 2022

Hey @komljen, can you describe a bit more how it got to the current state? Specifically:

  • starboard v0.14.0 isn't released yet; are you using an RC version?
  • do you see both old and new metrics or only old?
  • can you provide the output of kubectl describe vulnerabilityreport -n gradle-enterprise statefulset-gradle-keycloak-gradle-keycloak?
  • how did you update the statefulset / are you able to provide the vulnerabilityreport of the previous version (the one being reported by the metric)?

My theory is that if an existing VulnerabilityReport is updated, the exporter isn't clearing the old metrics like it would on a deletion.

@komljen
Contributor Author

komljen commented Jan 19, 2022

Yeah, I'm using the RC version, and both old and new metrics were available. The stateful set is updated via Helm upgrade, and the previous vulnerability report just got overwritten. When describing it, I only see the new values.

I have the same theory. I had to restart the starboard exporter pod to get rid of old metrics. Not sure if there is anything else that we could do in this scenario.

@stone-z
Contributor

stone-z commented Jan 19, 2022

We likely have to include some diff logic and clear metrics for the previous version if the CR is just updated. Roughly in here.

Thanks for bringing it to my attention; I'll take a deeper look and come up with something.

@stone-z
Contributor

stone-z commented Feb 14, 2022

So just to keep this issue up to date, I've looked a bit more into this and don't see a great way forward yet.

The problem is that once the report object is updated, we can't reconstruct the vector needed to clear the prometheus metric. I've added a note to this prometheus client issue; the change proposed there would make it possible to clear the old metrics.

Aside from official prometheus support, current options seem to be limited to either storing a copy of all those label vectors per-report (which I'd rather not do), clearing ALL metrics (so there would be weird periodic drops in metrics and dashboards, potentially re-triggering alerts, etc.), or just restarting the exporter (which is inelegant).
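For reference, a minimal sketch (not the exporter's actual code; the vector name and sample values are illustrative, copied from the report above) of what the Go client offers today: Delete only works when every label value is still known, stale image_tag included, and the only built-in alternative is Reset, which wipes the whole vector and causes exactly those periodic drops:

package main

import "github.com/prometheus/client_golang/prometheus"

// Illustrative stand-in for the exporter's vulnerability gauge vector.
var vulnGauge = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "starboard_exporter_vulnerabilityreport_image_vulnerability",
		Help: "Vulnerability score per report, image and CVE",
	},
	[]string{"image_namespace", "image_repository", "image_tag", "report_name", "vulnerability_id"},
)

func main() {
	labels := prometheus.Labels{
		"image_namespace":  "gradle-enterprise",
		"image_repository": "gradleenterprise/gradle-keycloak-image",
		"image_tag":        "2021.4",
		"report_name":      "statefulset-gradle-keycloak-gradle-keycloak",
		"vulnerability_id": "CVE-2021-30129",
	}
	vulnGauge.With(labels).Set(6.5)

	// Delete requires the exact label values, including the old image_tag,
	// which is precisely what we no longer have once the report is updated.
	_ = vulnGauge.Delete(labels)

	// The blunt alternative: drop every series and let the next reconcile
	// repopulate the vector, hence the periodic drops mentioned above.
	vulnGauge.Reset()
}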

TL;DR - I want to fix this but don't quite see how yet. A workaround is to periodically restart the exporter. Open to suggestions.

@akosveres

akosveres commented Feb 28, 2022

I believe the solution is to drop the labels for a specific metric; at least, that's what I did on a previous project similar to starboard. My metric looked like this:

	vulnMetrics = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Name: "scan_vulnerabilities",
			Help: "Number of vulnerabilities found, reporting container image and vulnerability severity",
		},
		[]string{
			// Container image scanned
			"container",
			// Severity of the vulnerability
			"severity",
		},
	)

Then I was able to do:

func removeObsoleteContainers(list []string) (err error) {
	severities := []string{"CRITICAL", "HIGH", "LOW", "MEDIUM", "UNKNOWN"}
	for _, image := range list {
		for _, sev := range severities {
			// Delete returns false if no series with these exact labels exists.
			if ok := vulnMetrics.Delete(prometheus.Labels{"container": image, "severity": sev}); !ok {
				err = fmt.Errorf("couldn't delete metrics with image label %s, severity %s", image, sev)
			}
		}
	}
	return
}

If a label set cannot be removed for a metric, an error is returned, which should be fine. Of course, there may be better ways of doing this; I'm only adding this information because it may be interesting.

Based on

The problem is that once the report object is updated, we can't reconstruct the vector needed to clear the prometheus metric.

this may not be possible.

@stone-z
Contributor

stone-z commented Mar 28, 2022

Deletion based on partial matches is not currently supported by the Go prometheus client, so I've opened prometheus/client_golang#1013 to add this capability.
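If that lands as proposed, clearing the stale series for a report without knowing its old image_tag would look roughly like this (a sketch against the proposed DeletePartialMatch API, reusing the illustrative vulnGauge vector from the earlier sketch):

// Sketch only: DeletePartialMatch is the method proposed in
// prometheus/client_golang#1013. It removes every series whose labels
// contain the given subset and returns how many were deleted.
deleted := vulnGauge.DeletePartialMatch(prometheus.Labels{
	"image_namespace": "gradle-enterprise",
	"report_name":     "statefulset-gradle-keycloak-gradle-keycloak",
})
_ = deleted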

@stone-z
Contributor

stone-z commented Apr 21, 2022

Update: the upstream fix has been merged, so we're just waiting on a new release and then we can resolve this.

@philippeckel

@stone-z thanks a lot! So this is fixed now but just waiting for a new release?

@stone-z
Contributor

stone-z commented Jun 15, 2022

@philippeckel correct, I've merged a temporary fix to starboard-exporter while we wait for a prometheus client release. My colleague is working on support for CIS benchmarks (kube-bench), and then we will do a new exporter release, which should resolve this issue.

@philippeckel

@stone-z awesome, thanks a lot!

@stone-z
Contributor

stone-z commented Jun 23, 2022

The fix for this has been released with v0.5.0.

stone-z closed this as completed Jun 23, 2022