
Old metrics still visible #48

Closed
komljen opened this issue Jan 18, 2022 · 11 comments

Comments

@komljen
Contributor

komljen commented Jan 18, 2022

I'm using the starboard feature described here https://github.com/giantswarm/starboard-exporter#one-vulnerabilityreport-per-deployment, and even though I no longer see the old reports with the kubectl CLI:

kubectl get vulnerabilityreport -n gradle-enterprise
NAME                                                      REPOSITORY                                               TAG        SCANNER   AGE
replicaset-5c8b5d8449                                     gradleenterprise/gradle-enterprise-operator-image        2021.4.1   Trivy     82m
replicaset-5cf45f8fd7                                     gradleenterprise/gradle-build-cache-node-image           2021.4.1   Trivy     82m
replicaset-764c4bd49c                                     gradleenterprise/gradle-test-distribution-broker-image   2021.4.1   Trivy     82m
replicaset-gradle-database-5b89d7b595-database            gradleenterprise/gradle-database-image                   2021.4.1   Trivy     82m
replicaset-gradle-database-5b89d7b595-database-tasks      gradleenterprise/gradle-database-image                   2021.4.1   Trivy     82m
replicaset-gradle-metrics-64c7565799-gradle-metrics       gradleenterprise/gradle-metrics-image                    2021.4.1   Trivy     82m
statefulset-gradle-enterprise-app-gradle-enterprise-app   gradleenterprise/gradle-enterprise-app-image             2021.4.1   Trivy     148m
statefulset-gradle-keycloak-gradle-keycloak               gradleenterprise/gradle-keycloak-image                   2021.4.1   Trivy     144m
statefulset-gradle-proxy-gradle-proxy                     gradleenterprise/gradle-proxy-image                      2021.4.1   Trivy     150m

If I go to the metrics endpoint on the starboard exporter, I still see metrics like this (notice the image tag version):

starboard_exporter_vulnerabilityreport_image_vulnerability{image_namespace="gradle-enterprise",image_repository="gradleenterprise/gradle-keycloak-image",image_tag="2021.4",report_name="statefulset-gradle-keycloak-gradle-keycloak",vulnerability_id="CVE-2021-30129"} 6.5

I guess this is because the report name is not unique in this case, unlike with replica sets?

@stone-z
Contributor

stone-z commented Jan 19, 2022

Hey @komljen, can you describe a bit more how it got to the current state? Specifically:

  • starboard v0.14.0 isn't released yet; are you using an RC version?
  • do you see both old and new metrics or only old?
  • can you provide the output of kubectl describe vulnerabilityreport -n gradle-enterprise statefulset-gradle-keycloak-gradle-keycloak?
  • how did you update the statefulset / are you able to provide the vulnerabilityreport of the previous version (the one being reported by the metric)?

My theory is that if an existing VulnerabilityReport is updated, the exporter isn't clearing the old metrics like it would on a deletion.

@komljen
Contributor Author

komljen commented Jan 19, 2022

Yeah, I'm using the RC version, and both old and new metrics were available. The stateful set is updated via Helm upgrade, and the previous vulnerability report just got overwritten. When describing it, I only see the new values.

I have the same theory. I had to restart the starboard exporter pod to get rid of old metrics. Not sure if there is anything else that we could do in this scenario.

@stone-z
Contributor

stone-z commented Jan 19, 2022

We likely have to include some diff logic and clear metrics for the previous version if the CR is just updated. Roughly in here.

Thanks for bringing it to my attention; I'll take a deeper look and come up with something.

@stone-z
Contributor

stone-z commented Feb 14, 2022

So just to keep this issue up to date, I've looked a bit more into this and don't see a great way forward yet.

The problem is that once the report object is updated, we can't reconstruct the vector needed to clear the prometheus metric. I've added a note to this prometheus client issue; the change proposed there would make it possible to clear the old metrics.

Aside from official prometheus support, current options seem to be limited to either storing a copy of all those label vectors per-report (which I'd rather not do), clearing ALL metrics (so there would be weird periodic drops in metrics and dashboards, potentially re-triggering alerts, etc.), or just restarting the exporter (which is inelegant).
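For reference, a minimal sketch (not the exporter's actual code; the vector name and sample values are illustrative, copied from the report above) of what the Go client offers today: Delete only works when every label value is still known, stale image_tag included, and the only built-in alternative is Reset, which wipes the whole vector and causes exactly those periodic drops:

package main

import "github.com/prometheus/client_golang/prometheus"

// Illustrative stand-in for the exporter's vulnerability gauge vector.
var vulnGauge = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "starboard_exporter_vulnerabilityreport_image_vulnerability",
		Help: "Vulnerability score per report, image and CVE",
	},
	[]string{"image_namespace", "image_repository", "image_tag", "report_name", "vulnerability_id"},
)

func main() {
	labels := prometheus.Labels{
		"image_namespace":  "gradle-enterprise",
		"image_repository": "gradleenterprise/gradle-keycloak-image",
		"image_tag":        "2021.4",
		"report_name":      "statefulset-gradle-keycloak-gradle-keycloak",
		"vulnerability_id": "CVE-2021-30129",
	}
	vulnGauge.With(labels).Set(6.5)

	// Delete requires the exact label values, including the old image_tag,
	// which is precisely what we no longer have once the report is updated.
	_ = vulnGauge.Delete(labels)

	// The blunt alternative: drop every series and let the next reconcile
	// repopulate the vector, hence the periodic drops mentioned above.
	vulnGauge.Reset()
}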

TL;DR - I want to fix this but don't quite see how yet. A workaround is to periodically restart the exporter. Open to suggestions.

@akosveres

akosveres commented Feb 28, 2022

I believe the solution is to drop the labels for a specific metric; at least, that's what I did on a previous project similar to starboard. My metric looked like this:

	vulnMetrics = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Name: "scan_vulnerabilities",
			Help: "Number of vulnerabilities found, reporting container image and vulnerability severity",
		},
		[]string{
			// Container image scanned
			"container",
			// Severity of the vulnerability
			"severity",
		},
	)

Then I was able to do:

func removeObsoleteContainers(list []string) (err error) {
	severities := []string{"CRITICAL", "HIGH", "LOW", "MEDIUM", "UNKNOWN"}
	for _, image := range list {
		for _, sev := range severities {
			// Delete returns false if no series with these exact labels exists.
			if ok := vulnMetrics.Delete(prometheus.Labels{"container": image, "severity": sev}); !ok {
				err = fmt.Errorf("couldn't delete metrics with image label %s, severity %s", image, sev)
			}
		}
	}
	return
}

If a label set cannot be removed for a metric, an error is returned, which should be fine. Of course, there may be better ways of doing this; I'm only adding this information because it may be interesting.

Based on

The problem is that once the report object is updated, we can't reconstruct the vector needed to clear the prometheus metric.

this may not be possible.

@stone-z
Contributor

stone-z commented Mar 28, 2022

Deletion based on partial matches is not currently supported by the Go prometheus client, so I've opened prometheus/client_golang#1013 to add this capability.
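If that lands as proposed, clearing the stale series for a report without knowing its old image_tag would look roughly like this (a sketch against the proposed DeletePartialMatch API, reusing the illustrative vulnGauge vector from the earlier sketch):

// Sketch only: DeletePartialMatch is the method proposed in
// prometheus/client_golang#1013. It removes every series whose labels
// contain the given subset and returns how many were deleted.
deleted := vulnGauge.DeletePartialMatch(prometheus.Labels{
	"image_namespace": "gradle-enterprise",
	"report_name":     "statefulset-gradle-keycloak-gradle-keycloak",
})
_ = deleted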

@stone-z
Contributor

stone-z commented Apr 21, 2022

Update: the upstream fix has been merged, so we're just waiting on a new release and then we can resolve this.

@philippeckel

@stone-z thanks a lot! So this is fixed now but just waiting for a new release?

@stone-z
Contributor

stone-z commented Jun 15, 2022

@philippeckel correct, I've merged a temporary fix to starboard-exporter while we wait for a prometheus client release. My colleague is working on support for CIS benchmarks (kube-bench), and then we will do a new exporter release, which should resolve this issue.

@philippeckel

@stone-z awesome, thanks a lot!

@stone-z
Contributor

stone-z commented Jun 23, 2022

The fix for this has been released with v0.5.0.

stone-z closed this as completed Jun 23, 2022