Enable deletion of metrics from a vector based on partially specified labels #834

beorn7 · 2021-02-04T15:10:43Z

In most cases, users of a metric vector are well aware of the label combinations for which metric children have been created in the vector. And they should be, because unpredictable label values often go along with unbounded cardinality. However, there are legitimate use cases where tracking the used label names is hard. (This could merely be a code maintenance problem, e.g. a HTTP counter partitioned by endpoints in a situation where other parts of the codebase can add endpoints.)

While those cases exist, it's also rarely necessary to track the used label names. However, once you are in a situation where you want to delete metric children, you have to know them by their full label set. You cannot just say "delete all children with tenant="42" plus an unknown set of other labels. (Cortex instrumentation recently ran into this problem when a tenant gets deleted entirely, and all the metrics related to it should disappear.) This is essentially seeing the labeled metric children arranged in a tree in which you want to prune one whole branch.

Work-arounds include:

Separately track the used label names to allow deletion in the current way (with a full label set). (Which is kind of what Cortex is doing in the case above, but it is brittle: if another label is added somewhere in the code, it could be easily missed in the deletion code.)
Very hacky and weird: Call Collect on the vector and then Write on the collected Metric instances. Then look at the resulting protobuf message to find the metric children to delete.
Probably the most elegant with the current feature set: Have separate vectors for all the collectors for a single tenant, with the tenant label as a ConstLabel. Those collectors can then be separately registered and de-registered upon creation and removal of a tenant.

Note that it is not possible to use vector currying and then call Reset an a curried vector because Reset always acts on the underlying vector as a whole.

I assume the use case, while legitimate, is still fairly niche, so I would like to avoid adding yet another method (like DeletePartiallyMatching(labels)) to the already quite fat method set of vectors. In hindsight, applying the currying to Reset would have been a good way to enable this use-case, but now it's too late to change the behavior. Similarly, Delete(labels) could have been implemented to work on a partial match. Most users probably wouldn't mind to change the behavior now, but given that github.com/prometheus/client_golang/prometheus is in the top30 of most used Go packages, there are probably a few users around who rely on the current behavior in some weird usage pattern. I was wondering if it would be a good idea to introduce some "wildcard" pattern, similar to the ... we have already used in Alertmanager grouping so that you could write something like myVec.Delete(prometheus.Labels{"tenant": "42", "...": ""}). (The label value of the ... pseudo label wouldn't matter.) It's a bit arcane, but it is for a niche use case after all. It should also be noted that the current deletion is optimized to be fast, while the deletion by partial match will require a full scan through all metric children (which could become a problem in those rare cases where a vector has a lot of children).

Before doing anything, I'd like to get some feedback from the crowd. What's your preference?

Do nothing, the work-arounds listed above are good enough.
Allow a magic "wildcard label name" like ... in the labels map used for Delete.
Add a new method like DeletePartialMatch, which will be added to all vector types.

@pracucci @pstibrany @bwplotka

The text was updated successfully, but these errors were encountered:

pstibrany · 2021-02-04T15:27:01Z

Thank you for writing up this issue, and suggesting some workaround.

We have the same problem in cortexproject/cortex#3784, where we put user-controlled cluster into the label values without tracking it elsewhere, and using DeletePartialMatch (my preferred solution, it seems easiest to understand) or "wildcard label name" would work nicely.

Until that is available, some of your suggested workarounds may actually work fine for us (thinking about "hacky and weird" one).

ewoutp · 2021-06-01T10:46:58Z

We need this feature as well.

My preferred approach would be a DeletePartialMatch function.

beorn7 · 2021-06-02T18:26:24Z

I mostly assigned this to myself as a “default” because I used to be the maintainer of this repo. Therefore, I'm un-assigning this from myself no. New maintainers @bwplotka & @kakkoyun, please deal with this as you see fit.

taiyang-li · 2021-08-24T03:44:53Z

I need this feature too!

stone-z · 2022-01-21T16:19:34Z

Another vote for the DeletePartialMatch/DeletePartialMatchWith feature.

Use case for those interested

We have an exporter which exposes values from some Kubernetes custom resources. An extremely simple conceptual example might be:

kind: SomeResource
metadata:
  name: my-resource
data:
  version: 1.2.3
  some_field: 99

The exposed gauge metric for that resource might look like:

exporter_resource_value{name="my-resource", version="1.2.3"} 99

Now suppose my-resource is updated to version: 1.3.0.

Our exporter subscribes to changes and deletions of the resource, but (due to Kubernetes client implementation details) we get only the name of the resource when it changes, not the complete original resource. So after our exporter has reconciled the update, we have the following metrics:

exporter_resource_value{name="my-resource", version="1.2.3"} 99
exporter_resource_value{name="my-resource", version="1.3.0"} 99

But the underlying resource for version 1.2.3 no longer exists.

This means

the exporter must apply a finalizer to block deletion the resource until we can delete metrics for it
we haven't yet found a way to handle updates in all cases. The "old" resource no longer exists from which to generate the labels needed to delete the old metric.

The ability to delete a metric by partial labels would simplify both of those situations.

By including the name of the resource, we could simply DeletePartialMatchWith(prometheus.Labels{"name": "my-resource"}) the "old" metric for that resource and expose the new one.

One way around this would be to have more direct instrumentation and report the state of all resources at scrape time, but my understanding is that is also an anti-pattern. I can imagine this limitation may also arise in other situations where the source of truth for the metric disappears

stone-z · 2022-03-25T19:15:28Z

I've opened a PR adding this functionality here #1013, I have only so much time to iterate on it but would be open to suggestions

bwplotka · 2022-03-27T18:22:56Z

I think I like the soluton proposed by @stone-z - can't see any harm in adding those. PTAL everyone for this (: (cc @kakkoyun )

cstyan · 2022-04-14T23:30:38Z

Just ran into a situation where I need this. Nice to see a PR already open, thanks @stone-z and @bwplotka / @beorn7 for reviewing.

beorn7 · 2022-04-26T16:17:58Z

Since #1013 is merged, we can close this, right?

stone-z · 2022-04-27T08:40:16Z

Yep this is done. Just waiting for release now

beorn7 added the feature request label Feb 4, 2021

beorn7 self-assigned this Feb 4, 2021

beorn7 mentioned this issue Feb 4, 2021

Cleanup tenant data in ingester cortexproject/cortex#3782

Merged

3 tasks

beorn7 removed their assignment Jun 2, 2021

stone-z mentioned this issue Feb 14, 2022

Old metrics still visible giantswarm/starboard-exporter#48

Closed

kakkoyun added the keep-open label Mar 16, 2022

stone-z mentioned this issue Mar 25, 2022

Implement deletion based on partially matching labels #1013

Merged

beorn7 closed this as completed Apr 26, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable deletion of metrics from a vector based on partially specified labels #834

Enable deletion of metrics from a vector based on partially specified labels #834

beorn7 commented Feb 4, 2021

pstibrany commented Feb 4, 2021

ewoutp commented Jun 1, 2021

beorn7 commented Jun 2, 2021

taiyang-li commented Aug 24, 2021

stone-z commented Jan 21, 2022 •

edited

stone-z commented Mar 25, 2022

bwplotka commented Mar 27, 2022

cstyan commented Apr 14, 2022

beorn7 commented Apr 26, 2022

stone-z commented Apr 27, 2022

Enable deletion of metrics from a vector based on partially specified labels #834

Enable deletion of metrics from a vector based on partially specified labels #834

Comments

beorn7 commented Feb 4, 2021

pstibrany commented Feb 4, 2021

ewoutp commented Jun 1, 2021

beorn7 commented Jun 2, 2021

taiyang-li commented Aug 24, 2021

stone-z commented Jan 21, 2022 • edited

stone-z commented Mar 25, 2022

bwplotka commented Mar 27, 2022

cstyan commented Apr 14, 2022

beorn7 commented Apr 26, 2022

stone-z commented Apr 27, 2022

stone-z commented Jan 21, 2022 •

edited