Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable deletion of metrics from a vector based on partially specified labels #834

Closed
beorn7 opened this issue Feb 4, 2021 · 10 comments
Closed

Comments

@beorn7
Copy link
Member

beorn7 commented Feb 4, 2021

In most cases, users of a metric vector are well aware of the label combinations for which metric children have been created in the vector. And they should be, because unpredictable label values often go along with unbounded cardinality. However, there are legitimate use cases where tracking the used label names is hard. (This could merely be a code maintenance problem, e.g. a HTTP counter partitioned by endpoints in a situation where other parts of the codebase can add endpoints.)

While those cases exist, it's also rarely necessary to track the used label names. However, once you are in a situation where you want to delete metric children, you have to know them by their full label set. You cannot just say "delete all children with tenant="42" plus an unknown set of other labels. (Cortex instrumentation recently ran into this problem when a tenant gets deleted entirely, and all the metrics related to it should disappear.) This is essentially seeing the labeled metric children arranged in a tree in which you want to prune one whole branch.

Work-arounds include:

  1. Separately track the used label names to allow deletion in the current way (with a full label set). (Which is kind of what Cortex is doing in the case above, but it is brittle: if another label is added somewhere in the code, it could be easily missed in the deletion code.)
  2. Very hacky and weird: Call Collect on the vector and then Write on the collected Metric instances. Then look at the resulting protobuf message to find the metric children to delete.
  3. Probably the most elegant with the current feature set: Have separate vectors for all the collectors for a single tenant, with the tenant label as a ConstLabel. Those collectors can then be separately registered and de-registered upon creation and removal of a tenant.

Note that it is not possible to use vector currying and then call Reset an a curried vector because Reset always acts on the underlying vector as a whole.

I assume the use case, while legitimate, is still fairly niche, so I would like to avoid adding yet another method (like DeletePartiallyMatching(labels)) to the already quite fat method set of vectors. In hindsight, applying the currying to Reset would have been a good way to enable this use-case, but now it's too late to change the behavior. Similarly, Delete(labels) could have been implemented to work on a partial match. Most users probably wouldn't mind to change the behavior now, but given that github.com/prometheus/client_golang/prometheus is in the top30 of most used Go packages, there are probably a few users around who rely on the current behavior in some weird usage pattern. I was wondering if it would be a good idea to introduce some "wildcard" pattern, similar to the ... we have already used in Alertmanager grouping so that you could write something like myVec.Delete(prometheus.Labels{"tenant": "42", "...": ""}). (The label value of the ... pseudo label wouldn't matter.) It's a bit arcane, but it is for a niche use case after all. It should also be noted that the current deletion is optimized to be fast, while the deletion by partial match will require a full scan through all metric children (which could become a problem in those rare cases where a vector has a lot of children).

Before doing anything, I'd like to get some feedback from the crowd. What's your preference?

  1. Do nothing, the work-arounds listed above are good enough.
  2. Allow a magic "wildcard label name" like ... in the labels map used for Delete.
  3. Add a new method like DeletePartialMatch, which will be added to all vector types.

@pracucci @pstibrany @bwplotka

@pstibrany
Copy link

Thank you for writing up this issue, and suggesting some workaround.

We have the same problem in cortexproject/cortex#3784, where we put user-controlled cluster into the label values without tracking it elsewhere, and using DeletePartialMatch (my preferred solution, it seems easiest to understand) or "wildcard label name" would work nicely.

Until that is available, some of your suggested workarounds may actually work fine for us (thinking about "hacky and weird" one).

@ewoutp
Copy link

ewoutp commented Jun 1, 2021

We need this feature as well.

My preferred approach would be a DeletePartialMatch function.

@beorn7
Copy link
Member Author

beorn7 commented Jun 2, 2021

I mostly assigned this to myself as a “default” because I used to be the maintainer of this repo. Therefore, I'm un-assigning this from myself no. New maintainers @bwplotka & @kakkoyun, please deal with this as you see fit.

@beorn7 beorn7 removed their assignment Jun 2, 2021
@taiyang-li
Copy link

I need this feature too!

@stone-z
Copy link
Contributor

stone-z commented Jan 21, 2022

Another vote for the DeletePartialMatch/DeletePartialMatchWith feature.

Use case for those interested We have an exporter which exposes values from some Kubernetes custom resources. An extremely simple conceptual example might be:
kind: SomeResource
metadata:
  name: my-resource
data:
  version: 1.2.3
  some_field: 99

The exposed gauge metric for that resource might look like:

exporter_resource_value{name="my-resource", version="1.2.3"} 99

Now suppose my-resource is updated to version: 1.3.0.

Our exporter subscribes to changes and deletions of the resource, but (due to Kubernetes client implementation details) we get only the name of the resource when it changes, not the complete original resource. So after our exporter has reconciled the update, we have the following metrics:

exporter_resource_value{name="my-resource", version="1.2.3"} 99
exporter_resource_value{name="my-resource", version="1.3.0"} 99

But the underlying resource for version 1.2.3 no longer exists.

This means

  • the exporter must apply a finalizer to block deletion the resource until we can delete metrics for it
  • we haven't yet found a way to handle updates in all cases. The "old" resource no longer exists from which to generate the labels needed to delete the old metric.

The ability to delete a metric by partial labels would simplify both of those situations.

By including the name of the resource, we could simply DeletePartialMatchWith(prometheus.Labels{"name": "my-resource"}) the "old" metric for that resource and expose the new one.

One way around this would be to have more direct instrumentation and report the state of all resources at scrape time, but my understanding is that is also an anti-pattern. I can imagine this limitation may also arise in other situations where the source of truth for the metric disappears

@stone-z
Copy link
Contributor

stone-z commented Mar 25, 2022

I've opened a PR adding this functionality here #1013, I have only so much time to iterate on it but would be open to suggestions

@bwplotka
Copy link
Member

I think I like the soluton proposed by @stone-z - can't see any harm in adding those. PTAL everyone for this (: (cc @kakkoyun )

@cstyan
Copy link
Member

cstyan commented Apr 14, 2022

Just ran into a situation where I need this. Nice to see a PR already open, thanks @stone-z and @bwplotka / @beorn7 for reviewing.

@beorn7
Copy link
Member Author

beorn7 commented Apr 26, 2022

Since #1013 is merged, we can close this, right?

@beorn7 beorn7 closed this as completed Apr 26, 2022
@stone-z
Copy link
Contributor

stone-z commented Apr 27, 2022

Yep this is done. Just waiting for release now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants