Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allowing to keep invalid counter names (with no _total suffix) #653

Closed
wants to merge 1 commit into from

Conversation

ndimitry
Copy link

@ndimitry ndimitry commented Apr 19, 2021

In some cases keeping broken counter names (with no _total suffix) will be really helpful. For example when old names are documented and are in use with a lot of dashboards being precreated for them. Or in case it is sugnificant not to loose connection to old data when counters are renamed.

Possibly this might be a grace option allowing some grace period (for a year or two) for the teams to move to the new names. For us this is quite an effort shared between several development teams. And we are afraid that OpenMetrics format might change too because it is still an RFC draft. When OpenMetrics are becoming an RFC this will be steady argument for us to move to this format.

Links to the discussions #640, #649

…ses when it is hard to straightaway move to the OpenMetrics format

Signed-off-by: dnazaruk <dmitrii.nazaruk@t-systems.com>
@ndimitry
Copy link
Author

@fstab please, review

@fstab
Copy link
Member

fstab commented May 24, 2021

Hi, I finally got back to this, and I have a few points:

  • The property only affects the text format 004, not the OpenMetrics format. This is good, because in OpenMetrics a counter without the _total prefix would be invalid. However, as far as I know there is no way to make a new Prometheus server release use the old format, as the Accept header is hard-coded here https://github.com/prometheus/prometheus/blob/03b354d4d9386e4b3bfbcd45da4bb58b182051a5/scrape/scrape.go#L686-L686. As soon as you switch to a new version of Prometheus you will have the _total suffix again.
  • The change removes _total from all counters, even if there is a counter explicitly named _total.

Instead of starting the monitored application with -Dio.prometheus.allowInvalidCounterNames=true you could achieve the same effect by adding the following to your Prometheus scrape config:

   metric_relabel_configs:
    - source_labels: [__name__]
      regex:  '(.*)_total'
      target_label: __name__
      replacement: '${1}'

Given the new issues this PR would introduce (result depends on the Prometheus version, and it is no longer possible to name a counter ..._total) I think it might be better to use a relabel config as a temporary fix until dashboards are adapted.

@ndimitry
Copy link
Author

@fstab thank you for the possible workaround description!

I guess, let's have this pull request closed then...

@ndimitry ndimitry closed this Jun 11, 2021
@franck102
Copy link

Is there any way this can be reopened?
The workaround is completely useless in a complex setup where half the clients a Go clients that have always reported metric names with _total suffixes, and the other half are java clients.
We have 200 jobs managed by 10-15 different teams, sorting out where the relabel_config is or isn't required is simply completely unmanageable.

@fstab
Copy link
Member

fstab commented Oct 27, 2021

There is no need to update the client_java library. You can safely continue using the pre-OpenMetrics version, and once you have time to update your Prometheus queries and add the _total suffix for counters you can switch to a newer version.

@franck102
Copy link

franck102 commented Oct 27, 2021

This is what we are doing for now, but that won’t work long-term. Spring Boot actuator 3.5.x for example already requires the 0.10.0 client library, and we have.a heavy dependency on Spring Boot.

Also note that this isn’t just about a few Grafana dashboards: we use remote write to feed promscale and populate a Timescale database; and alerts to feed a ServiceNow ticket management system.

That change breaks the entire ecosystem.

@travisspencer
Copy link

I agree with @franck102. This should be reopened. Upgrading to 0.12 was required to get Jakarta servlet support (cf. #647), but this breaking change came in at the same time. Now, we're blocked -- we can't upgrade Jetty to v. 11 because that requires Jakarta not javax servlets, but upgrading this lib to work with the new Jetty version brings in this breaking change.

What I don't understand is how the resource representation changes if the client keeps asking for text/plain; version=0.0.4. If my client keeps negotiating that format, why does it get anything different? Seems it's not really 0.0.4 format any more (or the version is unstable).

Help in resolving this conundrum is much appreciated.

@travisspencer
Copy link

@fstab , can you or one of the other maintainers of this project comment on my previous input? WDYT?

@fstab
Copy link
Member

fstab commented Dec 12, 2021

Setting io.prometheus.allowInvalidCounterNames=true will remove the _total suffix from all counters, even if there is a counter explicitly named _total. So if you now have an issue with _total suffixes being added to all counters, with this PR you might run into an issue because _total suffixes are removed from all counters.

@travisspencer
Copy link

That sounds like it could still produce a breaking change, @fstab , if I have counters named ..._total already. I will check this, but would like to know which property I need to set or how I can use this library to make it work as it did in the previous version. If that's not doable, can the Jakarta support be back-ported?

@fstab
Copy link
Member

fstab commented Dec 13, 2021

No, I am talking about this PR here: If we merge this PR, then we'll get a property io.prometheus.allowInvalidCounterNames that would remove all _total suffixes from counters. That's why I don't want to re-open this.

@travisspencer
Copy link

Ah, I see your point, @fstab. I think then what should be reopened is #640. (I should have made my comment there and not on this PR, it seems.) Would you agree with reopening that issue?

@JnRouvignac
Copy link

JnRouvignac commented Aug 18, 2022

I agree that this PR may not be the right way to restore the old behaviour with respect to counter metric names.

I could start working on a new PR that adds a io.prometheus.allowInvalidCounterNames system property to restore the older behaviour.
Would you agree with merging such a PR @fstab if it satisfied you?

Thank you

@JnRouvignac
Copy link

I have noticed #791 which adds PROMETHEUS_DISABLE_CREATED_SERIES, so I think the answer is yes.
I'll look into doing a PR to allow smoother migration.
Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants