Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Counter type metrics from statsdreceiver being dropped by prometheusexporter #7156

Closed
wants to merge 6 commits into from

Conversation

locmai
Copy link
Contributor

@locmai locmai commented Jan 13, 2022

Signed-off-by: Loc Mai lmai@axon.com

Description:

Fixing the bug with Counter type metrics from statsdreceiver being dropped by prometheusexporter

Previously, I fixed this on the statdreceiver side, however, it was off the specification of statsd. For this PR, I add the condition for the dropping if statement.

Assuming metric.Sum() would always be used for https://opentelemetry.io/docs/reference/specification/metrics/datamodel/#sums both Aggregation Temporality of delta or cumulative. But in Prometheus case, there would be no metric that would be non-cumulative AND also be monotonic at the same time.

Link to tracking Issue: #4153

@locmai locmai requested review from Aneurysm9 and a team as code owners January 13, 2022 10:31
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jan 13, 2022

CLA Signed

The committers are authorized under a signed CLA.

Signed-off-by: Loc Mai <lmai@axon.com>
// Drop metrics with non-cumulative aggregations
if doubleSum.AggregationTemporality() != pdata.MetricAggregationTemporalityCumulative {
// Drop metrics with non-cumulative aggregations and is not monotonic increasing
if doubleSum.AggregationTemporality() != pdata.MetricAggregationTemporalityCumulative && !metric.Sum().IsMonotonic() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be handled elsewhere. Sums with delta temporality need to be converted to gauges rather than counters.

Copy link
Contributor Author

@locmai locmai Jan 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for taking a look at this PR. Isn't Gauge type handled by accumulateGauge func?

func (a *lastValueAccumulator) accumulateGauge(metric pdata.Metric, il pdata.InstrumentationLibrary, now time.Time) (n int) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I saw that method and I was looking at the base data model and map:

So in Prometheus, counters are always monotonic and cumulative. But then I realized it's not a 100% mapped like that, some Sums could be converted back to Prometheus Gauge with convertSum()

	case pdata.MetricDataTypeGauge:
		return c.convertGauge(metric)
	case pdata.MetricDataTypeSum:
		return c.convertSum(metric)

So maybe we could go with this change below:

Suggested change
if doubleSum.AggregationTemporality() != pdata.MetricAggregationTemporalityCumulative && !metric.Sum().IsMonotonic() {
if doubleSum.AggregationTemporality() != pdata.MetricAggregationTemporalityCumulative && doubleSum.AggregationTemporality() != pdata.MetricAggregationTemporalityDelta {

With:

MetricAggregationTemporalityDelta is a MetricAggregationTemporality for a metric aggregator which reports changes since last report time.
MetricAggregationTemporalityCumulative is a MetricAggregationTemporality for a metric aggregator which reports changes since a fixed start time.

Prometheus exporter is at the end of the aggregation (it would set the metric to MetricAggregationTemporalityCumulative in the same function anyway) , I don't think it would care about the time which the metric got reported?

Copy link
Contributor Author

@locmai locmai Jan 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds better, since the monotonic part will be used to check later on with convertSum():

func (c *collector) convertSum(metric pdata.Metric) (prometheus.Metric, error) {
ip := metric.Sum().DataPoints().At(0)
metricType := prometheus.GaugeValue
if metric.Sum().IsMonotonic() {
metricType = prometheus.CounterValue
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More inputs in #5714 (comment) from @jmacd , I'm reading through his suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To summarize what I understand from this thread:

When Prometheus receives a Delta temporality counter, there are two cases: (a) monotonic, (b) non-monotonic.

For (a), the treatment can be applied inside Prometheus both easily and correctly. Since this module already keeps state on every metric, performing the delta-to-cumulative is simple. Just keep tallying the sum and it will be cumulative when Prometheus sees it.

For (b), I will speculate that no one cares and "correct" is difficult to achieve. These are non-monotonic deltas, which statsd already has a different way to treat. If the user sends gauges prefixed by + or - they have the same semantics as UpDownCounter, are to be interpreted as deltas. So, if the statsd user wanted delta-to-gauge, they already have a way to do that and it's the statsdreceiver's responsibility to compute it (and it does).

Continuing case (b), if the statsd user didn't want delta-to-gauge, then they're presumably sending you positive-and-negative increments. We have no information about reset time, so there are two correct ways to interpret this: (a) keep them as deltas, (b) convert them to rate gauges over some time period. Either way, Prometheus users will still have trouble working with this data, which is why I speculate that no one cares.

The preceding paragraph lends support to the following claim: Statsdreceiver should by default assume counters are monotonic, because there's a mechanism in statsd for treating non-monotonic counters as gauges already.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand what you're proposing with respect to monotonic, delta temporality counters and agree that we could likely handle the delta-to-cumulative conversion here since this exporter is already maintaining state. Since https://github.com/open-telemetry/opentelemetry-specification/pull/2266/files#diff-0efae13f08f98e62a81767d5daeff37ebb7ef8c50537c7b9013e72506e9b055aR1085 would specify a different treatment for non-cumulative, monotonic OTLP sum data points being converted to prometheus exposition, do we need to change that or create an exception for the collector, or is there some scoping of that (proposed specification that I'm missing that makes it irrelevant to the collector?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I will comment in open-telemetry/opentelemetry-specification#2266

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@locmai locmai Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found the document https://opentelemetry.io/docs/reference/specification/metrics/datamodel/#sums-delta-to-cumulative that exactly aligned with what @jmacd said :D I believe it's 100% the right implementation. Let me see what I could do.

While OpenTelemetry (and some metric backends) allows both Delta and Cumulative sums to be reported, the timeseries model we target does not support delta counters. To this end, converting from delta to cumulative needs to be defined so that backends can use this mechanism.

Thanks again for all the great inputs here folks 🪨

@@ -2,6 +2,10 @@

## Unreleased

## 🧰 Bug fixes 🧰

- `statsdreceiver`: Fix counter metric being dropped by Prometheus exporter (#7156)
Copy link
Member

@dmitryax dmitryax Jan 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why statsdreceiver? Looks like this change (#7156) is made on prometheus exporter only. Please update the line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch, sorry for this one, I'm waiting to finalize the solution from the above discussion with @jmacd @Aneurysm9 so will update this later as well.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 5, 2022

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Feb 5, 2022
@jmacd jmacd removed the Stale label Feb 11, 2022
@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Feb 26, 2022
@github-actions
Copy link
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Mar 12, 2022
@bryanpaluch
Copy link

Any update on this? @locmai solution seems correct.

@locmai
Copy link
Contributor Author

locmai commented Apr 19, 2022

@bryanpaluch hi, I have discussed the right solution Joshua and we finalized it. I got side-tracked for a while and now I'm back working on it soon this sprint (we scheduled it as a work ticket).

@locmai
Copy link
Contributor Author

locmai commented May 10, 2022

Hi, I made a new PR (this one was using my own company account, so to reduce the steps needed, I made a new one): #9919

@locmai locmai closed this May 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants