Incorrect sum aggregations with recording rule #13903

Open
kul opened this issue Apr 8, 2024 · 2 comments

kul commented Apr 8, 2024

What did you do?

Applied the following additional recording rule:

  - name: agg-recording-rules
    groups:
      - name: agg-rules
        rules:
          - record: my_requests_drop_opt0_total
            expr: sum without (colo, instance, job, prometheus, service) (my_requests_drop_total)

What did you expect to see?

I was expecting more or less the same rate and increase calculation over both original and recorded metric.

What did you see instead? Under which circumstances?

With less load (i.e. fewer instances emitting the metric), the rate and increase over both metrics were in line. For example, the cardinality of the original metric (my_requests_drop_total) at a point in time was typically 38k.

However, with increased load (i.e. a cardinality of 200k), the aggregation rule gave strange results. For example, here is how the rate looked:
image
image

System information

No response

Prometheus version

kube-prometheus-stack-35.5.78
Prometheus: 2.35.0 

Prometheus configuration file

No response

Alertmanager version

No response

Alertmanager configuration file

No response

Logs

No response

kul changed the title from "Aggregate rule is causing incorrect sum aggregations" to "Incorrect sum aggregations with recording rule" on Apr 8, 2024
@prymitive (Contributor)

> I was expecting more or less the same rate and increase calculation over both original and recorded metric.

Your assumption is wrong. Summing together counters loses information about counter resets, so it cannot be accurate.
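
To make the point about counter resets concrete, here is a small Python sketch. It is not Prometheus code: the `increase` helper is a simplified, hypothetical model of reset handling (any drop is treated as a restart from zero, with no extrapolation), and the sample values are made up. It compares computing the increase per series and then summing against summing first, as the recording rule does, and then computing the increase.

```python
def increase(samples):
    """Total increase of a counter over `samples`.

    Simplified model: if a sample is lower than the previous one, assume the
    counter restarted from zero, so the new value itself is the increase.
    """
    total = 0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


# Two hypothetical instances of the same counter, scraped at the same times.
a = [100, 110, 120, 130]  # never resets
b = [50, 60, 5, 15]       # restarts between the 2nd and 3rd scrape

# Increase per series first, then sum: each reset is seen on its own series.
per_series_then_sum = increase(a) + increase(b)

# Sum first (what the recorded series stores), then increase: the drop caused
# by b's restart looks like a reset of the whole summed counter.
summed = [x + y for x, y in zip(a, b)]
sum_then_increase = increase(summed)

print(per_series_then_sum)  # 55
print(sum_then_increase)    # 165
```

In this toy model the summed series drops from 170 to 125 when one instance restarts, and the whole value 125 is then counted as new increase. With more instances there are more restarts and larger sums, which would be consistent with the distortion growing under load. The usual way around this is to apply rate or increase to the raw series first and aggregate the result afterwards.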

