Prometheus itself expects samples to exist for a series more frequently than every 5 minutes. I can't remember the exact timeout, but you are probably getting stale markers. I'd just evaluate those recording rules more frequently. Regarding the alert itself, the …
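For illustration, that is a one-line change in the rule group (the group and rule names below are placeholders, not from the report):

```yaml
groups:
  - name: customer-rules            # placeholder group name
    interval: 1m                    # evaluate more often than the ~5m staleness window
    rules:
      - record: customer:requests:rate5m          # hypothetical recording rule
        expr: sum(rate(customer_requests_total[5m]))
```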
What did you do?
We have configured remote write to send a subset of our metrics to a remote endpoint, using a single source_labels entry with a regex. The regex matches recording rules that run at a 5m interval.
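Roughly like this (the URL and the rule-name prefix are placeholders, not our actual values):

```yaml
remote_write:
  - url: https://remote.example.com/api/v1/write   # placeholder URL
    write_relabel_configs:
      # Keep only series whose name matches our recording rules;
      # everything else is dropped before being sent to the remote.
      - source_labels: [__name__]
        regex: "customer:.*"        # hypothetical rule-name prefix
        action: keep
```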
What did you expect to see?
We expect data to be written to the remote every five minutes, with no PrometheusRemoteWriteBehind alert fired.

What did you see instead? Under which circumstances?
While data is written to the remote correctly, we observe that the expression for the PrometheusRemoteWriteBehind alert returns increasing values until there is data to send.

When our recording rules return data, the graph for the alert expression looks like this (screenshot omitted):
This doesn't cause the alert to fire, but it looks incorrect: it suggests that something in the queue is getting older and will never be sent. Every five minutes, when the recording rules run, the value of the expression drops and then starts climbing again.
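For reference, the PrometheusRemoteWriteBehind expression in the prometheus-mixin (quoted approximately from memory, with the job selectors omitted) compares the newest timestamp appended against the newest timestamp successfully sent, which would explain the growth between rule evaluations:

```yaml
# Approximate expression from the prometheus-mixin (selectors omitted).
# The first metric advances on every scrape, while the second only
# advances when samples are actually sent, so the difference climbs
# until the recording rules produce something to send.
expr: |
  (
    max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds[5m])
  - ignoring(remote_name, url) group_right
    max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds[5m])
  ) > 120
```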
Sometimes our recording rules return no data: they are based on customer requests, and sometimes there are none. In this case, the expression for the alert grows indefinitely and the alert does fire (screenshot omitted).
An effective workaround has been to include an additional metric in the regex that is incremented more frequently:
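The original snippet isn't reproduced above; a sketch of the idea, using up as an illustrative stand-in for the frequently-sampled metric we actually added:

```yaml
remote_write:
  - url: https://remote.example.com/api/v1/write   # placeholder URL
    write_relabel_configs:
      - source_labels: [__name__]
        # "customer:.*" matches our 5m recording rules; "up" is a
        # stand-in for a metric that gets fresh samples every scrape,
        # so the remote-write queue always has something to send.
        regex: "customer:.*|up"
        action: keep
```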
Is the observed behavior expected? Should our recording rules be changed to always return a time series, e.g. by reworking them to (existing-expression) or vector(0)?
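The rework we have in mind would look something like this (the rule name and expression are hypothetical). Note that vector(0) yields a single sample with no labels, so the rule would always emit at least one series:

```yaml
groups:
  - name: customer-rules
    interval: 5m
    rules:
      - record: customer:requests:rate5m     # hypothetical rule
        # "or vector(0)" falls back to a single unlabeled zero-valued
        # sample when the left-hand side returns no series, so the rule
        # always produces data for remote write to pick up.
        expr: sum(rate(customer_requests_total[5m])) or vector(0)
```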
Environment
System information:
Linux 5.4.149-73.259.amzn2.x86_64 x86_64
Prometheus version:
Two errors, which may not be related: