Prometheus counter decreases by 1 for some time series data #13950
Comments
This is unlikely to be a bug in Prometheus; it is most likely a problem on your end.
@prymitive Is there a way to handle this? I need 2 StatefulSets of Prometheus. Thanos does take care of deduplication, but this delay might be difficult to manage, right?
Handle what exactly?
@prymitive Example: these are the labels for my metrics
On prom-0, I have this value:
On Thanos Querier:
The scrape duration on these endpoints is also less than 0.1 sec.
If you use Thanos and that’s where you see this problem, then maybe Thanos is merging two counters from two different Prometheus servers into a single time series?
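To make the suspicion above concrete, here is a minimal sketch of how interleaving samples from two replicas could produce a decrease even though each replica's own series never goes down. All timestamps and values are made up for illustration, and the naive timestamp merge is an assumption about what happens when replicas are not told apart; it is not Thanos's actual deduplication logic.

```python
import heapq

# Hypothetical (timestamp_seconds, counter_value) samples for the same target,
# as stored by two Prometheus replicas scraping about 20 ms apart.
prom_0 = [(0.000, 100), (15.000, 105), (30.000, 111)]
prom_1 = [(0.020, 100), (15.020, 104), (30.020, 111)]


def find_decreases(series):
    """Return (timestamp, previous_value, current_value) for every drop."""
    drops = []
    for (_, prev), (ts, cur) in zip(series, series[1:]):
        if cur < prev:
            drops.append((ts, prev, cur))
    return drops


# Naive merge by timestamp, ignoring which replica each sample came from.
merged = list(heapq.merge(prom_0, prom_1))

print(find_decreases(prom_0))  # no drops within a single replica
print(find_decreases(prom_1))  # no drops within a single replica
print(find_decreases(merged))  # a drop appears only in the merged series
```

If something like this is happening, the dip would be visible only through the querier and not when querying either replica directly.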
@prymitive I am already adding global external labels as
Indeed, the samples 20ms apart come from two different Prometheus servers. It looks like a configuration issue on the Thanos side. In your last comment you have a typo: promethues_replica. Is it like that in your config too?
What did you do?
I recently noticed a huge spike in one of our metrics.
Look at the highlighted value 7756564 at epoch 1712719298.819: the new entry is 1 less than the previous one. This is the reason for the spike in the rate/increase functions.
There was no restart of Prometheus or the target in this case. What could contribute to this dip in value?
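For context on why a dip of 1 produces such a large spike: Prometheus treats any decrease in a counter as a counter reset and assumes the counter restarted from zero. The sketch below is a simplified model of that reset handling only; it ignores the extrapolation that the real `rate`/`increase` functions also apply, and the sample values other than 7756564 are illustrative.

```python
def increase(samples):
    """Total increase over a list of counter values.

    Any drop is treated as a counter reset: the counter is assumed to have
    restarted from 0, so the whole post-drop value counts as new increase.
    """
    total = 0.0
    prev = samples[0]
    for cur in samples[1:]:
        if cur < prev:
            total += cur          # assumed reset: count from zero
        else:
            total += cur - prev   # normal monotonic step
        prev = cur
    return total


# Illustrative values around the dip; only 7756564 comes from the report.
clean = [7756560, 7756562, 7756564, 7756566]
dipped = [7756560, 7756562, 7756564, 7756563]

print(increase(clean))   # 6.0
print(increase(dipped))  # 7756567.0
```

A single sample that is 1 lower than its predecessor is therefore enough to add roughly the full counter value to the computed increase.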
Below is a graph of the data for 2 weeks
Here is the screenshot of the spike
What did you expect to see?
I would not expect to see a decrease in a counter.
What did you see instead? Under which circumstances?
We are running an HA setup of Prometheus (2 StatefulSets) with Thanos.
System information
Linux 5.10.192-183.736.amzn2.x86_64 x86_64
Prometheus version
Prometheus configuration file
No response
Alertmanager version
No response
Alertmanager configuration file
No response
Logs