Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add NaNExpiry option and behavior to PrometheusSink #119

Closed

Conversation

mkcp
Copy link
Contributor

@mkcp mkcp commented Oct 5, 2020

Also adds doccomments to structs.

The reasoning for this change is quoted from the PrometheusOpts docs:

NanExpire is a flag that allows us to "zero" out gauges and summaries that we have not updated in the expiry interval. We observe that they do not have a number rather than claiming the number is zero. This is useful because Prometheus expects that metrics are always present. With NaNExpire we may initialize metrics before they're first observed and continue to offer some value (NaN) to Prometheus scrapers even if we have not observed a recent value. Counters are simply collected with the last known value because it is safe to do so. When this flag is combined with "initializing" each metric at zero or NaN when the process starts, it addresses the problem of metrics "disappearing" from Prometheus.

In short, this allows us to set a config flag that prevents metrics from being deleted when they expire. Instead they are set to NaN or ignored if we can safely assume their last value.

This new behavior and config allows us to resolve a long-requested pain point with how Consul expires metrics it has not observed within its prometheus_retention_time, which we set PrometheusSink's expiry value with. Workarounds for this have led to users configuring retention times on the order of days, weeks, and months leading to a pathological loss of precision in metrics which do not update often. Not all of these metrics remain valid even if we haven't updated them. With this change we can recommend much shorter retention times which will lead to more accurate measurements and fewer bogus stats.

@mkcp
Copy link
Contributor Author

mkcp commented Oct 12, 2020

Please hold off on reviewing and/or merging this - I've put together an alternate solution that solves the initialization problem as well as the retention and NaN-expiry problem this one PR solves. I'm getting some more feedback before submitting it up as a PR.

@mkcp
Copy link
Contributor Author

mkcp commented Oct 28, 2020

Closed in favor of the approach in #120

@mkcp mkcp closed this Oct 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant