Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add {Gauge,Summary,Counter}Definitions to PrometheusOpts to define non-expiring metrics #120

Merged
merged 9 commits into from Nov 4, 2020

Conversation

mkcp
Copy link
Contributor

@mkcp mkcp commented Oct 28, 2020

This PR introduces structs for Gauge, Summary, and Counter Definitions that can be provided to the PrometheusSink via its PrometheusOpts. The PrometheusSink will uses these definition slices to declare metrics when the process starts, and marks them internally to not be deleted when their last-updated time passes the configured expiry. Instead, if canDelete is false, the PrometheusSink expires the metric by setting its value to NaN. When metrics aren't defined, they should behave as they always have before this PR.

Allowing go-metrics users to define static metrics in prometheus lets us resolve a long-requested pain point with how Consul expires metrics it has not observed within its prometheus_retention_time, which we set PrometheusSink's expiry value with. Workarounds for this have led to users configuring retention times on the order of days, weeks, months, or longer, leading to a pathological loss of precision in metrics which do not update often. Not all of these metrics remain valid even if we haven't updated them. With this change we can recommend much shorter retention times which will lead to more accurate measurements and fewer bogus stats.

As a secondary benefit, we can provide prometheus users better UX by attaching help string documentation to the metrics, as prometheus users expect.

This is a successor PR to the approach in #119 and allows ephemeral metrics in HashiCorp products to continue to exist where short-lived metrics would otherwise stop products from adopting the previous NaNExpiry behavior wholesale. The definitions also provide a similar "define your metrics before you use them" approach that's used in Prometheus clients.

More research needs to be done on how to best handle definitions for sub-libraries using go-metrics which expect their configuration from their hosting application. One path offered by this PR is to declare definitions inside the library and have the hosting application "reach into" the lib and append its definitions to the application's when it creates its PrometheusSink. A alternative approach has been proposed which is compatible with this definitions PR, and was one of the early iterations on this bug fix: allow prometheus metrics to be "defined" at runtime with additional methods for each metric type in go-metrics. For one example, InitCounterWithLabels.

Copy link
Contributor

@cgbaker cgbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will go ahead and approve; one question about initializing constant gauges and summaries.

}

type PrometheusSummary struct {
type summary struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these classes were exported before (even though they didn't need to be), but i don't find any indication that they're used anywhere, so 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we looked over it and it seemed pretty safe to unexport these because they're only used internally.

Help: g.Help,
ConstLabels: prometheusLabels(g.ConstLabels),
})
pG.Set(float64(0)) // Initialize at zero
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should gauges and summaries (which haven't been observed yet) be initialized to zero or NaN (as they do after expiration)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good question - the initialization part here is a little redundant for counters and gauges because they start at zero in the go prometheus lib anyway. However, initializing summaries to 0 does create an artifact when the process starts and before they're emitted, where the line sits on the 0 like it's a counter or gauge. It's going to be better to delete all of these inits and just use the go prometheus client's defaults.

@cgbaker cgbaker merged commit 6fd5a4d into hashicorp:master Nov 4, 2020
@mkcp mkcp deleted the mkcp/add-static-metrics branch November 4, 2020 21:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants