Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MetricVec constructors do not initialize the underlying prometheus metrics #95115

Closed
dgrisonnet opened this issue Sep 28, 2020 · 10 comments
Closed
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation.

Comments

@dgrisonnet
Copy link
Member

What happened:

When creating a counter with component-base NewCounterVec and calling metric.Describe(ch), an unexpected segfault occurs.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x818f89]

goroutine 1 [running]:
github.com/prometheus/client_golang/prometheus.(*metricVec).Describe(...)
	/tmp/test/vendor/github.com/prometheus/client_golang/prometheus/vec.go:98
main.main()
	/tmp/test/main.go:17 +0xa9

What you expected to happen:

I was expecting the metric to be describable as when I create it with client_golang NewCounterVec.

How to reproduce it (as minimally and precisely as possible):

package main

import (
        "github.com/prometheus/client_golang/prometheus"
        "k8s.io/component-base/metrics"
)

func main() {
        ch := make(chan *prometheus.Desc, 1)
        counter := metrics.NewCounterVec(
                &metrics.CounterOpts{Name: "test", Help: "help"},
                []string{},                                                                                                                                                 
        )   
        counter.Describe(ch)
}

Anything else we need to know?:

This happens when creating all types of Vec; however, non-Vec metrics such as Counter are not affected by this segfault.

This issue seems to be related to the inner Prometheus metric initialization.
In the NewCounter function, setPrometheusCounter is called. Whereas, in NewCounterVec, the initializeMetric function is not called.

The segfault was left unnoticed because the metric initialization is also done during the metric registration. Thus, to reproduce it we need to call the Describe method before registering the metric.

Environment:

  • k8s.io/component-base v0.19.2

/sig instrumentation
/cc @serathius

@dgrisonnet dgrisonnet added the kind/bug Categorizes issue or PR as related to a bug. label Sep 28, 2020
@k8s-ci-robot k8s-ci-robot added the sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. label Sep 28, 2020
@ehashman
Copy link
Member

ehashman commented Oct 7, 2020

/assign @brancz

@brancz
Copy link
Member

brancz commented Oct 8, 2020

@dgrisonnet do you want to give it a try to fix this?

@dgrisonnet
Copy link
Member Author

I can give it a spin but I might need some more insights.

For the record, I was able to fix my original problem in metrics-server by forcing the instantiation of the metric by calling the Create method before trying to collect it with testutil.CollectAndCompare. AFAIU, this seems to be the intended way to do it considering the lazy instantiation of the metrics. However, I don't think segfaults should occur in such cases.

I have two different ideas on how to fix this:

wdyt @brancz?

@serathius
Copy link
Contributor

/cc @RainbowMango @logicalhan
as they authored the metrics framework

@RainbowMango
Copy link
Member

The reason why we delay instantiation to the registry is we want to check if the metrics are deprecated or not.
If we want to instantiate it before the registry, how do we know the release version which holds in the registry for now?

I wonder why you need to describe a CounterVec before it is registered? Can you explain more about your cases?

@dgrisonnet
Copy link
Member Author

My use case was in metrics-server's unit tests when we moved from the global registry to the dependency injected ones. To test the metrics, we use the testutil.CollectAndCompare function from the metrics framework that is responsible for registering and collecting a given metric. However, this function is using the client_golang registration mechanism instead of the custom one from this framework. The problem is that the client_golang implementation is not initializing the underlying collector during registration. Thus, upon registration by CollectAndCompare, the underlying collector of the lazily instantiated metric isn't initialized which results in a segfault when client_golang tries to describe and gather the metric.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7e0852]

goroutine 4 [running]:
github.com/prometheus/client_golang/prometheus.(*metricVec).Describe(0x0, 0xc00007a1e0)
        /tmp/test/vendor/github.com/prometheus/client_golang/prometheus/vec.go:98 +0x22
github.com/prometheus/client_golang/prometheus.(*Registry).Register.func1(0x7f1ac805c008, 0xc00018c300, 0xc00007a1e0)
        /tmp/test/vendor/github.com/prometheus/client_golang/prometheus/registry.go:275 +0x3b
created by github.com/prometheus/client_golang/prometheus.(*Registry).Register
        /tmp/test/vendor/github.com/prometheus/client_golang/prometheus/registry.go:274 +0x166

I solved this by forcing the creation of the metric before calling this function. However, I wonder if this framework's wrapper should be responsible for registering the metric properly.


If we want to instantiate it before the registry, how do we know the release version which holds in the registry for now?

I wasn't clear enough, but when I mentioned instantiating the underlying collector before registration I was referring to initializing it to a dummy metric. For example, I tried instantiating it to prometheus.NewCounterVec(prometheus.CounterOpts{}, []string{}) and it gave me the following error instead of the previous segfault:

registering collector failed: descriptor Desc{fqName: "", help: "", constLabels: {}, variableLabels: []} is invalid: "" is not a valid metric name

The goal would be to make the error easier to understand and avoid any segfaults related to the underlying collector not being instantiated.

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 10, 2021
@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 9, 2021
@fejta-bot
Copy link

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Copy link
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation.
Projects
None yet
Development

No branches or pull requests

7 participants