The No metric name label error doesn't specify metric #5802

Open
jakubgs opened this issue Mar 6, 2024 · 3 comments

Comments

@jakubgs
Contributor

jakubgs commented Mar 6, 2024

Describe the bug
I have started seeing this error on our distributors:

caller=logging.go:86
level=warn
msg="
  POST /api/v1/push (500) 81.24787ms
  Response: \"No metric name label\\n\"

Which comes from: https://github.com/cortexproject/cortex/blob/v1.16.0/pkg/util/extract/extract.go#L13C37-L13C57
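For context, the check behind that line is roughly of the following shape (a paraphrased sketch for illustration, not the exact Cortex source; the real helper operates on cortexpb.LabelAdapter): it scans the incoming series' labels for __name__ and returns a fixed sentinel error when it is missing, so the offending series never makes it into the log.

```go
// Paraphrased sketch of the helper behind that line (illustrative only;
// names and types are simplified and not the exact Cortex code).
package extract

import (
	"errors"

	"github.com/prometheus/common/model"
)

var errNoMetricNameLabel = errors.New("No metric name label")

// metricNameFromLabels returns the value of the __name__ label, or a fixed
// sentinel error that carries no information about the offending series.
func metricNameFromLabels(labels []model.LabelPair) (string, error) {
	for _, l := range labels {
		if l.Name == model.MetricNameLabel {
			return string(l.Value), nil
		}
	}
	return "", errNoMetricNameLabel
}
```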

I have not yet identified what is causing it because the error does not show which metric triggered it. This means I have to find the culprit essentially through trial and error, removing services one by one and hoping I can identify which one is responsible.

Expected behavior
The error indicates which metric caused it, allowing the administrator to fix the metric.
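A minimal sketch of what that could look like, assuming the fix is simply to fold the series' remaining labels into the returned error (names are illustrative; the real Cortex helper works with cortexpb.LabelAdapter, and callers comparing against the old sentinel error would need it wrapped rather than replaced):

```go
// Sketch of the requested change: include the offending series' other labels
// in the error so the distributor log identifies the series (illustrative only).
package extract

import (
	"fmt"

	"github.com/prometheus/common/model"
)

func metricNameFromLabelsVerbose(labels []model.LabelPair) (string, error) {
	ls := make(model.LabelSet, len(labels))
	for _, l := range labels {
		if l.Name == model.MetricNameLabel {
			return string(l.Value), nil
		}
		ls[l.Name] = l.Value
	}
	// model.LabelSet renders as {instance="10.0.0.1:9100", job="node", ...},
	// which is usually enough to trace the series back to its source.
	return "", fmt.Errorf("no metric name label in series %s", ls)
}
```

With something like that, the warning shown above would carry enough label context to locate the source without removing services one by one.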

Environment:
Prometheus 2.50.1 sending to Cortex 1.16.0.

@jakubgs
Contributor Author

jakubgs commented Mar 6, 2024

Also, notably, I stopped a Prometheus instance I suspected was causing this, and the errors indeed stopped. But after I restarted that Prometheus instance the errors did not return, which makes no sense to me:

[screenshot omitted]

@yeya24
Collaborator

yeya24 commented Mar 6, 2024

Have you tried querying your Prometheus to find series with no metric name?

> The error indicates which metric caused it, allowing the administrator to fix the metric.

I think this action item is reasonable. We can try to add this. Help wanted.
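For reference, one way to act on the suggestion above and look for name-less series programmatically (a sketch using the client_golang API; the address is a placeholder, and this assumes such series exist in the local TSDB at all rather than only being produced on the remote-write path):

```go
// Sketch: ask Prometheus' series endpoint for series whose __name__ is empty.
// Selectors need at least one non-empty matcher, hence the extra job=~".+".
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	client, err := api.NewClient(api.Config{Address: "http://localhost:9090"}) // placeholder address
	if err != nil {
		log.Fatal(err)
	}

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// Look back one hour for series that carry a job label but no metric name.
	sets, warnings, err := v1.NewAPI(client).Series(ctx,
		[]string{`{__name__="", job=~".+"}`},
		time.Now().Add(-time.Hour), time.Now())
	if err != nil {
		log.Fatal(err)
	}
	if len(warnings) > 0 {
		log.Println("warnings:", warnings)
	}
	for _, s := range sets {
		fmt.Println(s) // each result is a model.LabelSet describing a name-less series
	}
}
```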

@jakubgs
Contributor Author

jakubgs commented Mar 7, 2024

If you look at my comment here:

> What I've experienced is No metric name label errors when my cluster was close to dying due to network traffic issues.
After restarting Prometheus instances, the error would go away for no good reason. I'm not sure what's happening, but it seems like high-stress situations trigger something that results in those errors. Seeing the metric that causes it in the error might point to why it only happens during periods of high cluster latency.
