High cardinality metric observed from grpc instrumentation #7517
Comments
That would be a breaking change for users, and we would be removing support for something that (AIUI) is in the spec. I think this should be raised at the spec level if it has not been already. On the Collector, I guess the right solution for this would be to support defining views for the metrics, but this seems like something that would require careful design and take significant time. @codeboten, what do you think is the right approach here? Do we have a clear schema for configuring views in YAML (e.g. from the Configuration WG)?
@mx-psi views are part of the initial example. I agree that it is likely to take some time before it is fully implemented. Since this is a problem only for the otel instrumentation, would an acceptable interim solution be to configure a view in the SDK to drop the problematic metrics, even if users don't have an option to re-enable them? I suppose there could be an "enablePotentiallyHighCardinalityMetrics" feature gate.
That sounds like an acceptable solution to me in the short term.
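For illustration only, a view along the lines proposed above might look roughly like this with the OpenTelemetry Go metric SDK. The package name, instrument name, and wiring are assumptions for the sketch, not the collector's actual code:

```go
package telemetry

import (
	"go.opentelemetry.io/otel/attribute"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

// newMeterProvider builds a MeterProvider whose view strips the
// net.sock.peer.addr attribute from the rpc.server.duration histogram,
// keeping its cardinality bounded. The instrument and attribute names are
// illustrative; adjust them to whatever the instrumentation actually emits.
func newMeterProvider(reader sdkmetric.Reader) *sdkmetric.MeterProvider {
	dropPeerAddr := sdkmetric.NewView(
		sdkmetric.Instrument{Name: "rpc.server.duration"},
		sdkmetric.Stream{
			AttributeFilter: func(kv attribute.KeyValue) bool {
				// Keep every attribute except the per-connection peer address.
				return kv.Key != "net.sock.peer.addr"
			},
		},
	)
	return sdkmetric.NewMeterProvider(
		sdkmetric.WithReader(reader),
		sdkmetric.WithView(dropPeerAddr),
	)
}
```

The SDK's view matching also accepts wildcards in the instrument name, which could cover other RPC instruments emitted by the same instrumentation if they turn out to carry per-connection attributes as well.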
**Description:** Puts the grpc meter provider behind a feature flag for controlling high cardinality metrics. **Link to tracking Issue:** #7517
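For readers unfamiliar with collector feature gates, registration looks roughly like the sketch below. The gate ID is just the name floated in the discussion above, not necessarily what the PR merged, and the exact featuregate API differs between collector versions:

```go
package telemetry

import "go.opentelemetry.io/collector/featuregate"

// Hypothetical gate ID taken from the discussion above; the merged PR may use
// a different name. An alpha-stage gate is disabled by default, so the
// collector would skip attaching the otel meter provider to the gRPC
// instrumentation unless the user opts in.
var enableHighCardinalityMetricsGate = featuregate.GlobalRegistry().MustRegister(
	"telemetry.enablePotentiallyHighCardinalityMetrics",
	featuregate.StageAlpha,
	featuregate.WithRegisterDescription(
		"controls whether gRPC instrumentation records metrics with potentially "+
			"high-cardinality attributes such as net.sock.peer.addr"),
)

// grpcMeterProviderEnabled reports whether the gate has been switched on.
func grpcMeterProviderEnabled() bool {
	return enableHighCardinalityMetricsGate.IsEnabled()
}
```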
This was closed by #7543.
Describe the bug
When the flag `telemetry.useOtelForInternalMetrics` is enabled, metrics for gRPC now come through because of this line. When running a collector with a sufficiently high number of connections, this floods the Prometheus exporter with a metric carrying `net.sock.peer.addr` attributes; in our case a single Prometheus scrape came out to a 33 MB file.

This issue has been reported and discussed here. I'm moving the discussion here because any user enabling this flag can potentially experience this cardinality explosion. As a temporary measure, it would be ideal if we could disable the line I linked to remediate this problem.
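For context, a simplified sketch of how the otelgrpc interceptors pick up a meter provider is shown below; it is this instrumentation that records per-connection attributes such as `net.sock.peer.addr`. The function name and wiring are assumptions for illustration, not the collector's exact code:

```go
package telemetry

import (
	"google.golang.org/grpc"

	"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
	"go.opentelemetry.io/otel/metric"
)

// newInstrumentedGRPCServer attaches otelgrpc interceptors to a gRPC server.
// Passing the collector's internal MeterProvider here is what causes metrics
// such as rpc.server.duration to carry net.sock.peer.addr, producing one
// series per client connection.
func newInstrumentedGRPCServer(mp metric.MeterProvider) *grpc.Server {
	return grpc.NewServer(
		grpc.UnaryInterceptor(otelgrpc.UnaryServerInterceptor(otelgrpc.WithMeterProvider(mp))),
		grpc.StreamInterceptor(otelgrpc.StreamServerInterceptor(otelgrpc.WithMeterProvider(mp))),
	)
}
```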
Steps to reproduce
Create a collector with the `telemetry.useOtelForInternalMetrics` flag enabled, load test it, and curl the metrics endpoint.

What did you expect to see?
A stable collector that doesn't constantly OOM.
What did you see instead?
Cardinality explosion
What version did you use?
v0.74.0
What config did you use?
Config:
Environment
OS: Kubernetes
Compiler (if manually compiled): go 1.20