metric aggregations never seem to reset, causing memory leaks and gRPC message size ResourceExhausted #4096

Closed
ahobson opened this issue May 17, 2023 · 2 comments
Labels
bug Something isn't working

Comments

ahobson commented May 17, 2023

Description

When running with metrics enabled and using a PeriodicReader, memory usage seems to grow continuously until metrics can no longer be sent, failing with an error like

2023/05/17 15:46:33 opentelemetry error: failed to upload metrics: context deadline exceeded: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5528731 vs. 4194304)

Environment

  • OS: [linux, macOS]
  • Architecture: [x86, arm64]
  • Go Version: [1.20.3]
  • opentelemetry-go version: [v1.16.0-rc.1]

Steps To Reproduce

  1. git clone git@github.com:ahobson/opentelemetry-go-contrib.git
  2. cd opentelemetry-go-contrib
  3. git checkout adh-periodic-reader-leak
  4. cd instrumentation/net/http/otelhttp/example
  5. go run server/server.go &
  6. for x in {1..1000}; do curl -sSf -H "User-Agent: $(date +curl-%s)" http://localhost:7777/hello > /dev/null; done

Notice that the metrics reported to stdout will include 1000 user agents in perpetuity. I would have expected the aggregation to be reset after each collection period.

If you have a gRPC collector you can run locally, run the server with

GRPC_ENDPOINT=localhost:4317 go run server/server.go

If you make enough requests, you will eventually see the ResourceExhausted error above.

Expected behavior

At the very least, memory usage should not grow without bound, and aggregations should not keep accumulating attribute sets until the export messages can no longer be sent.

ahobson added the bug label May 17, 2023

ahobson commented May 22, 2023

Ok, so it looks like this is something that can be fixed by the user, so I think I'll close this bug report.

One option is to use a View to filter attributes, as seen in open-telemetry/opentelemetry-go-contrib#3071 (comment).
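As a rough illustration (not from the original thread), here is a minimal sketch of that approach against the v1.16.x go.opentelemetry.io/otel/sdk/metric API. The instrument name pattern and the allowed attribute keys are assumptions and would need to match the instruments your instrumentation actually emits:

```go
// Sketch only: limit which attribute keys are kept per instrument so the
// aggregation's attribute-set count stays bounded.
package main

import (
	"context"
	"time"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func newMeterProvider(ctx context.Context) (*sdkmetric.MeterProvider, error) {
	exp, err := otlpmetricgrpc.New(ctx)
	if err != nil {
		return nil, err
	}

	// Drop every attribute except an explicit allow list (the user agent is
	// not in the list, so it no longer creates a new series per request).
	// "http.server.*" and the keys below are illustrative.
	view := sdkmetric.NewView(
		sdkmetric.Instrument{Name: "http.server.*"},
		sdkmetric.Stream{
			AttributeFilter: attribute.NewAllowKeysFilter(
				attribute.Key("http.method"),
				attribute.Key("http.status_code"),
			),
		},
	)

	return sdkmetric.NewMeterProvider(
		sdkmetric.WithReader(sdkmetric.NewPeriodicReader(exp, sdkmetric.WithInterval(10*time.Second))),
		sdkmetric.WithView(view),
	), nil
}
```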

I believe another option would be to configure delta temporality on the exporter.
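A minimal sketch of that, assuming the OTLP gRPC metric exporter (go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc) paired with the SDK's PeriodicReader; the selector here returns delta temporality for every instrument kind:

```go
// Sketch only: make the exporter request delta temporality, so each export
// carries only the change since the previous collection instead of the full
// cumulative state.
package main

import (
	"context"

	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
	"go.opentelemetry.io/otel/sdk/metric/metricdata"
)

func newDeltaExporter(ctx context.Context) (sdkmetric.Exporter, error) {
	return otlpmetricgrpc.New(ctx,
		otlpmetricgrpc.WithTemporalitySelector(func(sdkmetric.InstrumentKind) metricdata.Temporality {
			return metricdata.DeltaTemporality
		}),
	)
}
```

Note that delta temporality changes what the backend receives (per-interval values rather than running totals), so the collector or backend has to support it.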

I do want to say I'm disappointed that the defaults seem to have some pretty large footguns attached. I would personally prefer the defaults to be ones that would not exhaust memory by default, even if that means some metrics are lost.

ahobson closed this as completed May 22, 2023
@xsteadfastx

I totally ran into this too. I added OpenTelemetry and memory usage is all over the place. How can I use DeltaTemporality? Any hints for me?
