Profiling large number of goroutines #748

Open
vaind opened this issue Nov 20, 2023 · 1 comment

@vaind (Collaborator) commented Nov 20, 2023

I've created a simple sample application with 5000 concurrent goroutines, tuned to use around 30% CPU (on my desktop PC) without profiling. With profiling enabled at a 5% sample rate, CPU usage jumped to around 40%.

I've run a pprof CPU profile with and without profiling enabled. In the following SVG, you can see that the vast majority of the overhead comes from calls to runtime.Stack(). The current profiling implementation calls it periodically to collect the stacks of all goroutines, so the cost of each collection grows with the number of active goroutines.
[SVG: pprof005, pprof CPU profile]

I'm investigating how we could improve this.

Note: runtime.GoroutineProfile() has lower overhead, possibly because it doesn't produce string output. However, its records currently don't include a goroutine ID, so we can't map stacks from successive calls to the same goroutine; see golang/go#59663.

vaind self-assigned this Nov 20, 2023
getsentry deleted a comment from github-actions bot Dec 12, 2023
@vaind (Collaborator, Author) commented Jan 20, 2024

Given the tooling currently available in the Go runtime, combined with the format Sentry.io expects to receive, I don't see a way to reduce the profiler's overhead for apps with a very large number of goroutines.
If Sentry accepted the pprof format (which only contains aggregated sample information), we could use pprof.StartCPUProfile(). However, this would not provide the same experience as other SDKs, because we would no longer have time-based samples. Another major downside is that adding a second format would create an ongoing maintenance burden, since it would have to be kept working alongside the "mainstream" sampling profiler.

As such, I think we should automatically disable the profiler once the number of goroutines exceeds a given threshold. This would be a reasonable default behaviour that works around the limitation until it's resolved one way or another. Users could change the threshold, or disable the check completely if they want to run the profiler unhindered, regardless of how many goroutines they launch.
