
Rayon filtering #225

Open
benbrittain opened this issue Dec 19, 2021 · 6 comments

@benbrittain

I recently went to parallelize/optimize a library with rayon, and my flamegraph (unsurprisingly!) became absolutely unreadable. Optionally filtering out rayon symbols/functions would be a super helpful feature.

@jonhoo
Owner

jonhoo commented Dec 20, 2021

Can you give an example of what it looked like and what filtering you'd like to see?

@benbrittain
Author

benbrittain commented Dec 20, 2021

I've got one here: flamegraph

You can see that it's totally unreadable. I'd imagine filtering out anything with a rayon symbol and emphasizing the functions one actually has control over?

For now I've just defined the parallelism as a feature and have been profiling with that, but that seems less than ideal. I'm not sure what the state of the art in highly parallel flamegraphs looks like, though; this might just be infeasible.
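For reference, the kind of filtering being asked for could be prototyped as a post-processing step on the collapsed-stack file (one `frame1;frame2;... count` line per stack) before rendering. This is just a sketch; the naive `rayon` substring match and the example frame names are assumptions, not how inferno actually works:

```python
# Sketch: drop rayon-internal frames from collapsed-stack lines so that only
# frames the user controls remain. Matching on the substring "rayon" is a
# naive assumption about how the mangled/demangled symbols are named.

def filter_frames(line: str, needle: str = "rayon") -> str:
    stack, _, count = line.rpartition(" ")
    frames = [f for f in stack.split(";") if needle not in f]
    if not frames:
        return ""  # the stack was entirely rayon machinery; drop it
    return ";".join(frames) + " " + count

# Hypothetical collapsed-stack input for illustration:
collapsed = [
    "main;rayon_core::join;rayon::iter::plumbing::bridge;my_crate::work 42",
    "main;my_crate::setup 7",
]
for line in collapsed:
    out = filter_frames(line)
    if out:
        print(out)  # prints "main;my_crate::work 42" then "main;my_crate::setup 7"
```

One caveat with this approach: splicing out interior frames merges children of different rayon frames under one parent, which can be misleading about what actually called what.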

@Nevsden

Nevsden commented Dec 21, 2021

+1 on this issue. I have been working with flamegraph for the past few days and have had to go back from parallel iterators to plain iterators for any code I wanted to profile, because the graph with the rayon feature active is indecipherable.

Also, the number of samples collected (the number of stacks, I guess?) goes up roughly |CPU cores|-fold when profiling parallelism with rayon.

@jonhoo
Owner

jonhoo commented Dec 27, 2021

@benbrittain Hmm, I get a 404 for that link?

I wonder if you couldn't get quite far by using the new --skip-after flag to filter out the prefix of each stack that's just the rayon threading machinery. Maybe give that a try?
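For anyone finding this later, a rough model of what a skip-after-style trim does to each collapsed stack: drop the leading frames up to the first occurrence of a named function, keeping that function as the new root. This is an approximation for illustration, not inferno's actual implementation, and the exact matching rules may differ:

```python
def skip_after(stack: str, func: str) -> str:
    """Model of a skip-after-style trim: drop leading frames up to the first
    occurrence of `func`, keeping `func` itself as the new root frame.
    Stacks that never hit `func` are left untouched."""
    frames = stack.split(";")
    if func in frames:
        frames = frames[frames.index(func):]
    return ";".join(frames)

# Hypothetical stack for illustration:
print(skip_after("start;rayon_core::registry::worker;my_crate::main;work",
                 "my_crate::main"))  # prints "my_crate::main;work"
```

This helps when the rayon noise is a common *prefix* of every stack; it does nothing about rayon frames interleaved deeper in the stack.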

@benbrittain
Author

benbrittain commented Dec 27, 2021

Oops, you caught me in the middle of a website overhaul. Link here and updated above: flamegraph

@jonhoo
Owner

jonhoo commented Dec 31, 2021

Woah, yeah, that's certainly a right mess!

This doesn't seem like something --skip-after can fix, but it's also not clear that the flamegraph here is wrong. The reason for the insanity is that there are recursive calls to conjure::octree::Octree::subdivide, and the flamegraph is "correctly" showing you how long the calls at each "depth" are taking. As you can see, that gets pretty crazy when the recursion is deep.

I don't know exactly what should be displayed here, but there's an argument that a flamegraph isn't quite the right visualization for this. For this particular use case, I think you may want to do a bit of post-processing on the collapsed stack file before passing it to inferno-flamegraph, to "collapse" the depths of the tree (if that's indeed what you want). It's such a specific case that I don't think inferno itself can do much here; rather, you should think about exactly what you want the output to show you, and tailor-make something for that for this data structure.
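As an illustration of that kind of post-processing, here is one way to fold runs of consecutive identical frames (such as the recursive subdivide calls) into a single frame before rendering. A hedged sketch only; the frame names are made up, and note it only folds *directly* consecutive repeats, so mutual recursion (a;b;a;b) would survive:

```python
from itertools import groupby

def fold_recursion(line: str) -> str:
    """Collapse consecutive repeats of the same frame in one collapsed-stack
    line ("f1;f2;... count"), so deep self-recursion shows as one frame."""
    stack, _, count = line.rpartition(" ")
    frames = [frame for frame, _ in groupby(stack.split(";"))]
    return ";".join(frames) + " " + count

# Hypothetical deeply recursive stack for illustration:
line = "main;Octree::subdivide;Octree::subdivide;Octree::subdivide;leaf 12"
print(fold_recursion(line))  # prints "main;Octree::subdivide;leaf 12"
```

Folding like this discards the per-depth timing breakdown, which is exactly the information you decided you didn't want; run it over every line of the collapsed file and feed the result to inferno-flamegraph as usual.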

Now, all that said, there is one bit that's weird here: the [unknown] at the bottom that causes the initial divergence into four stacks. You may want to try enabling forced frame pointers in your build and see if that helps with that particular problem.
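For reference, forcing frame pointers in a Rust build is typically done via a rustc codegen flag; the exact invocation below is a sketch, adapt it to your build setup:

```shell
# Build with frame pointers so the profiler can walk stacks reliably,
# which often fixes [unknown] frames at the base of the graph.
RUSTFLAGS="-C force-frame-pointers=yes" cargo build --release
```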
