perf(profiling): Tune the sample profile generation code for performance #1694

Zylphrex · 2022-10-19T22:16:10Z

We noticed that generating the sample format at the end of a profile can get rather slow and this aims to improve what we can here with minimal changes. A few things we took advantage of to accomplish this:

Turning the extracted stack into a tuple so it is hashable so it can be used as a dictionary key. This let's us check if the stack is indexed first, and skip indexing the frames again. This is especially effective in profiles where it's blocking on a network request for example, since there will be many identical stacks.
Using the hash of the stack as the dictionary key. Hashing the entire stack can be an expensive operation since a stack can have up to 128 frames. Using it as a dictionary key means it needs to be rehashed each time. To avoid this, we pre-hash the stack and use the hash as a dictionary key which is more efficient.
Convert numbers to strings ahead of time if we know have to. Values like the tid and elapsed since start ns needs to be sent as a string. However, many samples share the same value for it, and we're doing the conversion each time. Instead, we convert them to a string upfront and reuse it as needed in order to minimize unnecessary calculations.

Zylphrex · 2022-10-19T22:17:36Z

Some benchmarks I collected comparing the before and afters using a few arbitrary profiles.

samples: 439		stacks: 4326
old	avg of 100 runs	   21.9256ms
new	avg of 100 runs	    3.5280ms
samples: 507		stacks: 4227
old	avg of 100 runs	   18.8779ms
new	avg of 100 runs	    4.2562ms
samples: 1688		stacks: 15167
old	avg of 100 runs	   61.5474ms
new	avg of 100 runs	   14.6774ms
samples: 418		stacks: 3347
old	avg of 100 runs	   14.5421ms
new	avg of 100 runs	    2.5589ms
samples: 513		stacks: 2568
old	avg of 100 runs	   14.5687ms
new	avg of 100 runs	    2.9569ms
samples: 379		stacks: 758
old	avg of 100 runs	    5.7061ms
new	avg of 100 runs	    1.5271ms

From this testing, this looks to be consistently faster than what it used to be, and I have no reason to believe it's slower in any case.

antonpirker · 2022-10-20T07:28:21Z

Side note: The "daphne" errors above in the Django tests was fixed in master, to if you update your branches they should be gone.

We noticed that generating the sample format at the end of a profile can get rather slow and this aims to improve what we can here with minimal changes. A few things we took advantage of to accomplish this: - Turning the extracted stack into a tuple so it is hashable so it can be used as a dictionary key. This let's us check if the stack is indexed first, and skip indexing the frames again. This is especially effective in profiles where it's blocking on a network request for example, since there will be many identical stacks. - Using the hash of the stack as the dictionary key. Hashing the entire stack can be an expensive operation since a stack can have up to 128 frames. Using it as a dictionary key means it needs to be rehashed each time. To avoid this, we pre-hash the stack and use the hash as a dictionary key which is more efficient. - Convert numbers to strings ahead of time if we know have to. Values like the tid and elapsed since start ns needs to be sent as a string. However, many samples share the same value for it, and we're doing the conversion each time. Instead, we convert them to a string upfront and reuse it as needed in order to minimize unnecessary calculations.

sl0thentr0py

nice numbers 🎉

…ion-code-for-performance

Base automatically changed from txiao/tests/add-tests-for-thread-schedulers to master October 20, 2022 12:56

Zylphrex requested review from a team, sl0thentr0py and antonpirker October 20, 2022 15:42

Zylphrex added 3 commits October 20, 2022 15:51

fix type in tests

c182a4f

stringify thread names

3b316c2

Zylphrex force-pushed the txiao/perf/tune-the-sample-profile-generation-code-for-performance branch from fba25ec to 3b316c2 Compare October 20, 2022 19:51

sl0thentr0py approved these changes Oct 21, 2022

View reviewed changes

Merge branch 'master' into txiao/perf/tune-the-sample-profile-generat…

2251a58

…ion-code-for-performance

Zylphrex merged commit a8fdcb0 into master Oct 21, 2022

Zylphrex deleted the txiao/perf/tune-the-sample-profile-generation-code-for-performance branch October 21, 2022 16:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(profiling): Tune the sample profile generation code for performance #1694

perf(profiling): Tune the sample profile generation code for performance #1694

Zylphrex commented Oct 19, 2022

Zylphrex commented Oct 19, 2022 •

edited

antonpirker commented Oct 20, 2022

sl0thentr0py left a comment

perf(profiling): Tune the sample profile generation code for performance #1694

perf(profiling): Tune the sample profile generation code for performance #1694

Conversation

Zylphrex commented Oct 19, 2022

Zylphrex commented Oct 19, 2022 • edited

antonpirker commented Oct 20, 2022

sl0thentr0py left a comment

Choose a reason for hiding this comment

Zylphrex commented Oct 19, 2022 •

edited