New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster TraceBuffer for CRuby #1172
Conversation
b378675
to
94f0d38
Compare
94f0d38
to
7511b14
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few small questions and nits, generally fine approving just want to be careful around the tag renaming stuff...this is great work, thanks @marcotc
lib/ddtrace/buffer.rb
Outdated
# * Pushed into a single CRubyTraceBuffer from 1000 threads. | ||
# The buffer can exceed its maximum size by no more than 4%. | ||
# | ||
# This implementation allocates 17-93% less memory and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal but do we really want to add perf numbers in code comments? it's also a very broad range so i'm not sure how useful this comment is, maybe we can just say 'This implementation allocates significantly less memory and has modest speedup compared to Datadog::ThreadSafeBuffer'
or basically, something that won't be used against us in the feature 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to ensure I had supportive arguments documented, but hard numbers are sure to be variable across any execution environment.
I want to make sure to capture information that allows for future decision making regarding trade-offs being taken here.
I'm ambivalent in keeping or removing the numbers. What do you think @brettlangdon?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love the transparency, but this code comment could become stale pretty quickly.
I agree with Eric, going with a simpler comment to ensure people are considerate of performance impact whenever they modify the code is 👍🏻 and then we can think about how to expose this metric in a different way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you guys, I removed hard numbers from the comment.
There's a link to benchmarks than can be run at any time to validate if the stated performance gains still hold in this comment block already, so I'm thinking that covers the hard-numbers part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still confused on the @buffer_accepted_lengths
bit tbh but doesn't seem blocking to me, same for the code comment nit, so deferring to your best judgement on both. nice work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
tl;dr: reduction of 17-93% of allocated bytes, 5-15% faster.
Leveraging the fact that the native Ruby
Array
"is thread-safe in practice because CRuby runs threads one at a time and does not do context switching during the execution of C functions", this PR implements a version of theTraceBuffer
that does not use explicit locking.The buffer is one of the tracer hot-stops, being a sync point between the application critical path and the tracer's worker thread. All traces will eventually be pushed into the buffer, so improvements to it affect all instrumentations.
This version works correctly on CRuby, but will not maintain the same guarantees under other runtimes, like JRuby. For this reason, we kept the existing implementation, which utilizes explicit locking, for non-CRuby environments.
The benchmarks below (also included in the PR) use the default buffer size of 1000 traces.
When pushing over 1000 traces into the buffer, our fair eviction policy will take place. The 2000 traces benchmark covers this case. Increasing the number of traces pushed even more yielded the same performance results, so we stop at 2000.
The reduction in memory usage is the most notable improvement, with memory usage being constant for the new implementation:
While wall time has had a modest improvement: