Faster TraceBuffer for CRuby #1172

Merged
merged 3 commits into from Sep 21, 2020

Member

@marcotc marcotc commented Sep 11, 2020

tl;dr: 17-93% fewer bytes allocated, 5-15% faster.

Leveraging the fact that the native Ruby Array "is thread-safe in practice because CRuby runs threads one at a time and does not do context switching during the execution of C functions", this PR implements a version of the TraceBuffer that does not use explicit locking.

The buffer is one of the tracer's hot spots, being a sync point between the application critical path and the tracer's worker thread. All traces will eventually be pushed into the buffer, so improvements to it affect all instrumentations.

This version works correctly on CRuby, but will not maintain the same guarantees under other runtimes, like JRuby. For this reason, we kept the existing implementation, which utilizes explicit locking, for non-CRuby environments.
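To illustrate the trade-off described above, here is a minimal sketch of the two approaches and of picking one by runtime. The class names `LockingBuffer` and `LockFreeBuffer` and the constant `BUFFER_CLASS` are hypothetical and are not the classes from this PR; the sketch only assumes that each `Array` method call completes without another Ruby thread running on CRuby.

```ruby
# Hypothetical names for illustration; not the classes from this PR.
class LockingBuffer
  def initialize(max_size)
    @max_size = max_size
    @items = []
    @mutex = Mutex.new
  end

  # Every operation takes the lock, which is safe on any runtime.
  def push(item)
    @mutex.synchronize { @items << item if @items.length < @max_size }
  end

  # Hands back everything buffered so far and starts a fresh array.
  def pop
    @mutex.synchronize do
      drained = @items
      @items = []
      drained
    end
  end
end

class LockFreeBuffer
  def initialize(max_size)
    @max_size = max_size
    @items = []
  end

  # Array#<< and Array#shift are each a single C call on CRuby, so no
  # other Ruby thread observes a half-updated array. The length check and
  # the append are separate calls, so the buffer can briefly exceed
  # max_size when many threads push at once.
  def push(item)
    @items << item if @items.length < @max_size
  end

  # Removes and returns the items present at the time of the call.
  def pop
    @items.shift(@items.length)
  end
end

# RUBY_ENGINE is 'ruby' on CRuby and 'jruby' on JRuby.
BUFFER_CLASS = RUBY_ENGINE == 'ruby' ? LockFreeBuffer : LockingBuffer

buffer = BUFFER_CLASS.new(1000)
threads = Array.new(4) { |t| Thread.new { 100.times { |i| buffer.push([t, i]) } } }
threads.each(&:join)
puts buffer.pop.length
```

The lock-free variant trades a strict size bound for the absence of a mutex on the hot path, which is only acceptable on a runtime that gives the guarantee quoted above.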

The benchmarks below (also included in the PR) use the default buffer size of 1000 traces.
When more than 1000 traces are pushed into the buffer, our fair eviction policy kicks in. The 2000-trace benchmark covers this case. Pushing even more traces yielded the same performance results, so we stop at 2000.
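The description does not spell out how the eviction policy works. As a rough sketch of one way a bounded buffer can evict fairly, the example below assumes random replacement: once the buffer is full, a new item overwrites a uniformly random slot. The class `RandomEvictionBuffer` and its `rng:` parameter are hypothetical, and `max_size` is assumed to be positive.

```ruby
# Hypothetical sketch of a bounded buffer with random-replacement eviction.
class RandomEvictionBuffer
  attr_reader :items

  def initialize(max_size, rng: Random.new)
    @max_size = max_size # assumed to be a positive integer
    @items = []
    @rng = rng
  end

  def push(item)
    if @items.length < @max_size
      @items << item
    else
      # Overwrite a uniformly random slot: every item already in the
      # buffer is equally likely to be dropped, regardless of its age.
      @items[@rng.rand(@max_size)] = item
    end
  end
end

buffer = RandomEvictionBuffer.new(1000)
2000.times { |i| buffer.push(i) }
puts buffer.items.length # prints 1000
```

Because a full buffer never grows, pushing 2000 or 20000 items does the same amount of work per push, which would be consistent with the benchmark results flattening out past the buffer size.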

The reduction in memory usage is the most notable improvement, with memory usage being constant for the new implementation:

Before(ThreadSafeBuffer) [bytes allocated (objects created)]
    10 traces:   91840.00 (   1900.00)
   100 traces:  177376.00 (   1900.00)
  1000 traces: 1244200.00 (   1900.00)
  2000 traces: 1244200.00 (   1900.00)

After(CRubyTraceBuffer)  [bytes allocated (objects created)]
    10 traces:   76000.00 (   1900.00)
   100 traces:   76000.00 (   1900.00)
  1000 traces:   76000.00 (   1900.00)
  2000 traces:   76000.00 (   1900.00)

Comparison (% reduction, increase negative)
    10 traces:      17.25 (      0.00)
   100 traces:      57.15 (      0.00)
  1000 traces:      93.89 (      0.00)
  2000 traces:      93.89 (      0.00)

Wall time shows a modest improvement:

Before(ThreadSafeBuffer) [operations/sec]
    10 spans:   74676.66
   100 spans:   11465.74
  1000 spans:    1208.54
  2000 spans:     486.27

After(CRubyTraceBuffer)  [operations/sec]
    10 spans:   82571.16
   100 spans:   13084.07
  1000 spans:    1397.91
  2000 spans:     512.44

Comparison (% faster; slower if negative)
    10 spans:      10.57
   100 spans:      14.11
  1000 spans:      15.67
  2000 spans:       5.38

Process finished with exit code 0
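For readers who want to check the comparison rows, the percentages above can be reproduced from the before and after figures. This is a small sketch with hypothetical helper names (`percent_reduction`, `percent_faster`), not the benchmark code included in the PR.

```ruby
# Percentage reduction in allocated bytes: positive means less memory.
def percent_reduction(before, after)
  ((before - after) / before.to_f * 100).round(2)
end

# Percentage speedup in operations/sec: positive means faster.
def percent_faster(before, after)
  ((after / before.to_f - 1) * 100).round(2)
end

puts percent_reduction(91_840, 76_000)     # 17.25
puts percent_reduction(1_244_200, 76_000)  # 93.89
puts percent_faster(74_676.66, 82_571.16)  # 10.57
puts percent_faster(486.27, 512.44)        # 5.38
```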

@marcotc marcotc added the performance Involves performance (e.g. CPU, memory, etc) label Sep 11, 2020
@marcotc marcotc self-assigned this Sep 11, 2020
@marcotc marcotc force-pushed the perf/buffer-perf branch 3 times, most recently from b378675 to 94f0d38 Compare September 14, 2020 19:20
@marcotc marcotc marked this pull request as ready for review September 15, 2020 21:24
@marcotc marcotc requested a review from a team September 15, 2020 21:24
Contributor

@ericmustin ericmustin left a comment

I have a few small questions and nits. Generally fine approving; I just want to be careful around the tag renaming stuff. This is great work, thanks @marcotc

# * Pushed into a single CRubyTraceBuffer from 1000 threads.
# The buffer can exceed its maximum size by no more than 4%.
#
# This implementation allocates 17-93% less memory and
Contributor

Not a big deal, but do we really want to add perf numbers in code comments? It's also a very broad range, so I'm not sure how useful this comment is. Maybe we can just say 'This implementation allocates significantly less memory and has modest speedup compared to Datadog::ThreadSafeBuffer', or basically something that won't be used against us in the future 😅

Member Author

I was trying to ensure I had supporting arguments documented, but hard numbers are sure to vary across execution environments.

I want to make sure we capture information that allows for future decision making regarding the trade-offs being made here.

I'm ambivalent about keeping or removing the numbers. What do you think @brettlangdon?

Member

I love the transparency, but this code comment could become stale pretty quickly.

I agree with Eric, going with a simpler comment to ensure people are considerate of performance impact whenever they modify the code is 👍🏻 and then we can think about how to expose this metric in a different way.

Member Author

Thank you guys, I removed the hard numbers from the comment.
This comment block already links to benchmarks that can be run at any time to check whether the stated performance gains still hold, so I'm thinking that covers the hard-numbers part.

ericmustin previously approved these changes Sep 18, 2020
Contributor

@ericmustin ericmustin left a comment

I'm still confused on the @buffer_accepted_lengths bit tbh but doesn't seem blocking to me, same for the code comment nit, so deferring to your best judgement on both. nice work!

@marcotc marcotc changed the base branch from perf/transport-memory-improvements to master September 21, 2020 18:34
@marcotc marcotc dismissed ericmustin’s stale review September 21, 2020 18:34

The base branch was changed.

Contributor

@ericmustin ericmustin left a comment

🚀

@marcotc marcotc merged commit da5e5bc into master Sep 21, 2020
@marcotc marcotc added this to the 0.41.0 milestone Sep 30, 2020
@ivoanjo ivoanjo deleted the perf/buffer-perf branch July 16, 2021 09:13