Faster TraceBuffer for CRuby #1172

marcotc · 2020-09-11T21:23:27Z

tl;dr: reduction of 17-93% of allocated bytes, 5-15% faster.

Leveraging the fact that the native Ruby Array "is thread-safe in practice because CRuby runs threads one at a time and does not do context switching during the execution of C functions", this PR implements a version of the TraceBuffer that does not use explicit locking.
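To make the idea concrete, here is a minimal sketch of a buffer with no explicit locking. It is not the code in this PR: `LockFreeBufferSketch` and its methods are hypothetical names, and it assumes CRuby, where `Array#<<` and `Array#slice!` are C functions that are not interrupted by a thread switch.

```ruby
# Hypothetical sketch, not the PR's CRubyTraceBuffer.
# Assumes CRuby: each Array call below is a single C function,
# so another Ruby thread cannot observe it half-finished.
class LockFreeBufferSketch
  def initialize(max_size)
    @max_size = max_size
    @items = []
  end

  # No Mutex here. The length check and the append are two separate
  # calls, so concurrent pushes may briefly overshoot max_size.
  def push(item)
    @items << item if @items.length < @max_size
  end

  # Remove and return everything currently buffered in one C call.
  # Items pushed after the length is read stay in the buffer for the next pop.
  def pop
    @items.slice!(0, @items.length)
  end

  def length
    @items.length
  end
end

buffer = LockFreeBufferSketch.new(1000)
writers = 4.times.map do |t|
  Thread.new { 50.times { |i| buffer.push([t, i]) } }
end
writers.each(&:join)
puts buffer.pop.length
```

The trade-off is visible in `push`: without a lock, the size limit is a soft bound rather than a strict one.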

The buffer is one of the tracer's hot spots, being a sync point between the application critical path and the tracer's worker thread. All traces will eventually be pushed into the buffer, so improvements to it affect all instrumentations.

This version works correctly on CRuby, but will not maintain the same guarantees under other runtimes, like JRuby. For this reason, we kept the existing implementation, which utilizes explicit locking, for non-CRuby environments.
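One way to picture the split between runtimes is a small selector keyed on `RUBY_ENGINE`. This is only a sketch under assumptions: the class names and `buffer_class_for` are hypothetical, and the PR may pick the implementation differently. `RUBY_ENGINE` itself is a real Ruby constant that is `"ruby"` on CRuby.

```ruby
# Hypothetical names; a sketch of choosing a buffer per runtime.
class UnlockedBufferSketch
  def initialize(max_size)
    @max_size = max_size
    @items = []
  end

  # Relies on CRuby not switching threads inside Array's C functions.
  def push(item)
    @items << item if @items.length < @max_size
  end

  def length
    @items.length
  end
end

# Same behavior, but every push is serialized through a Mutex,
# which is what runtimes with truly parallel threads need.
class LockedBufferSketch < UnlockedBufferSketch
  def initialize(max_size)
    super
    @mutex = Mutex.new
  end

  def push(item)
    @mutex.synchronize { super }
  end
end

# CRuby reports RUBY_ENGINE == "ruby"; anything else gets the locked variant.
def buffer_class_for(engine = RUBY_ENGINE)
  engine == 'ruby' ? UnlockedBufferSketch : LockedBufferSketch
end

buffer = buffer_class_for.new(1000)
buffer.push(:trace)
```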

The benchmarks below (also included in the PR) use the default buffer size of 1000 traces.
When pushing over 1000 traces into the buffer, our fair eviction policy will take place. The 2000 traces benchmark covers this case. Increasing the number of traces pushed even more yielded the same performance results, so we stop at 2000.
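For readers unfamiliar with the eviction step, here is a rough sketch of what happens once the buffer is full. It assumes "fair" means every buffered trace is equally likely to be replaced; the class name and the injectable `random:` parameter are hypothetical and not taken from the PR.

```ruby
# Hypothetical sketch of a full buffer replacing a random entry.
# Assumes max_size is a positive integer.
class RandomEvictionBufferSketch
  attr_reader :items

  def initialize(max_size, random: Random.new)
    @max_size = max_size
    @random = random
    @items = []
  end

  def push(item)
    if @items.length < @max_size
      @items << item
    else
      # Overwrite a uniformly chosen slot so no position is favored.
      @items[@random.rand(@items.length)] = item
    end
  end
end

buffer = RandomEvictionBufferSketch.new(1000)
2000.times { |i| buffer.push(i) }
puts buffer.items.length
```

With this shape, pushing 2000 traces into a 1000-slot buffer keeps the buffer at 1000 entries, and the second thousand pushes all take the replacement branch.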

The reduction in memory usage is the most notable improvement, with memory usage being constant for the new implementation:

Before(ThreadSafeBuffer) [bytes allocated (objects created)]
    10 traces:   91840.00 (   1900.00)
   100 traces:  177376.00 (   1900.00)
  1000 traces: 1244200.00 (   1900.00)
  2000 traces: 1244200.00 (   1900.00)

After(CRubyTraceBuffer)  [bytes allocated (objects created)]
    10 traces:   76000.00 (   1900.00)
   100 traces:   76000.00 (   1900.00)
  1000 traces:   76000.00 (   1900.00)
  2000 traces:   76000.00 (   1900.00)

Comparison (% reduction, increase negative)
    10 traces:      17.25 (      0.00)
   100 traces:      57.15 (      0.00)
  1000 traces:      93.89 (      0.00)
  2000 traces:      93.89 (      0.00)

While wall time has had a modest improvement:

Before(ThreadSafeBuffer) [operations/sec]
    10 spans:   74676.66
   100 spans:   11465.74
  1000 spans:    1208.54
  2000 spans:     486.27

After(CRubyTraceBuffer)  [operations/sec]
    10 spans:   82571.16
   100 spans:   13084.07
  1000 spans:    1397.91
  2000 spans:     512.44

Comparison (% faster; slower if negative)
    10 spans:      10.57
   100 spans:      14.11
  1000 spans:      15.67
  2000 spans:       5.38

Process finished with exit code 0
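For anyone checking the comparison rows, they can be reproduced from the before/after figures above. The helper names below are hypothetical, and the formulas are inferred from the printed values rather than copied from the benchmark script.

```ruby
# Hypothetical helpers; formulas inferred from the tables above.

# Memory: share of the "before" allocation that was eliminated.
def percent_reduction(before, after)
  ((before - after) / before.to_f * 100).round(2)
end

# Throughput: gain in operations/sec relative to "before".
def percent_faster(before, after)
  ((after - before) / before.to_f * 100).round(2)
end

puts percent_reduction(91_840.0, 76_000.0)    # 10 traces row
puts percent_reduction(1_244_200.0, 76_000.0) # 1000 traces row
puts percent_faster(74_676.66, 82_571.16)     # 10 spans row
puts percent_faster(486.27, 512.44)           # 2000 spans row
```

These inputs come straight from the tables, and the results should line up with the 17.25, 93.89, 10.57 and 5.38 entries in the comparison rows.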

ericmustin

I have a few small questions and nits, generally fine approving just want to be careful around the tag renaming stuff...this is great work, thanks @marcotc

lib/ddtrace/buffer.rb

ericmustin · 2020-09-17T16:53:36Z

+  # * Pushed into a single CRubyTraceBuffer from 1000 threads.
+  # The buffer can exceed its maximum size by no more than 4%.
+  #
+  # This implementation allocates 17-93% less memory and


Not a big deal but do we really want to add perf numbers in code comments? it's also a very broad range so i'm not sure how useful this comment is, maybe we can just say 'This implementation allocates significantly less memory and has a modest speedup compared to Datadog::ThreadSafeBuffer' or basically, something that won't be used against us in the future 😅

marcotc · 2020-09-18

I was trying to ensure I had supporting arguments documented, but hard numbers are sure to vary across execution environments.

I want to make sure to capture information that allows for future decision making regarding trade-offs being taken here.

I'm ambivalent about keeping or removing the numbers. What do you think @brettlangdon?

brettlangdon · 2020-09-21

I love the transparency, but this code comment could become stale pretty quickly.

I agree with Eric, going with a simpler comment to ensure people are considerate of performance impact whenever they modify the code is 👍🏻 and then we can think about how to expose this metric in a different way.

marcotc · 2020-09-21

Thank you guys, I removed hard numbers from the comment.
This comment block already has a link to benchmarks that can be run at any time to validate whether the stated performance gains still hold, so I'm thinking that covers the hard-numbers part.

lib/ddtrace/buffer.rb

lib/ddtrace/ext/runtime.rb

ericmustin

I'm still confused on the @buffer_accepted_lengths bit tbh but doesn't seem blocking to me, same for the code comment nit, so deferring to your best judgement on both. nice work!

The base branch was changed.

ericmustin

🚀

marcotc added the performance Involves performance (e.g. CPU, memory, etc) label Sep 11, 2020

marcotc self-assigned this Sep 11, 2020

marcotc force-pushed the perf/buffer-perf branch 3 times, most recently from b378675 to 94f0d38 Compare September 14, 2020 19:20

Faster TraceBuffer for CRuby

7511b14

marcotc force-pushed the perf/buffer-perf branch from 94f0d38 to 7511b14 Compare September 15, 2020 21:19

marcotc marked this pull request as ready for review September 15, 2020 21:24

marcotc requested a review from a team September 15, 2020 21:24

ericmustin reviewed Sep 17, 2020

marcotc requested a review from ericmustin September 18, 2020 19:26

ericmustin previously approved these changes Sep 18, 2020

Merge branch 'master' into perf/buffer-perf

3b8c73b

marcotc changed the base branch from perf/transport-memory-improvements to master September 21, 2020 18:34

Remove hard numbers for posterity

3db35a4

marcotc requested a review from ericmustin September 21, 2020 18:54

ericmustin approved these changes Sep 21, 2020

marcotc merged commit da5e5bc into master Sep 21, 2020

marcotc added this to the 0.41.0 milestone Sep 30, 2020

ivoanjo deleted the perf/buffer-perf branch July 16, 2021 09:13

ivoanjo mentioned this pull request Sep 29, 2021

CRubyTraceBuffer can exceed its max size and that leads to flaky tests #1704

Closed
