
Reduce memory usage of the HTTP transport #1165

Merged

merged 9 commits into master from perf/transport-memory-improvements on Sep 21, 2020

Conversation


@marcotc marcotc commented Sep 3, 2020

tl;dr: 21-50% memory reduction, 19-30% faster HTTP transport.

This PR reduces the memory usage of our default HTTP transport. As a result of these changes, performance has improved as well.

The largest gains were due to:

  • Direct MessagePack serialization in span.rb: previously we created an intermediate Hash object in order to serialize spans; now we interface directly with MessagePack objects.
  • Net::HTTP.start is inefficient when processing its named arguments: we now use a different interface to configure these options, which avoids the expensive processing step.

One remaining area with large possible improvements is using a different HTTP adapter. The native Net::HTTP is great because it's always available, but it has shown up many times in the memory profiler, due to the many strings created while processing HTTP requests and responses.

Results

spec/ddtrace/benchmark/transport_benchmark_spec.rb includes the benchmarks used to profile the tracer and produce the numbers reported here.

Memory

Before [bytes allocated (objects created)]
   1  span:   3601720   (53508)
  10 spans:   5503620   (63408)
 100 spans:  24437620  (162408)
1000 spans: 214035300 (1152590)

After [bytes allocated (objects created)]
   1  span:   2838120   (47208)
  10 spans:   3782420   (54408)
 100 spans:  13143620  (126408)
1000 spans: 106960100  (846590)

Difference (% reduction)
   1  span: 21% (11%)
  10 spans: 31% (14%)
 100 spans: 46% (22%)
1000 spans: 50% (26%)

CPU

Before [operations/sec]
     1 spans:    1976.05
    10 spans:    1820.21
   100 spans:     918.89
  1000 spans:     147.81

After [operations/sec]
     1 spans:    2576.38
    10 spans:    2316.30
   100 spans:    1177.31
  1000 spans:     175.92

Comparison (% faster; slower if negative)
     1 spans:      30.38
    10 spans:      27.25
   100 spans:      28.12
  1000 spans:      19.02

@marcotc marcotc added the performance Involves performance (e.g. CPU, memory, etc) label Sep 3, 2020
@marcotc marcotc requested a review from a team September 3, 2020 22:43
@marcotc marcotc self-assigned this Sep 3, 2020
@marcotc marcotc force-pushed the perf/transport-memory-improvements branch from 1300ffa to 468b9d1 Compare September 3, 2020 22:43
@brettlangdon brettlangdon (Member) left a comment

Should we separate out the direct msgpack encoding changes from the transport changes?

Can we add a benchmark suite for the msgpack changes on their own? (e.g. encoding a trace vs. using the transport)

spec/ddtrace/benchmark/transport_benchmark_spec.rb (outdated)
packer.write_map_header(11) # Set header with how many elements in the map
end

packer.write(:span_id)
Member:

Do these symbols end up being re-encoded every time? If so can it be memoized? (Saying this knowing nothing about the Ruby msgpack library, or Ruby itself 😆)

Member Author:

They are converted to strings by MessagePack, but Ruby keeps an interned string for each symbol.
Still, I think it's worth trying strings directly, as there's a chance Ruby creates a copy of that interned string each time we ask for it.

I'll report on results here, thanks for the heads up!

Member Author:

It did improve the results, thank you @Kyle-Verhoog! 🎉
Memory usage stayed the same, as symbols already have an interned string representation, but performance improved.

@@ -269,36 +271,36 @@ def to_msgpack(packer = nil)
if !@start_time.nil? && !@end_time.nil?
packer.write_map_header(13) # Set header with how many elements in the map

packer.write(:start)
packer.write('start')
Member:

worth adding a comment on why we use strings instead of symbols?

any benefits to making these constants/freezing them outside the scope of this method?

Member Author:

any benefits to making these constants/freezing them outside the scope of this method?

Instead of doing that, given there are so many strings, I added # frozen_string_literal: true to the top of the file, which freezes all string literals in the file. I did check all strings declared in this file, and they are all safe to freeze.

I also benchmarked the # frozen_string_literal: true change on its own, to see if freezing the other strings in the file would change our numbers; it didn't, so the performance improvement comes from the symbol-to-string change only.
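The effect of the magic comment can be sketched as below (a minimal standalone example; note the comment only takes effect as the first line of a file):

```ruby
# frozen_string_literal: true

# With the magic comment above, every string literal in this file is
# frozen; the literal at a given call site is reused on each execution
# instead of allocating a fresh String object per call.
def key
  'span_id'
end

puts key.frozen?     # the literal is frozen
puts key.equal?(key) # the same object is returned on every call
```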

worth adding a comment on why we use strings instead of symbols?

I'll add a comment for that, good call!

@brettlangdon (Member):

I thought I had a comment somewhere, but don't see it. Do we have benchmarks specifically for the encoding piece?

e.g. to_msgpack of a single span, or a trace of varying sizes, outside the context of the transport?

@marcotc (Member Author) commented Sep 4, 2020

e.g. to_msgpack of a single span, or a trace of varying sizes, outside the context of the transport?

We could add these, and ultimately also test our JSON serialization.
Where we stand today, the transport benchmark in this PR is pretty much 50% serialization, 50% HTTP client.

I think we have it pretty well covered at this moment, but we can add this more granular benchmark as well.

@brettlangdon (Member):

More granular might make sense as we can use it to optimize that specific piece.

e.g. if we want to improve to_msgpack also testing the http piece at the same time might make it noisy.

@marcotc (Member Author) commented Sep 4, 2020

@brettlangdon cool, I scheduled a separate follow up task to benchmark specifically the serialization.

@brettlangdon (Member):

@marcotc that sounds good to me, thanks for tracking it!

@brettlangdon (Member):

Should we separate the span encoding changes from the transport changes?

e.g. add to_msgpack and the microbenchmarks there in a separate PR?

@marcotc (Member Author) commented Sep 4, 2020

@brettlangdon I don't think we need to separate it. At the end of the day, encoding is an integral part of the "please send these spans to Datadog" path of our tracer, which is what we are benchmarking here.

@brettlangdon brettlangdon (Member) left a comment

What versions of Ruby have we benchmarked these changes with?

I see there is a case to use filter_map on >= 2.7; what kind of performance difference is there from this in 2.6 vs. 2.7 (same for the other pieces)?

packer.write_map_header(13) # Set header with how many elements in the map

packer.write('start')
packer.write((@start_time.to_f * 1e9).to_i)
Member:

Since this is written in two places now, how about a read-only property?

#start_time_ns + #duration_ns ?

Member Author:

Cool, updated it.
I also changed the arithmetic to use fewer operations when computing the value in nanoseconds.
The move to a separate method, combined with the arithmetic improvements, increased performance by a very tiny bit.
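A sketch of what such an extraction could look like, under the assumptions of this thread: the module and helper names below are hypothetical (the review only suggested `#start_time_ns` and `#duration_ns`), and the integer arithmetic stands in for the "fewer operations" change:

```ruby
# Hypothetical extraction of the nanosecond conversion into helpers.
# Integer arithmetic (to_i + nsec) avoids the Float round-trip of
# (time.to_f * 1e9).to_i and its potential sub-microsecond rounding.
module SpanTiming
  NS_PER_SECOND = 1_000_000_000

  def self.time_ns(time)
    time.to_i * NS_PER_SECOND + time.nsec
  end

  def self.duration_ns(start_time, end_time)
    time_ns(end_time) - time_ns(start_time)
  end
end

start_time = Time.at(1_599_000_000, 123_456_789, :nsec)
end_time   = Time.at(1_599_000_001, 0, :nsec)
puts SpanTiming.time_ns(start_time)               # 1599000000123456789
puts SpanTiming.duration_ns(start_time, end_time) # 876543211
```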

@marcotc (Member Author) commented Sep 10, 2020

@brettlangdon Results are for the fastest version, 2.7.
The difference between 2.7 and 2.6, for example, is very small. The change to filter_map reduced memory usage by a small amount and increased performance by a very small margin. Considering the whole PR, this is a sub-percent improvement.
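The version-guarded pattern under discussion can be sketched as below; the `encode_one` lambda and the numeric input are hypothetical stand-ins for the tracer's trace encoder:

```ruby
# Stand-in for the tracer's encoder: returns nil for inputs that
# should be dropped (illustrative data, not real spans).
encode_one = ->(trace) { trace.even? ? "encoded-#{trace}" : nil }
traces = [1, 2, 3, 4]

encoded_traces =
  if traces.respond_to?(:filter_map)
    # Ruby 2.7+: map and drop nils in a single pass,
    # without the intermediate array that map + compact creates.
    traces.filter_map { |t| encode_one.call(t) }
  else
    # Pre-2.7 fallback: allocates one throwaway array.
    traces.map { |t| encode_one.call(t) }.compact
  end

puts encoded_traces.inspect # ["encoded-2", "encoded-4"]
```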

ericmustin previously approved these changes Sep 17, 2020
@ericmustin ericmustin (Contributor) left a comment

LGTM. The only real remaining speedup I could think of would be tinkering with the HTTP client itself, but it's probably not worth the risks involved in swapping in a 3rd-party library, and I guess it would also increase the memory usage of the tracer on startup.

lib/ddtrace/span.rb (outdated)
# DEV: Initializing +Net::HTTP+ directly helps us avoid expensive
# options processing done in +Net::HTTP.start+:
# https://github.com/ruby/ruby/blob/b2d96abb42abbe2e01f010ffc9ac51f0f9a50002/lib/net/http.rb#L614-L618
req = ::Net::HTTP.new(hostname, port, nil)
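For context, the two construction styles can be sketched as below. The host, port, and timeout values are illustrative, and no request is actually sent:

```ruby
require 'net/http'

# Net::HTTP.start(host, port, opts) runs an options-processing step
# (keyword extraction, proxy resolution) on every call. Constructing
# the client directly and assigning options as plain attributes
# sidesteps that work.
http = ::Net::HTTP.new('127.0.0.1', 8126, nil) # explicit nil: no proxy autodetection
http.open_timeout = 1
http.read_timeout = 1
puts http.proxy? # false, since we passed nil for the proxy address
```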
Contributor:

I know this is kinda out there but have we considered using a 3rd party http library? Might not be worth the pain but I believe some other vendors use http.rb

Member Author:

We have a follow up task to investigate this 👍


# DEV: We use strings as keys here, instead of symbols, as
# DEV: MessagePack will ultimately convert them to strings.
# DEV: By providing strings directly, we skip this indirection operation.
Contributor:

This would be faster than defining a bunch of constants, i.e.

SPAN_ID = 'span_id'.freeze
...
...
...
packer.write(SPAN_ID)

etc etc ?

Contributor:

Never mind, I see the discussion here: #1165 (comment)

encoded_traces = traces.map { |t| encode_one(t) }.reject(&:nil?)
encoded_traces = if traces.respond_to?(:filter_map)
# DEV Supported since Ruby 2.7, saves an intermediate object creation
traces.filter_map { |t| encode_one(t) }
Contributor:

👍

@marcotc (Member Author) commented Sep 18, 2020

After rebasing on the changes to use a monotonic clock, no visible performance impact was measured.

ericmustin previously approved these changes Sep 18, 2020
@ericmustin ericmustin (Contributor) left a comment

🚀

@marcotc (Member Author) commented Sep 18, 2020

@ericmustin I forgot I based #1178 on top of this branch (because of some shared fixtures that make it easier to write future benchmarks). Would you mind approving (✅) this again when you have some time?

@marcotc marcotc merged commit 5bd0dba into master Sep 21, 2020
@ericmustin ericmustin added this to the 0.41.0 milestone Sep 30, 2020
@ivoanjo ivoanjo deleted the perf/transport-memory-improvements branch July 16, 2021 09:13