
Reduce memory usage of the HTTP transport #1165

Merged

merged 9 commits into master from perf/transport-memory-improvements on Sep 21, 2020

Conversation


@marcotc marcotc commented Sep 3, 2020

tl;dr: 21-50% memory reduction, 19-30% faster HTTP transport.

This PR reduces the memory usage of our default HTTP transport. As a result of these changes, performance has improved as well.

The largest gains were due to:

  • Direct MessagePack serialization in span.rb: previously we created an intermediate Hash object in order to serialize spans; now we interface directly with MessagePack objects.
  • Net::HTTP.start is inefficient when processing its named arguments: we now use a different interface to configure these options, which avoids the expensive processing step.

One remaining area with large possible improvements is using a different HTTP adapter. The native Net::HTTP is great because it's always available, but it has shown up many times in the memory profiler, due to the many strings created while processing HTTP requests and responses.

Results

spec/ddtrace/benchmark/transport_benchmark_spec.rb includes the benchmarks used to profile the tracer and produce the numbers reported here.

Memory

Before [bytes allocated (objects created)]
   1  span:   3601720   (53508)
  10 spans:   5503620   (63408)
 100 spans:  24437620  (162408)
1000 spans: 214035300 (1152590)

After [bytes allocated (objects created)]
   1  span:   2838120   (47208)
  10 spans:   3782420   (54408)
 100 spans:  13143620  (126408)
1000 spans: 106960100  (846590)

Difference (% reduction)
   1  span: 21% (11%)
  10 spans: 31% (14%)
 100 spans: 46% (22%)
1000 spans: 50% (26%)

CPU

Before [operations/sec]
     1 spans:    1976.05
    10 spans:    1820.21
   100 spans:     918.89
  1000 spans:     147.81

After [operations/sec]
     1 spans:    2576.38
    10 spans:    2316.30
   100 spans:    1177.31
  1000 spans:     175.92

Comparison (% faster; slower if negative)
     1 spans:      30.38
    10 spans:      27.25
   100 spans:      28.12
  1000 spans:      19.02

@marcotc marcotc added the performance Involves performance (e.g. CPU, memory, etc) label Sep 3, 2020
@marcotc marcotc requested a review from a team September 3, 2020 22:43
@marcotc marcotc self-assigned this Sep 3, 2020
@marcotc marcotc force-pushed the perf/transport-memory-improvements branch from 1300ffa to 468b9d1 Compare September 3, 2020 22:43
@brettlangdon brettlangdon (Member) left a comment

Should we separate out the direct msgpack encoding changes from the transport changes?

Can we add a benchmark suite for the msgpack changes on their own? (e.g. encoding a trace vs. using the transport)

spec/ddtrace/benchmark/transport_benchmark_spec.rb (outdated)
packer.write_map_header(11) # Set header with how many elements in the map
end

packer.write(:span_id)
Member:

Do these symbols end up being re-encoded every time? If so can it be memoized? (Saying this knowing nothing about the Ruby msgpack library, or Ruby itself 😆)

Member Author:

They are converted to strings by MessagePack, but Ruby keeps an interned string for each symbol.
Still, I think it's worth trying strings directly, as there's a chance Ruby creates a copy of that interned string each time we ask for it.

I'll report on results here, thanks for the heads up!

Member Author:

It did improve the results, thank you @Kyle-Verhoog! 🎉
Memory usage stayed the same, as symbols already have an interned string representation, but performance improved.

@@ -269,36 +271,36 @@ def to_msgpack(packer = nil)
if !@start_time.nil? && !@end_time.nil?
packer.write_map_header(13) # Set header with how many elements in the map

packer.write(:start)
packer.write('start')
Member:

worth adding a comment on why we use strings instead of symbols?

any benefits to making these constants/freezing them outside the scope of this method?

Member Author:

any benefits to making these constants/freezing them outside the scope of this method?

Instead of doing that, given there are so many strings, I added # frozen_string_literal: true to the top of the file, which freezes all string literals in the file. I did check all strings declared in this file, and they are all safe to freeze.

I also benchmarked the # frozen_string_literal: true change on its own, to see if freezing the other strings in the file would change our numbers; it didn't, so the performance improvement comes from the symbol-to-string change only.
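The effect of the magic comment can be sketched as below (a minimal standalone example; note the comment only takes effect as the first line of a file):

```ruby
# frozen_string_literal: true

# With the magic comment above, every string literal in this file is
# frozen; the literal at a given call site is reused on each execution
# instead of allocating a fresh String object per call.
def key
  'span_id'
end

puts key.frozen?     # the literal is frozen
puts key.equal?(key) # the same object is returned on every call
```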

worth adding a comment on why we use strings instead of symbols?

I'll add a comment for that, good call!

@brettlangdon (Member):

I thought I had a comment somewhere, but don't see it. Do we have benchmarks specifically for the encoding piece?

e.g. to_msgpack of a single span, or a trace of varying sizes, outside the context of the transport?

@marcotc (Member Author) commented Sep 4, 2020

e.g. to_msgpack of a single span, or a trace of varying sizes, outside the context of the transport?

We could add these, and ultimately also test our JSON serialization.
Where we stand today, the transport benchmark in this PR is pretty much 50% serialization, 50% HTTP client.

I think we have it pretty well covered at this moment, but we can add this more granular benchmark as well.

@brettlangdon (Member):

More granular might make sense as we can use it to optimize that specific piece.

e.g. if we want to improve to_msgpack also testing the http piece at the same time might make it noisy.

@marcotc (Member Author) commented Sep 4, 2020

@brettlangdon cool, I scheduled a separate follow up task to benchmark specifically the serialization.

@brettlangdon (Member):

@marcotc that sounds good to me, thanks for tracking it!

@brettlangdon (Member):

Should we separate the span encoding changes from the transport changes?

e.g. add to_msgpack and the microbenchmarks there in a separate PR?

@marcotc (Member Author) commented Sep 4, 2020

@brettlangdon I don't think we need to separate it. At the end of the day, encoding is an integral part of the "please send these spans to Datadog" path of our tracer, which is what we are benchmarking here.

@brettlangdon brettlangdon (Member) left a comment

What versions of Ruby have we benchmarked these changes with?

I see there is a case to use filter_map on >= 2.7; what kind of performance difference is there from this in 2.6 vs. 2.7 (same for the other pieces)?

packer.write_map_header(13) # Set header with how many elements in the map

packer.write('start')
packer.write((@start_time.to_f * 1e9).to_i)
Member:

Since this is written in two places now, how about a read-only property?

#start_time_ns + #duration_ns ?

Member Author:

Cool, updated it.
I also changed the arithmetic to use fewer operations when computing the value in nanoseconds.
The move to a separate method, combined with the arithmetic improvements, increased performance by a very tiny bit.
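A sketch of what such an extraction could look like, under the assumptions of this thread: the module and helper names below are hypothetical (the review only suggested `#start_time_ns` and `#duration_ns`), and the integer arithmetic stands in for the "fewer operations" change:

```ruby
# Hypothetical extraction of the nanosecond conversion into helpers.
# Integer arithmetic (to_i + nsec) avoids the Float round-trip of
# (time.to_f * 1e9).to_i and its potential sub-microsecond rounding.
module SpanTiming
  NS_PER_SECOND = 1_000_000_000

  def self.time_ns(time)
    time.to_i * NS_PER_SECOND + time.nsec
  end

  def self.duration_ns(start_time, end_time)
    time_ns(end_time) - time_ns(start_time)
  end
end

start_time = Time.at(1_599_000_000, 123_456_789, :nsec)
end_time   = Time.at(1_599_000_001, 0, :nsec)
puts SpanTiming.time_ns(start_time)               # 1599000000123456789
puts SpanTiming.duration_ns(start_time, end_time) # 876543211
```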

@marcotc (Member Author) commented Sep 10, 2020

@brettlangdon Results are for the fastest version, 2.7.
The difference between 2.7 and 2.6, for example, is very small. The change to filter_map reduced memory usage by a small amount and increased performance by a very small margin. Considering the whole PR, this is a sub-percent improvement.
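The version-guarded pattern under discussion can be sketched as below; the `encode_one` lambda and the numeric input are hypothetical stand-ins for the tracer's trace encoder:

```ruby
# Stand-in for the tracer's encoder: returns nil for inputs that
# should be dropped (illustrative data, not real spans).
encode_one = ->(trace) { trace.even? ? "encoded-#{trace}" : nil }
traces = [1, 2, 3, 4]

encoded_traces =
  if traces.respond_to?(:filter_map)
    # Ruby 2.7+: map and drop nils in a single pass,
    # without the intermediate array that map + compact creates.
    traces.filter_map { |t| encode_one.call(t) }
  else
    # Pre-2.7 fallback: allocates one throwaway array.
    traces.map { |t| encode_one.call(t) }.compact
  end

puts encoded_traces.inspect # ["encoded-2", "encoded-4"]
```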

ericmustin previously approved these changes Sep 17, 2020
@ericmustin ericmustin (Contributor) left a comment

LGTM. The only real remaining speedup I could think of would be tinkering with the HTTP client itself, but it's probably not worth the risks involved in swapping in a 3rd-party library, and I guess it would also increase the memory usage of the tracer on startup.

lib/ddtrace/span.rb (outdated)
# DEV: Initializing +Net::HTTP+ directly helps us avoid expensive
# options processing done in +Net::HTTP.start+:
# https://github.com/ruby/ruby/blob/b2d96abb42abbe2e01f010ffc9ac51f0f9a50002/lib/net/http.rb#L614-L618
req = ::Net::HTTP.new(hostname, port, nil)
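For context, the two construction styles can be sketched as below. The host, port, and timeout values are illustrative, and no request is actually sent:

```ruby
require 'net/http'

# Net::HTTP.start(host, port, opts) runs an options-processing step
# (keyword extraction, proxy resolution) on every call. Constructing
# the client directly and assigning options as plain attributes
# sidesteps that work.
http = ::Net::HTTP.new('127.0.0.1', 8126, nil) # explicit nil: no proxy autodetection
http.open_timeout = 1
http.read_timeout = 1
puts http.proxy? # false, since we passed nil for the proxy address
```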
Contributor:

I know this is kinda out there but have we considered using a 3rd party http library? Might not be worth the pain but I believe some other vendors use http.rb

Member Author:

We have a follow up task to investigate this 👍


# DEV: We use strings as keys here, instead of symbols, as
# DEV: MessagePack will ultimately convert them to strings.
# DEV: By providing strings directly, we skip this indirection operation.
Contributor:

This would be faster than defining a bunch of constants, i.e.

SPAN_ID = 'span_id'.freeze
...
...
...
packer.write(SPAN_ID)

etc etc ?

Contributor:

Never mind, I see the discussion here: #1165 (comment)

encoded_traces = traces.map { |t| encode_one(t) }.reject(&:nil?)
encoded_traces = if traces.respond_to?(:filter_map)
# DEV Supported since Ruby 2.7, saves an intermediate object creation
traces.filter_map { |t| encode_one(t) }
Contributor:

👍

@marcotc (Member Author) commented Sep 18, 2020

After rebasing on the changes to use a monotonic clock, no visible performance impact was measured.

ericmustin previously approved these changes Sep 18, 2020
@ericmustin ericmustin (Contributor) left a comment

🚀

@marcotc (Member Author) commented Sep 18, 2020

@ericmustin I forgot I based #1178 on top of this branch (because of some shared fixtures that make it easier to write future benchmarks). Would you mind approving (✅) this again when you have some time?

@marcotc marcotc merged commit 5bd0dba into master Sep 21, 2020
@ericmustin ericmustin added this to the 0.41.0 milestone Sep 30, 2020
@ivoanjo ivoanjo deleted the perf/transport-memory-improvements branch July 16, 2021 09:13