Ethon & Typhoeus tracing support #778

al-kudryavtsev · 2019-07-03T21:50:19Z

Goal

Support distributed tracing for requests made through the Ethon library (including Typhoeus requests, which use Ethon).
The related issue is #527.

Implementation details

Links to the libraries:
https://github.com/typhoeus/ethon
https://github.com/typhoeus/typhoeus

The implementation was initially based on the rest_client integration, but I had to change the approach because Ethon is a wrapper around libcurl, which can execute many requests concurrently. I had to patch quite a few methods on Easy to support more use cases. The spec contains examples of the different kinds of requests (made through Ethon's Easy & Multi and through Typhoeus's Request & Hydra).

This code has been in production for a few weeks and we haven't seen any issues. However, we don't use concurrent requests (made through Multi or Hydra), so I'm relying on unit tests for those use cases. Ethon also supports non-HTTP requests; this PR does not address them.

Here is a screenshot of distributed tracing achieved through the new integration:

al-kudryavtsev · 2019-07-09T14:19:49Z

Fixed Ruby 1.9 support & rebased to get rid of the failed Mongo tests

delner · 2019-07-12T15:15:44Z

@al-kudryavtsev The pull request is looking quite filled out! Is this ready for review?

Also, since I'm not familiar with Typhoeus/Ethon, could you explain the relationship between the two? What are the responsibilities of each library? Would a user ever want to activate Typhoeus tracing without Ethon or vice versa?

I saw Typhoeus added as a dependency, and Ethon getting some instrumentation, but I didn't see any instrumentation explicitly defined for Typhoeus, and it wasn't clear what this means for Typhoeus support.

al-kudryavtsev · 2019-07-12T18:10:10Z

@delner Yes, the PR is ready for review, thanks!

Ethon is a libcurl wrapper which provides two main objects, Easy and Multi, which can be used to make requests (not only HTTP). Typhoeus is built on top of Ethon and is designed for HTTP requests. It adds extra features such as memoization / caching, and in general provides a higher-level interface.

Initially I wanted to support Typhoeus because that's what we use for HTTP requests. The problem is that its extra features actually make it harder to implement proper & reliable instrumentation, especially with multiple parallel asynchronous requests. At the same time, Ethon's Easy object is a good candidate for tracking asynchronous requests because it is used both when the request starts and when the response is ready.

What I did instead is instrument Ethon. This way HTTP requests coming through Typhoeus are fully supported, and other use cases of Ethon are also supported (or support can be added relatively easily). I added Typhoeus as a dependency in order to unit test the instrumentation with Typhoeus primitives.

So, to answer your question, if the user wants to instrument HTTP requests made with Typhoeus, they will need to enable Ethon tracing. If they need to trace some other features of Typhoeus (like caching of responses), they will need a different instrumentation.

Also, I'm not an author of either library, so it is possible I'm missing something and there is an easier way. There are hooks in both libraries; however, they don't seem to be a reliable basis for tracing.

Please let me know what you think.

delner · 2019-07-15T15:43:22Z

@al-kudryavtsev This is a super helpful description, thank you!

Sounds like it'll be good then to have this Ethon instrumentation either way, only major question to me outstanding is: "does this constitute 'Typhoeus' support?" I don't think I have this answer; I asked the folks on #527 what they think.

If it does, then we somehow have to make it obvious/intuitive to users who want Typhoeus traces to activate the Ethon instrumentation. This might require us to add something like c.use :typhoeus for that purpose, but we'll cross that bridge later.

If it doesn't, I will happily accept this PR as Ethon instrumentation on its own merits, but we won't be able to close #527, and as a community we'll need to find a better way to give the proper feature coverage to Typhoeus at a later date.

In any case, I'll give this a review a bit later!

che-burashco · 2019-07-23T13:23:42Z

@delner Did you get a chance to look at the PR?

delner

Overall this is looking very good; the tests are thorough, and pretty much all the boxes on features appear to be checked. :)

My feedback primarily is about developing my understanding of how the instrumentation works in the Ethon framework, and if there's any way to simplify it (e.g. avoiding use of state/instance variables, etc), and a little bit about reorganizing the tests into smaller groupings.

I think once I can get my understanding to "click" a bit more and we make a few tweaks, we should be able to get this approved.

delner · 2019-07-23T14:57:47Z

lib/ddtrace/contrib/ethon/easy_patch.rb

+        module InstanceMethods
+          include InstanceMethodsCompatibility unless Gem::Version.new(RUBY_VERSION) >= Gem::Version.new('2.0.0')
+
+          def http_request(url, action_name, options = {})


Can you walk me through the flow of a request here a bit? The order of the functions being called, sync vs async requests, etc.

I see a number of methods being patched here, each collecting and storing some information as instance variables. I'd like to understand how the state is built. Is this information not available on perform without first collecting in between all these methods?

For sure! I'll start with sync requests.

To make a sync request, the client uses an Easy object. First it is created, and as part of initialization it calls the set_attributes method, which in turn calls headers=, but only if headers are provided. The client can also set the headers directly using the headers= method. The problem is that as soon as the headers are set they are converted to an FFI pointer, which is why I store their original version. At that point we also don't have a span yet, so the tracing headers cannot be injected.
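
A minimal sketch of the header-capturing idea, assuming Module#prepend is used for patching. FakeEasy is a hypothetical stand-in for Ethon::Easy that only mimics the lossy conversion, and HeadersPatch is an illustrative name rather than the PR's actual module; @datadog_original_headers is the variable name mentioned elsewhere in this thread.

```ruby
# Hypothetical stand-in for Ethon::Easy: after assignment only a converted
# form of the headers is kept (the real library builds an FFI pointer).
class FakeEasy
  attr_reader :header_list

  def headers=(headers)
    @header_list = headers.map { |name, value| "#{name}: #{value}" }
  end
end

# Illustrative patch: remember the original Hash before it is converted.
module HeadersPatch
  attr_reader :datadog_original_headers

  def headers=(headers)
    @datadog_original_headers = headers
    super
  end
end

FakeEasy.prepend(HeadersPatch)

easy = FakeEasy.new
easy.headers = { 'Accept' => 'application/json' }
p easy.header_list
p easy.datadog_original_headers
```

Because the original Hash is kept on the object, tracing headers can later be merged into it and passed through headers= again once a span exists.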

The http_request method is used as a helper to populate the easy with the information necessary for an HTTP request. In particular, the HTTP method is passed (for example, easy.http_request("www.example.com", :post, { params: { a: 1 }, body: { b: 2 } })). The problem is that this data is not preserved on the easy in a way that makes the HTTP method easy to recover. The factories set various low-level attributes (https://github.com/typhoeus/ethon/tree/master/lib/ethon/easy/http), and libcurl itself figures out the HTTP method based on all these attributes. So I figured it is easier to just store the method instead of trying to recover it from libcurl attributes.

The sync way to execute an easy is to call perform. I patch perform to create the span and to inject the headers if needed (that's why I stored the original headers on the object: to avoid reading them back from FFI). perform uses libcurl's easy_perform method and then calls the complete method on the easy, which I patch to finish the span.
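
The sync flow described above can be sketched roughly as follows. Everything here is a simplified assumption: SyncEasy and SyncSpan are hypothetical stand-ins for Ethon::Easy and a tracer span, SyncTracing is an illustrative module name, and 'x-trace-id' is a placeholder for whatever headers the real propagator injects.

```ruby
# Hypothetical stand-in for a tracer span.
SyncSpan = Struct.new(:name, :resource, :finished) do
  def finish
    self.finished = true
  end
end

# Hypothetical stand-in for Ethon::Easy.
class SyncEasy
  attr_reader :sent_headers

  def headers=(headers)
    @sent_headers = headers # the real library converts these to an FFI pointer
  end

  def http_request(url, action_name, options = {})
    @url = url # the real library sets low-level libcurl options here
  end

  def perform
    complete # the real library calls libcurl's easy_perform first
  end

  def complete; end
end

# Illustrative patch, not the PR's exact code.
module SyncTracing
  attr_reader :datadog_span

  def headers=(headers)
    @datadog_original_headers = headers
    super
  end

  def http_request(url, action_name, options = {})
    @datadog_method = action_name.to_s.upcase
    super
  end

  def perform
    # Start the span, then re-apply the stored headers plus a tracing header.
    @datadog_span = SyncSpan.new('request', @datadog_method, false)
    original = @datadog_original_headers || {}
    self.headers = original.merge('x-trace-id' => '123') # placeholder value
    super
  end

  def complete
    super
    @datadog_span.finish if @datadog_span
  end
end

SyncEasy.prepend(SyncTracing)

easy = SyncEasy.new
easy.headers = { 'Accept' => 'text/plain' }
easy.http_request('www.example.com', :post)
easy.perform
p easy.datadog_span
p easy.sent_headers
```

The span is created only in perform, so nothing is recorded for an easy that is configured but never executed.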

Async requests are executed using Multi. Individual requests are represented by the same Easy objects; however, the execution flow is different. An Easy instance is added to the Multi using the add method. I consider this to be the beginning of the request execution, which is why the span is created there. There is no easy way to know whether this is really the beginning of the request or whether the multi is still on hold, because multi execution can be in progress in another thread. Here I follow the way Typhoeus uses Multi: it adds the easy objects to the Multi right before executing perform. It also has hooks for the before-request event, which are executed in the add call.

The execution of the multi happens when the perform method is called. This method uses libcurl's multi_perform and calls complete on the easy objects that have finished executing.
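
As a rough illustration of the async flow at this stage of the PR, the sketch below starts a span when an easy is added and finishes it in complete. AsyncEasy and AsyncMulti are hypothetical stand-ins for Ethon::Easy and Ethon::Multi, and spans are reduced to entries in an events list; only datadog_before_request and the shape of the add override are taken from the diff excerpts in this thread.

```ruby
# Hypothetical stand-in for Ethon::Easy; events records what happened.
class AsyncEasy
  attr_reader :events

  def initialize
    @events = []
  end

  def complete
    @events << :complete
  end
end

# Hypothetical stand-in for Ethon::Multi.
class AsyncMulti
  def initialize
    @easy_handles = []
  end

  def add(easy)
    return nil if @easy_handles.include?(easy)
    @easy_handles << easy
  end

  def perform
    # The real library drives libcurl's multi_perform and calls complete on
    # each finished easy; Easy#perform is not called on this path.
    @easy_handles.each(&:complete)
    @easy_handles.clear
  end
end

module AsyncEasyTracing
  def datadog_before_request
    @events << :span_started
  end

  def complete
    super
    @events << :span_finished
  end
end

module AsyncMultiTracing
  def add(easy)
    handles = super
    return handles if handles.nil?
    easy.datadog_before_request
    handles
  end
end

AsyncEasy.prepend(AsyncEasyTracing)
AsyncMulti.prepend(AsyncMultiTracing)

multi = AsyncMulti.new
easy = AsyncEasy.new
multi.add(easy)
multi.perform
p easy.events
```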

Okay this is pretty interesting, and exactly the kind of explanation I was looking for. (Thank you!)

I can see why you'd want to store this info before it gets turned into a format that's hard to read from; as long as these objects are 1-1 with requests, then this seems okay. Just have to be careful in any scenario in which an Easy object is re-used or re-run? Not sure if that happens, but some food for thought.

If Multi uses multiple Easy objects, would it make sense to add an additional parent span to the multi operation? Thinking about the case if users want to see the batch as an operation, in addition to its constituent parts.

> I can see why you'd want to store this info before it gets turned into a format that's hard to read from; as long as these objects are 1-1 with requests, then this seems okay. Just have to be careful in any scenario in which an Easy object is re-used or re-run? Not sure if that happens, but some food for thought.

That's a good point. I added an extra safety net by patching the reset method, which is used before re-using an easy instance. This way I can be sure that the HTTP method or headers are not mistakenly passed on to the next request.
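
A minimal sketch of that safety net, assuming the stored state lives in instance variables as described in this thread. ReusableEasy is a hypothetical stand-in for Ethon::Easy, and ResetPatch and datadog_state are illustrative names added only to make the effect observable.

```ruby
# Hypothetical stand-in for Ethon::Easy with no-op versions of the real methods.
class ReusableEasy
  def headers=(headers); end

  def http_request(url, action_name, options = {}); end

  def reset; end
end

module ResetPatch
  def headers=(headers)
    @datadog_original_headers = headers
    super
  end

  def http_request(url, action_name, options = {})
    @datadog_method = action_name.to_s.upcase
    super
  end

  # Drop everything stored for the previous request, even if super raises.
  def reset
    super
  ensure
    @datadog_span = nil
    @datadog_method = nil
    @datadog_original_headers = nil
  end

  # Illustrative helper for inspecting the stored state.
  def datadog_state
    [@datadog_method, @datadog_original_headers]
  end
end

ReusableEasy.prepend(ResetPatch)

easy = ReusableEasy.new
easy.headers = { 'Accept' => 'text/plain' }
easy.http_request('www.example.com', :post)
p easy.datadog_state
easy.reset
p easy.datadog_state
```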

delner · 2019-07-23T15:08:45Z

lib/ddtrace/contrib/ethon/easy_patch.rb

+          def set_attributes(options)
+            return super unless tracer_enabled?
+
+            # Make sure headers= will get called


What's the reason for forcing headers= to be called?

I wanted to force @datadog_original_headers to be set as soon as possible, so that later on I could assume it is set. Now it seems to me that this is not essential & we can reduce the number of patched methods. I can store the headers in headers= and then check whether the variable is set when injecting the headers.

delner · 2019-07-23T15:17:05Z

lib/ddtrace/contrib/ethon/multi_patch.rb

+            handles = super(easy)
+            return handles if handles.nil? || !tracer_enabled?
+
+            easy.datadog_before_request


Not very familiar with the construction of the framework, but is Multi composed of several Easy instances? And when it executes, will it call Easy#perform? If so, would that call datadog_before_request making this redundant?

There is a more detailed description in a different comment; Multi uses a different approach to execute the Easy objects that it holds and won't call Easy#perform.

Per my response above, is there an equivalent #perform method for Multi? Might want to consider adding a parent span around the lifetime of this multi operation which bundles up the Easy requests. Just a thought.

I added the parent span support. The logic is the following: the parent span is created when the first easy is added to the multi, and it is finished when Multi#perform has finished. A Multi can be reused: every time an easy is added, a parent span is created if one doesn't exist yet.
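
The parent span lifecycle described here could look roughly like the sketch below. BatchSpan and BatchMulti are hypothetical stand-ins for a tracer span and Ethon::Multi, the span names are placeholders, and child spans are collected on the multi only to keep the example short (in the PR they belong to the Easy objects).

```ruby
# Hypothetical stand-in for a tracer span.
BatchSpan = Struct.new(:name, :parent, :finished) do
  def finish
    self.finished = true
  end
end

# Hypothetical stand-in for Ethon::Multi.
class BatchMulti
  def initialize
    @easy_handles = []
  end

  def add(easy)
    return nil if @easy_handles.include?(easy)
    @easy_handles << easy
  end

  def perform
    @easy_handles.clear
  end
end

module BatchParentSpan
  def datadog_child_spans
    @datadog_child_spans ||= []
  end

  def add(easy)
    handles = super
    return handles if handles.nil?
    # Created lazily, so the first add of each batch opens a new parent span.
    @datadog_multi_span ||= BatchSpan.new('multi', nil, false)
    datadog_child_spans << BatchSpan.new('easy', @datadog_multi_span, false)
    handles
  end

  def perform
    super
  ensure
    # Finish and forget the parent so a reused multi starts a fresh one.
    if @datadog_multi_span
      @datadog_multi_span.finish
      @datadog_multi_span = nil
    end
  end
end

BatchMulti.prepend(BatchParentSpan)

multi = BatchMulti.new
multi.add(Object.new)
multi.add(Object.new)
multi.perform
p multi.datadog_child_spans.map { |span| span.parent.finished }
```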

al-kudryavtsev

Thanks for the review! I left some comments & will work on the tweaks

al-kudryavtsev · 2019-07-24T22:06:48Z

@delner I updated the PR - dropped Ruby 1.9 support, removed the set_attributes patch & improved the tests

delner · 2019-07-25T18:33:38Z

@al-kudryavtsev Cool, I'll review this again soon. Can you also rebase this against 0.26-dev? That would be our merge target.

al-kudryavtsev · 2019-07-25T21:12:48Z

@delner 👍 Rebased

delner · 2019-07-30T18:39:53Z

@al-kudryavtsev This is looking great; very happy with the thoroughness here, and the tests! Just left one more suggestion regarding Multi, but it's not an outright requirement, just a thought. Otherwise, I think we're in good shape to merge this.

al-kudryavtsev · 2019-08-01T19:20:03Z

@delner Thanks! I made some changes to improve the correctness and added parent span support.

delner · 2019-08-05T18:45:46Z

lib/ddtrace/contrib/ethon/multi_patch.rb

@@ -19,19 +19,48 @@ def add(easy)
            handles = super(easy)
            return handles if handles.nil? || !tracer_enabled?

-            easy.datadog_before_request
+            easy.datadog_before_request(parent_span: datadog_multi_span)


Is it possible there's a delay between #add and #perform? When the span is created, it's auto created with the start time. I'm thinking about the possibility where someone creates a Multi request with Easy children, but then delays a while before actually executing it; could that cause the timing to be recorded incorrectly?

It is possible; a more precise solution would probably require storing some state on the Multi object. The complexity comes from the fact that extra Easy objects can be added to the Multi during the execution of perform (in the easy complete callback; this is how the Typhoeus implementation works). The current solution will work for Typhoeus since it always adds easy objects to the multi right before executing it.
The same issue you described can happen with the Easy span, because it is started when the easy object is added to the multi (which is not necessarily executing yet).
We can probably solve the issue by storing a flag on the multi saying whether it is currently performing.
Please let me know what you think. I'll be away from my computer for one more week and will be able to work on this feature afterwards.

Might be worth it, as long as there isn't too much coupling between Easy and Multi instrumentation.

Is there any way to create the span at perform time for Easy? Even if that means storing other state on Easy initialization beforehand? Given these objects sound idempotent anyway, some state would be permissible.

It turned out extra state is not needed. Easy handles are stored on Multi instances, so I can go through them and start the child spans. When adding an easy to a multi, I can check whether there is a span on the multi, which indicates that the multi's perform is in progress.
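
One way to read this, sketched under the assumption that the child spans are started when perform begins: TimedEasy and TimedMulti are hypothetical stand-ins for Ethon::Easy and Ethon::Multi, on_perform is an invented hook that simulates a callback adding a handle mid-perform, datadog_span_started? is an illustrative helper, and a symbol stands in for the real span object.

```ruby
# Hypothetical stand-in for Ethon::Easy.
class TimedEasy
  def datadog_before_request(parent_span:)
    @datadog_parent = parent_span
  end

  def datadog_span_started?
    !@datadog_parent.nil?
  end
end

# Hypothetical stand-in for Ethon::Multi.
class TimedMulti
  attr_reader :easy_handles

  def initialize
    @easy_handles = []
    @during_perform = nil
  end

  # Invented hook: lets the example add a handle while perform is running.
  def on_perform(&block)
    @during_perform = block
  end

  def add(easy)
    return nil if @easy_handles.include?(easy)
    @easy_handles << easy
  end

  def perform
    @during_perform.call(self) if @during_perform
    @easy_handles.clear
  end
end

module TimedMultiPatch
  def add(easy)
    handles = super
    return handles if handles.nil?
    # A span on the multi means perform is running, so start the child now.
    easy.datadog_before_request(parent_span: @datadog_multi_span) if @datadog_multi_span
    handles
  end

  def perform
    @datadog_multi_span = :multi_span # placeholder for a real span
    easy_handles.each do |easy|
      next if easy.datadog_span_started?
      easy.datadog_before_request(parent_span: @datadog_multi_span)
    end
    super
  ensure
    @datadog_multi_span = nil
  end
end

TimedMulti.prepend(TimedMultiPatch)

multi = TimedMulti.new
early = TimedEasy.new
late = TimedEasy.new
multi.add(early)
p early.datadog_span_started?
multi.on_perform { |m| m.add(late) }
multi.perform
p [early.datadog_span_started?, late.datadog_span_started?]
```

With this shape, a delay between add and perform no longer stretches the span, because no span exists until perform begins.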

delner · 2019-08-19T19:54:03Z

@al-kudryavtsev I noticed the builds broke on Mongo: you'll probably want to rebase this PR onto the top of 0.27-dev, which should fix the issue.

al-kudryavtsev · 2019-08-19T22:01:59Z

@delner I added the change to make multi tracing more accurate and rebased it on top of 0.27-dev. I also wrote a test to manually test the instrumentation.
Here is a screenshot of the Hydra instrumentation which starts new requests during underlying multi's perform call:

delner · 2019-08-20T18:39:12Z

Wow, that screenshot looks awesome! I'll take another pass over the code, see if we can get this merged.

al-kudryavtsev · 2019-08-26T14:56:20Z

@delner Did you have a chance to look at the code?

delner · 2019-08-28T18:01:49Z

@al-kudryavtsev Apologies, was on PTO. I'm going to try to wrap this one up for our next release.

delner

Looking great, couple of minor changes I think, but nothing serious, should be quick. Once those are addressed we'll merge it.

delner · 2019-08-29T20:14:34Z

lib/ddtrace/contrib/ethon/easy_patch.rb

+
+            # Store headers to call this method again when span is ready
+            @datadog_original_headers = headers
+            super headers


Minor but can you just call super here?
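
For context on this suggestion: in Ruby, a bare super (no parentheses, no arguments) forwards the current method's arguments, so it behaves like super(headers) here. A small self-contained illustration, with HeaderTarget and ZsuperPatch as made-up names:

```ruby
class HeaderTarget
  attr_reader :received

  def headers=(headers)
    @received = headers
  end
end

module ZsuperPatch
  def headers=(headers)
    @stored = headers
    super # bare super forwards the current arguments, like super(headers)
  end
end

HeaderTarget.prepend(ZsuperPatch)

target = HeaderTarget.new
target.headers = { 'Accept' => 'text/plain' }
p target.received
```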

delner · 2019-08-29T20:18:39Z

lib/ddtrace/contrib/ethon/easy_patch.rb

+          def datadog_tag_request
+            span = @datadog_span
+            uri = URI.parse(url)
+            method = defined?(@datadog_method) ? @datadog_method.to_s : ''


Should this be instance_variable_defined??
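
For reference, both forms report whether the instance variable has been assigned, and neither one distinguishes a stored nil from a real value, so a separate nil check is still needed. A small illustration (IvarProbe is a made-up class):

```ruby
class IvarProbe
  def assign(value)
    @datadog_method = value
  end

  # defined? returns a description string or nil; normalize to true/false.
  def via_defined
    defined?(@datadog_method) ? true : false
  end

  def via_ivar_defined
    instance_variable_defined?(:@datadog_method)
  end
end

probe = IvarProbe.new
p [probe.via_defined, probe.via_ivar_defined]
probe.assign(nil)
p [probe.via_defined, probe.via_ivar_defined]
```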

delner · 2019-08-29T20:43:36Z

lib/ddtrace/contrib/ethon/easy_patch.rb

+            span = @datadog_span
+            uri = URI.parse(url)
+            method = defined?(@datadog_method) ? @datadog_method.to_s : ''
+            span.resource = "#{method} #{uri.path}".lstrip


We only want to set the HTTP method here, e.g. GET, but omit the path.

Sounds crazy, and I see the value of including the path. However, the resource is meant to be a GROUP BY like key, which in practice requires values that meaningfully group traces on some user-defined dimension. HTTP paths often contain unique input (such as numeric IDs, API tokens), which tend to flatten these traces into groups too small to be meaningful. Hence, as a default behavior, we've been only setting the method as the resource for HTTP client integrations.

If the omission of path as a default behavior is an issue for users, the concession we could offer is the option to customize their span and set their own resource using a callback. See HTTP as an example: https://github.com/DataDog/dd-trace-rb/blob/master/lib/ddtrace/contrib/http/instrumentation.rb#L80. Implementing something like this is optional though, not necessary to merge this PR.

Also, it looks like the method can be nil; if you remove the path, it looks like the resource could then be an empty string, which will cause the trace to be malformed and dropped. We'll want to make sure resource is never blank, even if we have to give it some kind of default value instead.

Interesting, I was wondering why the resource contains only method name in other instrumentations. I changed the default value of method to N/A & handled the case when it is nil.
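
A minimal sketch of the fallback described above, assuming the resource is just the upper-cased HTTP method. resource_for is a hypothetical helper for illustration, not the PR's actual code.

```ruby
# Hypothetical helper: never returns a blank resource name.
def resource_for(http_method)
  name = http_method.to_s.strip.upcase
  name.empty? ? 'N/A' : name
end

p resource_for(:get)
p resource_for(nil)
```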

delner · 2019-08-29T20:47:34Z

spec/ddtrace/contrib/ethon/easy_patch_spec.rb

+  let(:configuration_options) { { tracer: tracer } }
+
+  before do
+    Datadog::Contrib::Ethon::Patcher.patch


I don't think this should be necessary: c.use will call patch automatically for the integration, and I think we want to verify that users don't need to manually patch.

delner · 2019-08-29T20:49:17Z

spec/ddtrace/contrib/ethon/multi_patch_spec.rb

+  let(:configuration_options) { { tracer: tracer } }
+
+  before do
+    Datadog::Contrib::Ethon::Patcher.patch


Same as above.

al-kudryavtsev · 2019-08-30T19:09:51Z

@delner Thanks for the review! I addressed the comments.

delner

Changes look good! We'll merge this for our next version. Thanks a ton for this contribution @al-kudryavtsev! 🎉

al-kudryavtsev force-pushed the ethon-tracing-pr branch 2 times, most recently from 22394f8 to 2f94ed6 July 9, 2019 14:07

che-burashco mentioned this pull request Jul 12, 2019

Distributed tracing: Typhoeus support #527

Closed

delner reviewed Jul 23, 2019

al-kudryavtsev commented Jul 24, 2019

al-kudryavtsev changed the base branch from master to 0.26-dev July 25, 2019 21:00

al-kudryavtsev force-pushed the ethon-tracing-pr branch from def7d78 to ae9d7f6 July 25, 2019 21:00

delner reviewed Aug 5, 2019

delner assigned al-kudryavtsev Aug 5, 2019

delner added the community, integrations, and feature labels Aug 5, 2019

al-kudryavtsev changed the base branch from 0.26-dev to 0.27-dev August 19, 2019 21:39

al-kudryavtsev added 6 commits August 19, 2019 17:46

Ethon & Typhoeus tracing support

4445f7c

Remove set_attributes override

280d143

Split tests, add more unit tests

d56384d

Skip integration tests when required

9b45868

Fix style errors

30604d9

Cleanup instrumentation state on Easy reset

172acbe

al-kudryavtsev added 3 commits August 19, 2019 17:46

Add parent span based on Multi.perform

f28d34f

Fix style

77a185b

Track multi performing state to achieve more precise span timing

d5ff7f6

al-kudryavtsev force-pushed the ethon-tracing-pr branch from 1d6c7d2 to d5ff7f6 August 19, 2019 21:47

delner reviewed Aug 29, 2019

al-kudryavtsev added 2 commits August 30, 2019 14:56

Address PR review comments

7e9269f

Minor fix

f617f6f

delner approved these changes Aug 30, 2019

delner added this to the 0.27.0 milestone Sep 3, 2019

delner merged commit 67ea666 into DataDog:0.27-dev Sep 3, 2019
