Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ethon & Typhoeus tracing support #778

Merged
merged 11 commits into from Sep 3, 2019

Conversation

al-kudryavtsev
Copy link
Contributor

Goal

Support distributed tracing when making requests through Ethon library (including Typhoeus requests which use Ethon)
The related issue is #527

Implementation details

Links to the libraries:
https://github.com/typhoeus/ethon
https://github.com/typhoeus/typhoeus

The implementation was initially based on rest_client integration, but I had to change the approach because Ethon is a wrapper around libcurl which can execute many requests concurrently. I had to patch quite a few methods on Easy to support more use cases. There are different examples of the requests in the spec (made through Ethon's Easy & Multi and through Typhoeus's Request & Hydra).

This code has been in production for a few weeks, didn't see any issues. However we don't use concurrent requests (done through Multi or Hydra) so relying on unit tests for other use cases. Ethon also supports non-http requests, this PR does not address these use cases.

Here is a screenshot of distributed tracing achieved through new integration
Screen Shot 2019-06-27 at 4 20 47 PM

@al-kudryavtsev al-kudryavtsev force-pushed the ethon-tracing-pr branch 2 times, most recently from 22394f8 to 2f94ed6 Compare July 9, 2019 14:07
@al-kudryavtsev
Copy link
Contributor Author

Fixed ruby 1.9 support & rebased to get rid of failed mongo tests

@delner
Copy link
Contributor

delner commented Jul 12, 2019

@al-kudryavtsev The pull request is looking quite filled out! Is this ready for review?

Also, since I'm not familiar with Typhoeus/Ethon, could you explain the relationship between the two? What are the responsibilities of each library? Would a user ever want to activate Typhoeus tracing without Ethon or vice versa?

I saw Typhoeus added as a dependency, and Ethon getting some instrumentation, but I didn't see any instrumentation explicitly defined for Typhoeus, and it wasn't clear what this means for Typhoeus support.

@al-kudryavtsev
Copy link
Contributor Author

@delner Yes, the PR is ready for review, thanks!

Ethon is a libcurl wrapper which provides two main objects, Easy and Multi, which can be used to make requests (not only HTTP). Typhoeus is build on top of Ethon and is designed for HTTP requests. It adds extra features such as memoization / caching, and in general provides more high-level interface.

Initially I wanted to support Typhoeus because that's what we use for HTTP requests. The problem is that with extra features it is actually harder to implement proper & reliable instrumentation, especially with multiple parallel asynchronous requests. At the same time, Ethon's Easy object is a good candidate for tracking asynchronous requests because it is used when request starts and when response is ready.

What I did instead is instrumenting Ethon. This way HTTP requests coming through Typhoeus are fully supported and other use cases of Ethon are also supported (or support can be added relatively easy). I added Typhoeus as a dependency in order to unit test the instrumentation with Typhoeus primitives.

So, to answer your question, if the user wants to instrument HTTP requests made with Typhoeus, they will need to enable Ethon tracing. If they need to trace some other features of Typhoeus (like caching of responses), they will need a different instrumentation.

Also I'm not an author of any of these libraries, so it is possible I'm missing something and there is an easier way. There are hooks in both libraries, however it doesn't seem to be a reliable approach to tracing.

Please let me know what you think.

@delner
Copy link
Contributor

delner commented Jul 15, 2019

@al-kudryavtsev This is a super helpful description, thank you!

Sounds like it'll be good then to have this Ethon instrumentation either way, only major question to me outstanding is: "does this constitute 'Typhoeus' support?" I don't think I have this answer; I asked the folks on #527 what they think.

If it does, then we somehow have to make it obvious/intuitive to users who want Typhoeus traces to activate the Ethon instrumentation. This might require us to add something like c.use :typhoeus for that purpose, but we'll cross that bridge later.

If it doesn't, I will happily accept this PR as Ethon instrumentation on its own merits, but we won't be able to close #527, and as a community we'll need to find a better way to give the proper feature coverage to Typhoeus at a later date.

In any case, I'll give this a review a bit later!

@che-burashco
Copy link

@delner Did you have a chance to look at the PR by any chance?

Copy link
Contributor

@delner delner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is looking very good; the tests are thorough, and pretty much all the boxes on features appear to be checked. :)

My feedback primarily is about developing my understanding of how the instrumentation works in the Ethon framework, and if there's any way to simplify it (e.g. avoiding use of state/instance variables, etc), and a little bit about reorganizing the tests into smaller groupings.

I think once I can get my understanding to "click" at bit more and we make a few tweaks, we should be able to get this approved.

lib/ddtrace/contrib/ethon/easy_patch.rb Outdated Show resolved Hide resolved
module InstanceMethods
include InstanceMethodsCompatibility unless Gem::Version.new(RUBY_VERSION) >= Gem::Version.new('2.0.0')

def http_request(url, action_name, options = {})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you walk me through the flow of a request here a bit? Order the of the functions being called, sync vs async requests, etc.

I see a number of methods being patched here, each collecting and storing some information as instance variables. I'd like to understand how the state is built. Is this information not available on perform without first collecting in between all these methods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure! I'll start with sync requests.

To make a sync request, the client uses Easy object. First it is created, and as a part of initialization it calls the set_attributes method which will in turn call headers=, but only if headers are provided. The client can also set the headers directly using headers= method. The problem is that as soon as headers are set they are converted to FFI pointer, that's why I store their original version. And we don't have a span yet to inject the tracing headers.

http_request method is used as a helper to populate the easy with necessary information for HTTP request. In particular, HTTP method is passed (for example, easy.http_request("www.example.com", :post, { params: { a: 1 }, body: { b: 2 } })). The problem is that this data is not preserved on easy in a way that can be used to recover the HTTP method easily. The factories set various low-level attributes (https://github.com/typhoeus/ethon/tree/master/lib/ethon/easy/http), and libcurl itself figures out HTTP method based on all these attributes. So I figured it is easier to just store the method instead of trying to recover it from libcurl attributes.

The sync way to execute an easy is calling perform. I patch the perform to create the span and to inject the headers if needed (that's why I stored the original headers on the object, to avoid reading them back from FFI). perform uses libcurl's easy_perform method and then calls the complete method on easy that I patch to finish the span.

Async requests are executed using Multi. Individual requests are represented by the same Easy objects, however the execution flow is different. Easy instance is added to the Multi using add method. I consider this to be the beginning of the request execution, that's why the span is created there. There is no easy way to know if it is the beginning of the request or if multi is still on hold because multi execution can be in progress in the other thread. I follow here the way Typhoeus uses Multi : it adds the easy objects to Multi right before executing the perform. It also has hooks for before-request event which are executed in the add call.

The execution of multi happens when the perform method is called. This method uses libcurl's multi_perform and calls complete on easy objects which finished executing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay this is pretty interesting, and exactly the kind of explanation I was looking for. (Thank you!)

I can see why you'd want to store this info before it gets turned into a format that's hard to read from; as long these objects are 1-1 with requests, then this seems okay. Just have to be careful in any scenario in which an Easy object is re-used or re-ran? Not sure if that happens, but some food for thought.

If Multi uses multiple Easy objects, would it make sense to add an additional parent span to the multi operation? Thinking about the case if users want to see the batch as an operation, in addition to its constituent parts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see why you'd want to store this info before it gets turned into a format that's hard to read from; as long these objects are 1-1 with requests, then this seems okay. Just have to be careful in any scenario in which an Easy object is re-used or re-ran? Not sure if that happens, but some food for thought.

That's a good point. I added extra safety net by patching the reset method which is used before re-using easy instance. This way I can be sure that HTTP method or headers are not mistakenly passed in the next request.

lib/ddtrace/contrib/ethon/easy_patch.rb Show resolved Hide resolved
lib/ddtrace/contrib/ethon/easy_patch.rb Show resolved Hide resolved
def set_attributes(options)
return super unless tracer_enabled?

# Make sure headers= will get called
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for forcing headers= to be called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to force the @datadog_original_headers to be set as soon as possible, so that later on I can assume it is set. Now it seems to me that it's not essential & we can reduce the number of patched methods. I can store them in headers= and then check if variable is set or not when injecting the headers.

handles = super(easy)
return handles if handles.nil? || !tracer_enabled?

easy.datadog_before_request
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not very familiar with the construction of the framework, but is Multi composed of several Easy instances? And when it executes, will it call Easy#perform? If so, would that call datadog_before_request making this redundant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a more detailed description that I put in a different comment; Multi uses a different approach to execute Easy objects that it holds, it won't call Easy#perform.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per my response above, is there an equivalent #perform method for Multi? Might want to consider adding a parent span around the lifetime of this multi operation which bundles up the Easy requests. Just a thought.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the parent span support. The logic is the following: it is created when the first easy is added to multi and it is finalized when multi#perform has finished. Multi can be reused - every time easy is added it creates a parent span if it doesn't exist yet.

spec/ddtrace/contrib/ethon/ethon_patch_spec.rb Outdated Show resolved Hide resolved
spec/ddtrace/contrib/ethon/ethon_patch_spec.rb Outdated Show resolved Hide resolved
Copy link
Contributor Author

@al-kudryavtsev al-kudryavtsev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! I left some comments & will work on the tweaks

lib/ddtrace/contrib/ethon/easy_patch.rb Outdated Show resolved Hide resolved
module InstanceMethods
include InstanceMethodsCompatibility unless Gem::Version.new(RUBY_VERSION) >= Gem::Version.new('2.0.0')

def http_request(url, action_name, options = {})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure! I'll start with sync requests.

To make a sync request, the client uses Easy object. First it is created, and as a part of initialization it calls the set_attributes method which will in turn call headers=, but only if headers are provided. The client can also set the headers directly using headers= method. The problem is that as soon as headers are set they are converted to FFI pointer, that's why I store their original version. And we don't have a span yet to inject the tracing headers.

http_request method is used as a helper to populate the easy with necessary information for HTTP request. In particular, HTTP method is passed (for example, easy.http_request("www.example.com", :post, { params: { a: 1 }, body: { b: 2 } })). The problem is that this data is not preserved on easy in a way that can be used to recover the HTTP method easily. The factories set various low-level attributes (https://github.com/typhoeus/ethon/tree/master/lib/ethon/easy/http), and libcurl itself figures out HTTP method based on all these attributes. So I figured it is easier to just store the method instead of trying to recover it from libcurl attributes.

The sync way to execute an easy is calling perform. I patch the perform to create the span and to inject the headers if needed (that's why I stored the original headers on the object, to avoid reading them back from FFI). perform uses libcurl's easy_perform method and then calls the complete method on easy that I patch to finish the span.

Async requests are executed using Multi. Individual requests are represented by the same Easy objects, however the execution flow is different. Easy instance is added to the Multi using add method. I consider this to be the beginning of the request execution, that's why the span is created there. There is no easy way to know if it is the beginning of the request or if multi is still on hold because multi execution can be in progress in the other thread. I follow here the way Typhoeus uses Multi : it adds the easy objects to Multi right before executing the perform. It also has hooks for before-request event which are executed in the add call.

The execution of multi happens when the perform method is called. This method uses libcurl's multi_perform and calls complete on easy objects which finished executing.

def set_attributes(options)
return super unless tracer_enabled?

# Make sure headers= will get called
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to force the @datadog_original_headers to be set as soon as possible, so that later on I can assume it is set. Now it seems to me that it's not essential & we can reduce the number of patched methods. I can store them in headers= and then check if variable is set or not when injecting the headers.

lib/ddtrace/contrib/ethon/easy_patch.rb Show resolved Hide resolved
lib/ddtrace/contrib/ethon/easy_patch.rb Show resolved Hide resolved
handles = super(easy)
return handles if handles.nil? || !tracer_enabled?

easy.datadog_before_request
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a more detailed description that I put in a different comment; Multi uses a different approach to execute Easy objects that it holds, it won't call Easy#perform.

spec/ddtrace/contrib/ethon/ethon_patch_spec.rb Outdated Show resolved Hide resolved
@al-kudryavtsev
Copy link
Contributor Author

@delner I updated the PR - dropped ruby 1.9 support, removed set_attributes patch & improved tests

@delner
Copy link
Contributor

delner commented Jul 25, 2019

@al-kudryavtsev Cool, I'll review this again soon. Can you also rebase this against 0.26-dev? That would be our merge target.

@al-kudryavtsev al-kudryavtsev changed the base branch from master to 0.26-dev July 25, 2019 21:00
@al-kudryavtsev
Copy link
Contributor Author

@delner 👍 Rebased

@delner
Copy link
Contributor

delner commented Jul 30, 2019

@al-kudryavtsev This is looking great; very happy with the thoroughness here, and the tests! Just left one more suggestion regarding Multi, but it's not an outright requirement, just a thought. Otherwise, I think we're in good shape to merge this.

@al-kudryavtsev
Copy link
Contributor Author

@delner Thanks! I made some changes to improve the correctness and added parent span support.

@@ -19,19 +19,48 @@ def add(easy)
handles = super(easy)
return handles if handles.nil? || !tracer_enabled?

easy.datadog_before_request
easy.datadog_before_request(parent_span: datadog_multi_span)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible there's a delay between #add and #perform? When the span is created, it's auto created with the start time. I'm thinking about the possibility where someone creates a Multi request with Easy children, but then delays a while before actually executing it; could that cause the timing to be recorded incorrectly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible, more precise solution would probably require storing some state on Multi object. The complexity comes from the fact that extra Easy object can be added to Multi during the execution of perform (in easy complete callback, this is how Typhoeus implementation works). Current solution will work for Typhoeus since it always adding easy objects to multi right before executing it.
The same issue as you described can happen with Easy span because it is started when easy object is added to multi (which is not necessarily executing yet).
We can probably solve the issue by storing some flag on multi saying whether it is performing now or not.
Please let me know what you think. I'll be away from computer for one more week and will be able to work on this feature afterwards.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth it, as long as there isn't too much coupling between Easy and Multi instrumentation.

Is there any way to create the span at perform time for Easy? Even if that means storing other state on Easy initialization beforehand? Given these objects sound idempotent anyway, some state would be permissible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turned out extra state is not needed. Easy handles are stored on Multi instances so I can go through them and start child spans. When adding easy to multi, I can check if there is a span on multi which is an indication of multi's perform execution in progress.

@delner delner added community Was opened by a community member integrations Involves tracing integrations feature Involves a product feature labels Aug 5, 2019
@delner
Copy link
Contributor

delner commented Aug 19, 2019

@al-kudryavtsev I noticed the builds broke on Mongo: you'll probably want to rebase this PR onto the top of 0.27-dev, which should fix the issue.

@al-kudryavtsev al-kudryavtsev changed the base branch from 0.26-dev to 0.27-dev August 19, 2019 21:39
@al-kudryavtsev
Copy link
Contributor Author

@delner I added the change to make multi tracing more accurate, rebased it on top of 0.27-dev. I also wrote a test to manually test instrumentation.
Here is a screenshot of the Hydra instrumentation which starts new requests during underlying multi's perform call:
Screen Shot 2019-08-19 at 5 36 33 PM

@delner
Copy link
Contributor

delner commented Aug 20, 2019

Wow, that screenshot looks awesome! I'll take another pass over the code, see if we can get this merged.

@al-kudryavtsev
Copy link
Contributor Author

@delner Did you have a chance to look at the code?

@delner
Copy link
Contributor

delner commented Aug 28, 2019

@al-kudryavtsev Apologies, was on PTO. I'm going to try to wrap this one up for our next release.

Copy link
Contributor

@delner delner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great, couple of minor changes I think, but nothing serious, should be quick. Once those are addressed we'll merge it.


# Store headers to call this method again when span is ready
@datadog_original_headers = headers
super headers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor but can you just call super here?

def datadog_tag_request
span = @datadog_span
uri = URI.parse(url)
method = defined?(@datadog_method) ? @datadog_method.to_s : ''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be instance_variable_defined??

span = @datadog_span
uri = URI.parse(url)
method = defined?(@datadog_method) ? @datadog_method.to_s : ''
span.resource = "#{method} #{uri.path}".lstrip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only want to set the HTTP method here, e.g. GET, but omit the path.

Sounds crazy, and I see the value of including the path. However, the resource is meant to be a GROUP BY like key, which in practice requires values that meaningfully group traces on some user-defined dimension. HTTP paths often contain unique input (such as numeric IDs, API tokens), which tend to flatten these traces into groups too small to be meaningful. Hence, as a default behavior, we've been only setting the method as the resource for HTTP client integrations.

If the omission of path as a default behavior is an issue for users, the concession we could offer is the option to customize their span and set their own resource using a callback. See HTTP as an example: https://github.com/DataDog/dd-trace-rb/blob/master/lib/ddtrace/contrib/http/instrumentation.rb#L80. Implementing something like this is optional though, not necessary to merge this PR.

Also, it looks like the method can be nil; if you remove the path, it looks like the resource could then be an empty string, which will cause the trace to be malformed and dropped. We'll want to make sure resource is never blank, even if we have to give it some kind of default value instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I was wondering why the resource contains only method name in other instrumentations. I changed the default value of method to N/A & handled the case when it is nil.

let(:configuration_options) { { tracer: tracer } }

before do
Datadog::Contrib::Ethon::Patcher.patch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be necessary: c.use will call patch automatically for the integration, and I think we want to verify that users don't need to manually patch.

let(:configuration_options) { { tracer: tracer } }

before do
Datadog::Contrib::Ethon::Patcher.patch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@al-kudryavtsev
Copy link
Contributor Author

@delner Thanks for the review! I addressed the comments.

Copy link
Contributor

@delner delner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good! We'll merge this for our next version. Thanks a ton for this contribution @al-kudryavtsev! 🎉

@delner delner added this to the 0.27.0 milestone Sep 3, 2019
@delner delner merged commit 67ea666 into DataDog:0.27-dev Sep 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community Was opened by a community member feature Involves a product feature integrations Involves tracing integrations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants