New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Distributed tracing using OpenTelemtry #4029
Conversation
24affe1
to
e93a88d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pushing this. It'll be great to have tracing support soon. 🎉
Two things that caught my eye on a quick glance, please see comments inline 👇 🙃
f80fdcc
to
cab362b
Compare
Love this! 👍 One thing I'm not too sure about though is all the new arguments added to
We've already seen this being problematic elsewhere, so perhaps this would be a good candidate for config rather than CLI arguments? Note that it's still possible to set config attributes through the CLI with |
@anderseknert I'm on board with making this part of the config instead 👍 Thanks for compiling the pros and cons. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two comments about what "keeping the embeddings unaffected" means for this PR. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The setup without the package-level global looks good to me.
I believe there are a few points we should still get figured out:
- Can we have some sort of e2e test? There're a bunch of them in
test/e2e
, and I would imagine that some sort of mock (or in-memory) trace collector could be used to assert that the right things get sent for requests. - The config via json/yaml instead of CLI args would be nice. I'll try to find an example of how we do it in other places...
I've seen the additions to the v1 data API, would you consider that a first step, and that the other APIs would eventually follow, or is this something you'd perhaps want to include in this PR? I'd thing that the v0 data API could use this, too, and perhaps also the policy API. What do you think? If it's getting unwieldy, I suppose we could tackle that in follow-up PRs, too.
cmd/run.go
Outdated
runCommand.Flags().BoolVarP(&cmdParams.rt.DistributedTracingEncryptionSkipVerify, "distributed-tracing-tls-skip-verify", "", false, "disable certificate chain verification") | ||
runCommand.Flags().StringVarP(&cmdParams.distributedTracingTLSCertFile, "distributed-tracing-tls-cert-file", "", "", "set path of TLS certificate file of the tracing exporter") | ||
runCommand.Flags().StringVarP(&cmdParams.distributedTracingTLSPrivateKeyFile, "distributed-tracing-tls-private-key-file", "", "", "set path of TLS private key file of the tracing exporter") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @anderseknert mentioned, it would be nice to put these into the yaml/json config instead. I'm trying to find a few pointers on what to do where 👀
@rvalkenaers thanks for bearing with me! This is a tremendous feature, but also quite a bit of work, I'm afraid 😅 |
I did a quick test and we're actually creating spans for all API endpoints. |
Also the v0 data API with http.send and all that? I thought we hadn't wired that config into the call to rego.New. |
For the v0 data API we only have the server spans, not the http.send spans. I would prefer to implement those in follow-up PRs. |
@rvalkenaers I'm not actually completely sure how the config change would look like -- @anderseknert, would making the config a piece of JSON mean that this would be a plugin? Like status, logs, bundles... |
I hadn't thought about that. I guess it makes sense from an architectural POV, but you're probably better equipped to judge that :) On the other hand, if this attaches to both the server component and things like http.send, maybe treating it like an integrated component makes things simpler.. |
Yup I think that's the case. I think we'll need something along these lines, then, right? 👉 https://github.com/open-policy-agent/opa/blob/v0.35.0/topdown/cache/cache.go#L22-L60 Wired up in the same way, it would be configurable either via CLI Let's also ensure that this gets documented so that people are aware of this cool new feature. But we should probably first settle on how to configure it. |
d2d3d54
to
0fd87de
Compare
Sorry for the wait, I haven't forgotten about this 😅 I'll play with this some locally, but it looks good! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've stumbled upon something we'll need to fix here: logging.
When I run this locally, with DT enabled, and the endpoint isn't available, I see a bunch of these:
{"client_addr":"127.0.0.1:50944","level":"info","msg":"Received request.","req_id":7,"req_method":"GET","req_path":"/health","time":"2021-12-09T13:34:23+01:00"}
{"client_addr":"127.0.0.1:50944","level":"info","msg":"Sent response.","req_id":7,"req_method":"GET","req_path":"/health","resp_bytes":2,"resp_duration":0.291205,"resp_status":200,"time":"2021-12-09T13:34:23+01:00"}
2021/12/09 13:34:27 context deadline exceeded
2021/12/09 13:34:27 traces exporter is disconnected from the server 127.0.0.1:4317: context deadline exceeded
2021/12/09 13:34:42 context deadline exceeded
{"client_addr":"127.0.0.1:50981","level":"info","msg":"Received request.","req_id":8,"req_method":"GET","req_path":"/health","t
It's a mix of OPA's logs, using its logging infrastructure, and some other logs, coming from the DT client. We somehow need to get the latter logs to use OPA's logging, too.
I haven't dug in deep yet to find a fix, but maybe you know what to do right away? 😃
👉 https://pkg.go.dev/go.opentelemetry.io/otel#SetErrorHandler could using this be the way to go?
|
I had this in place initially I removed it when I got rid of the global vars but I didn't found another way to do it. |
Ahh, alright. Are there non-global ways to provide an error handler? |
Can we put the call to set up the global handler in a place where it won't affect people embedding OPA? I'm thinking about |
Sounds good to me. |
c94f11b
to
d409ccf
Compare
Thank you, logging looks alright now:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of small-impact nitpicks, I think we're good to go then. 🚀 Thanks for bearing with me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rvalkenaers first of all, thank you SO much for putting in the work to enable tracing in OPA, it's much appreciated 🙏
I agree with @srenatus that this is very close to being mergeable--there's just one thing I'd like us to discuss before merging and that has to do w/ package dependencies:
-
The rego and topdown packages have been extended to include the internal/distributedtracing#Options type in exported functions. The problem with that is that callers outside the top-level OPA package will not be able to provide values of this type, because the type is behind an internal package in OPA. I think we'll need to create a new top-level package (e.g., github.com/open-policy-agent/opa/tracing) where we export the necessary types... I think Options is the only type but I could be wrong.
-
These changes put a direct dependency on OpenTelemetry into OPA, and not just OPA but also the low-level packages (rego and topdown) that users rely on solely for policy evaluation. Today those packages are constructed intentionally to avoid large 3rdparty dependencies. I'd like to know how feasible it would be to define a small shim interface in OPA--this would go into the exported package I mentioned in (1). OPA would need to implement some boilerplate but it would remove the hard dependency on opentelemetry from the rego and topdown packages.
@srenatus PTAL and let me know if this makes sense. EDIT: If you'd rather sort this out post-merge, that also works.
@tsandall I agree, the dependency thing is a good point. As far as I am concerned, I'd leave this up to @rvalkenaers: I could take on dealing with those two points between merge and 0.36.0 (i.e. next week); or you (@rvalkenaers) could deal with it before we merge. Either way is fine with me. 🔀 😄 |
@srenatus If you could pick it up it would be much appreciated 🙏 |
@rvalkenaers OK sure! Would you mind squashing these commits, adding a short commit message, so I can push the merge button? |
This commit implements tracing using the net/http automatic instrumentation wrappers on the server and topdown/http packages. Fixes open-policy-agent#1469 Signed-off-by: Rien Valkenaers <rien.valkenaers@gmail.com> Co-Authored-By: Stephan Renatus <stephan@styra.com>
b7b9e52
to
77891cd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for your contribution that get's the ball rolling with OPA and OpenTelemetry 👏 🎉 🥳
I'll merge this, and take on the remaining little cleanup we've discussed in another PR.
This is great! Thanks @rvalkenaers 👍 |
This commit implements tracing using the net/http automatic instrumentation wrappers on the server and topdown/http packages.
Following configuration flags are added:
--distributed-tracing enable distributed tracing using OpenTelemetry Tracing
--distributed-tracing-address string address of the OpenTelemetry Collector gRPC endpoint (default "localhost:4317")
--distributed-tracing-sample-rate int precentage of traces that are sampled and exported (default 100)
--distributed-tracing-service-name string logical name of the service used by OpenTelemetry Tracing (default "opa")
--distributed-tracing-tls {off,tls,mtls} set distributed tracing tls scheme (default off)
--distributed-tracing-tls-cert-file string set path of TLS certificate file of the tracing exporter
--distributed-tracing-tls-private-key-file string set path of TLS private key file of the tracing exporter
--distributed-tracing-tls-skip-verify disable certificate chain verification
Fixes #1469
Signed-off-by: Rien Valkenaers rien.valkenaers@gmail.com
This is an example trace for a "Pull Data during Evaluation" scenario: