feat: Add GraphQL query context based tracing #79
Thanks so much for the contribution @toneymathews! I need to familiarize myself with the changes in graphql-ruby and how On the topic of version compatibility: generally speaking, each instrumentation gem's If we imagine a world where there is a A couple alternatives to that:
The other @open-telemetry/ruby-contrib-maintainers may have some thoughts on a more holistic strategy for managing compatibility here.
For ease of explanation, I'll be referring to graphql-ruby versions in the below form. We have added backward compatibility for old versions in this commit. However, one test case we’ve added will have different outcomes based on the graphql-ruby version. This test is skipped in old versions. When “enable_platform_field”, “enable_platform_authorized” and “enable_platform_resolve_type” are enabled (set to true) in configuration, but are disabled (set to false) within the query execution context,
It is detailed in the below table for the tests added
At this point, this is supported across different versions in the below manner
I’ve tested this locally with graphql-ruby 1.13.13 as well.
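For anyone following the compatibility discussion, here is a minimal sketch of how a version gate might look. It assumes, per the PR description, that context access in tracing is available as of graphql-ruby 1.13.13 on the 1.x line and 2.0.9 on the 2.x line. The `GraphQLCompat` module and its method name are hypothetical and are not part of this gem or of graphql-ruby.

```ruby
require "rubygems" # Gem::Version; normally loaded already, required for clarity

# Hypothetical helper, for illustration only.
module GraphQLCompat
  MIN_1_X = Gem::Version.new("1.13.13") # assumed first 1.x release with context access
  MIN_2_X = Gem::Version.new("2.0.9")   # assumed first 2.x release with context access
  V2 = Gem::Version.new("2.0.0")

  # Returns true when the given graphql-ruby version string is assumed to
  # pass context and trace_phase into the platform key lookup.
  def self.context_aware_tracing?(version_string)
    version = Gem::Version.new(version_string)
    if version < V2
      version >= MIN_1_X
    else
      version >= MIN_2_X
    end
  end
end

puts GraphQLCompat.context_aware_tracing?("1.13.13")
puts GraphQLCompat.context_aware_tracing?("2.0.8")
```

In the instrumentation itself the argument would presumably be the installed graphql-ruby version string rather than a literal; a test that depends on the newer behaviour could be skipped when this check returns false.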
Thanks for adding to Appraisals @toneymathews! Will take a look.
I've assigned this to myself and will work towards giving it a proper, thorough review this week. It has a few implications and some unspecified behaviour, so we need to be really detailed here.
Hey, apologies for the delay on the review here. To be honest, I've been going over the high-level intention of this PR more than the code itself. The simplified high-level breakdown I see consists of three configurable states per field. I will use platform field for the following example states.
So we have on/off/per request. Configuring the on and off states is easy, as we already have that functionality implemented.
So now to the meat of this PR: per-request span creation. The approach taken makes use of GraphQL-specific implementation details to enable the tracing of said field, which I think is a very pragmatic approach.
For "fun", let's add some distributed-tracing-flavoured scope creep to this new feature. Consider the following hypothetical workflow.
//start winded example
I'm a developer debugging this workflow and am trying to understand why the order creation process times out for some users. So I enabled all the GraphQL spans for ServiceA via the functionality this PR surfaces. I get all the spans, I read my trace, and discover the issue isn't in ServiceA. I suspect something bad is happening in ServiceC. How do I proceed? When testing the workflow I'm making requests against ServiceA; making a direct request to ServiceC is sort of tricky and doesn't capture a real-world scenario. How do I tell ServiceC to enable all the GraphQL spans?
I'm intentionally trying to lead us to consider using baggage here instead of the GraphQL-specific context. It's a mechanism for propagating context across service boundaries using a W3C-specced header created exactly for this use case. It would allow us as users to add a baggage header that says something like "baggage: ServiceC=enable_platform_field;", which could signal the application for which we want to enable high-verbosity request tracing.
Nothing in this PR or my wall of text above is formally specced by OpenTelemetry, so we have to be very mindful of how we implement this. We have some latitude to experiment while the instrumentation is not 1.0'd, but we have to be very mindful of the lasting consequences of our choices here. This needs further discussion from @open-telemetry/ruby-approvers @open-telemetry/ruby-maintainers
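To make the baggage idea concrete, here is a rough sketch of how a service might read such a header value and decide whether it is the one being asked to enable an option. The `ServiceC=enable_platform_field` layout is only the hypothetical from the comment above, not anything specified by OpenTelemetry. The `BaggageSketch` module and its methods are made up for illustration, percent-decoding of values is omitted, and a real application would more likely rely on the OpenTelemetry baggage propagator than parse the header by hand.

```ruby
# Hypothetical sketch: reads a raw baggage header value of the form
# "key1=value1,key2=value2;property" and checks for a per-service option.
module BaggageSketch
  # Returns a Hash of key => value. Anything after ';' in a list member is
  # treated as properties and ignored. Percent-decoding is not performed.
  def self.parse(header)
    header.to_s.split(",").each_with_object({}) do |member, entries|
      key_value, _properties = member.strip.split(";", 2)
      next if key_value.nil?

      key, value = key_value.split("=", 2)
      next if key.nil? || value.nil?

      entries[key.strip] = value.strip
    end
  end

  # True when the header names this service and asks for the given option.
  def self.option_requested?(header, service_name:, option:)
    parse(header)[service_name] == option
  end
end

header = "ServiceC=enable_platform_field;"
puts BaggageSketch.option_requested?(header, service_name: "ServiceC", option: "enable_platform_field")
puts BaggageSketch.option_requested?(header, service_name: "ServiceA", option: "enable_platform_field")
```

With this shape, ServiceA would ignore the entry and pass the baggage along, while ServiceC would recognise its own name and could switch on the more verbose spans for that one request.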
Thank you for taking the time to review the suggestion and give it some thought. GraphQL is not necessarily invoked through an HTTP request, although that is the most common scenario. GraphQL being HTTP agnostic means that something needs to tell Practically speaking, I am unclear on an alternative to how the current request object or configuration is proposed to be passed down into HTTP
A frontend client may not know the "service name" to put into a Regardless, these changes can work with distributed tracing. Configuration can still be propagated across service boundaries using whatever tooling exists today, I imagine. GraphQL wouldn't be concerned with this propagation, as it is not the mechanism directly calling other services; instead it would be other clients (which are likely instrumented).
👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the |
`opentelemetry-instrumentation-graphql` provides a `GraphQLTracer` (source) based on `GraphQL::Tracing::PlatformTracing` (source). This includes several schema-level configuration options that can enable/disable specific parts of tracing.
The current options are global, but we could benefit from also having per-request configuration. For example, an HTTP header may request that verbose traces be recorded, ideally enabling `enable_platform_field` for only this request’s GraphQL execution. This way a client can be permitted to opt into tracing without having to enable it for all requests.
To make this possible, the GraphQL execution context seemed like an ideal place to have this request-specific configuration. Access to context is now possible as of graphql-ruby 1.13.13 and 2.0.9 (see changelog, added in #4077), meaning we have this option going forward, but potentially need to remain backwards compatible.
In `cached_platform_key`, the newer versions of graphql-ruby have both `context` and `trace_phase`, so we can determine which setting to check and which of `platform_field_key`, `platform_authorized_key`, `platform_resolve_type_key` would be called on a cache miss.
If we were to release a new version of `opentelemetry-instrumentation-graphql`, how should we handle checking the version of graphql-ruby, either being backwards compatible or having a minimum version?
The CI failures have come up since `cached_platform_key` now expects 3 arguments (`trace_phase` being the new one) in newer versions of graphql-ruby.
cc @ravangen
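As a rough illustration of the idea described here, the sketch below shows one way a per-request value could take precedence over the schema-level setting. It is not the code in this PR: the class name is hypothetical, a plain Hash stands in for the GraphQL query context, the phase names `:field`, `:authorized` and `:resolve_type` are assumed, and "context wins over global configuration" is an assumption about the intended precedence.

```ruby
# Hypothetical sketch of per-request tracing settings; names are illustrative
# and do not reflect the gem's actual implementation.
class PerRequestTracingSketch
  # Assumed mapping from trace phase to the setting that gates it.
  SETTING_FOR_PHASE = {
    field: :enable_platform_field,
    authorized: :enable_platform_authorized,
    resolve_type: :enable_platform_resolve_type
  }.freeze

  # config: schema-level (global) settings as a Hash of setting => boolean.
  def initialize(config)
    @config = config
  end

  # Assumption: a value present in the query context overrides the global
  # setting; otherwise the global setting applies, defaulting to false.
  def enabled?(trace_phase, context)
    setting = SETTING_FOR_PHASE.fetch(trace_phase)
    return context[setting] if context&.key?(setting)

    @config.fetch(setting, false)
  end
end

tracer = PerRequestTracingSketch.new({ enable_platform_field: false })
puts tracer.enabled?(:field, { enable_platform_field: true }) # opted in for this request only
puts tracer.enabled?(:field, {})                              # falls back to the global setting
```

On graphql-ruby versions that do not pass context to the tracer, only the global branch would be reachable, which is one way to picture the backwards-compatible behaviour being asked about.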