Optional instrumentation for recording GraphQL response field lengths in OTel #5199

tninesling · 2024-05-17T21:15:52Z

Overview

Adds a new instrumentation config, graphql, which supports a single metric called field.length. When enabled, it publishes the length of each list field returned in primary supergraph responses. The metric is primarily meant to help debug unexpected cost values calculated by the demand control plugin, because cost discrepancies are multiplied by the lengths of the lists in a response.
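To make the idea concrete, here is a minimal sketch of how list lengths could be collected from a response. It is not the code in this PR: the Value enum is a hypothetical stand-in for the router's JSON types, list_field_lengths is an illustrative name, and a real implementation would record each length in an OTel histogram rather than return a Vec.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for a JSON response value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Scalar(String),
    List(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Walk a response depth-first and return (field name, list length)
/// for every object field whose value is a list.
pub fn list_field_lengths(value: &Value) -> Vec<(String, usize)> {
    let mut out = Vec::new();
    visit(value, &mut out);
    out
}

fn visit(value: &Value, out: &mut Vec<(String, usize)>) {
    match value {
        Value::Object(fields) => {
            for (name, child) in fields {
                // Record the length before descending into the list items.
                if let Value::List(items) = child {
                    out.push((name.clone(), items.len()));
                }
                visit(child, out);
            }
        }
        Value::List(items) => {
            for item in items {
                visit(item, out);
            }
        }
        Value::Null | Value::Scalar(_) => {}
    }
}
```

For a response containing a users list of two objects, each with its own posts list, this yields one entry for users and one entry per posts list, which is the shape of data a histogram of field lengths would consume.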

Primary responses only

Note that this implementation does not work for deferred responses. The primary blocker is that we don't currently have a way to zip a response with a query when that response doesn't start at the query root. To make this work, we would need to take the deferred response's JSON path and determine which subsection of the schema to use for the zip procedure.
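As a rough illustration of what that path resolution might involve, the sketch below follows a deferred response's path through a simplified selection tree to find where zipping would have to start. Everything here is an assumption for illustration: Selection, PathSegment, and selections_at_path are hypothetical names, and the sketch ignores fragments, type conditions, and the schema lookup itself.

```rust
/// One field in a simplified, hypothetical query selection tree.
#[derive(Debug, PartialEq)]
pub struct Selection {
    pub response_key: String,
    pub children: Vec<Selection>,
}

/// One segment of a deferred response's path, e.g. ["users", 0, "posts"].
#[derive(Debug, PartialEq)]
pub enum PathSegment {
    Key(String),
    Index(usize),
}

/// Follow `path` from the query root and return the selections that
/// apply at that point, or None if the path does not match the query.
pub fn selections_at_path<'a>(
    root: &'a [Selection],
    path: &[PathSegment],
) -> Option<&'a [Selection]> {
    let mut current = root;
    for segment in path {
        match segment {
            // A list index picks an item but does not change which
            // selections apply, so it is skipped.
            PathSegment::Index(_) => continue,
            PathSegment::Key(key) => {
                let field = current.iter().find(|s| &s.response_key == key)?;
                current = field.children.as_slice();
            }
        }
    }
    Some(current)
}
```

The returned slice is the sub-selection a zip procedure could pair with the deferred payload instead of the full query.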

No support for custom attributes

The other instrumentation configurations support custom metrics using predefined attributes. For example, you can create a custom router metric based on the HTTP response status code. This functionality comes from the custom histogram/attribute/selector framework we've implemented, but the GraphQL field-related code does not fit cleanly into those existing abstractions. In the interest of time, I've settled on a one-off metric that is not extensible and cannot be used in custom metrics.

No support for conditions

One change not included in this PR that we will need to add is support for filtering via conditions. When enabled, this metric is published for every list field across all responses, which can produce far more data than is useful or wanted. The existing conditions implementation is likely not compatible with this metric as-is, because we need to check a given condition for each field in the response when deciding whether to publish. The current conditions setup caches any evaluated condition: once a condition evaluates to true, it is rewritten to a static true condition that is never re-evaluated. We will need an uncached equivalent that can be evaluated several times within a single request pipeline for use with this field length metric. That will come in the next PR.
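The sketch below illustrates why a cached condition is a poor fit for per-field checks. It is a simplified model rather than the router's actual condition types: Condition, LengthGreaterThan, evaluate_cached, and evaluate are all hypothetical names chosen for illustration.

```rust
/// Hypothetical condition over a list field's length.
#[derive(Debug, PartialEq)]
pub enum Condition {
    /// Already resolved to a constant result.
    Static(bool),
    /// True when the list length is strictly greater than the threshold.
    LengthGreaterThan(usize),
}

impl Condition {
    /// Uncached evaluation: the condition is left unchanged, so it can
    /// be checked again for every field in the same response.
    pub fn evaluate(&self, len: usize) -> bool {
        match self {
            Condition::Static(result) => *result,
            Condition::LengthGreaterThan(threshold) => len > *threshold,
        }
    }

    /// Cached evaluation, modelling the behaviour described above:
    /// the first true result rewrites the condition to Static(true).
    pub fn evaluate_cached(&mut self, len: usize) -> bool {
        let result = self.evaluate(len);
        if result {
            *self = Condition::Static(true);
        }
        result
    }
}
```

With a threshold of 10, checking lengths 50 and then 3 through evaluate_cached reports true both times, because the first match freezes the condition. Checking the same lengths through evaluate reports true and then false, which is the per-field behaviour this metric needs.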



github-actions · 2024-05-17T21:16:04Z

@tninesling, please consider creating a changeset entry in /.changesets/. These instructions describe the process and tooling.

router-perf · 2024-05-17T21:16:23Z

tninesling · 2024-05-28T15:16:28Z

This was redone in #5215

tninesling and others added 16 commits May 14, 2024 17:09

GraphQL selector scaffolding

857cf10

Add condition to yaml fixture

3d57181

Move some files around

ee4eab8

Change req/res types to unit for graphql selectors

d1ddb03

Get schema to be parse-able

264c8fe

Graphql instruments WIP.

bb8b08d

A poor attempt at converting SchemaAwareResponse to an iterator

e8982aa

A better, but still not totally working, attempt at traversing the response as an iterator

3554a37

Still broken, but at least we have a visitor

2a04c62

Try to shoehorn things into fitting the request/response model...

3c6837b

Just add a reasonable histogram which can visit a typed response and a boolean flag to toggle it on/off

05b2dff

Add some tests

0cf67cb

Remove old response field interface

a4c7f9c

Leave comment about response zipper limitations

192df98

Fix lint errors

dca24cc

Schema gen update

ffad756

tninesling requested a review from BrynCooke May 17, 2024 21:15

Merge branch 'dev' into tninesling/graphql-instruments

23fb11e

tninesling closed this May 28, 2024

tninesling deleted the tninesling/graphql-instruments branch May 28, 2024 15:16
