Header signing error affecting opentelemetry instrumentation #4472
Comments
Interesting; the list of headers excluded from signing in JS v2 (lines 191 to 199 at commit 7d80de6) is significantly longer than the one in botocore: https://github.com/boto/botocore/blob/a9397847a3150eb619f3457e14557afd12ff13ab/botocore/auth.py#L61-L65. OpenTelemetry headers aren't in there either. Maybe this issue belongs in the cross-SDK repo?
Thanks @benkehoe. It is surprising that they differ so significantly. I think for now this issue only crops up with the JS client because the JavaScript HTTP instrumentation (https://github.com/open-telemetry/opentelemetry-js/tree/main/experimental/packages/opentelemetry-instrumentation-http) adds the traceparent header to every request. Adding an AWS-specific code path to the HTTP instrumentation felt weird, so I raised this here.
I'm confused. Is the OpenTelemetry JavaScript implementation doing something that is not standard for OpenTelemetry? If so, it seems like the OpenTelemetry implementation should be changed. If the OpenTelemetry JavaScript implementation is following the OpenTelemetry spec, we should expect that, for example, a Python implementation of OpenTelemetry would also cause signing issues for the Python SDK (boto3).
I'm always confused 😆 I opened this here because I think the unexpected behaviour comes from the aws-sdk rather than the instrumentation-http package. The instrumentation-http package passes the traceparent to all downstream HTTP endpoints. There are conversations about allowing or denying propagation for specific endpoints (open-telemetry/opentelemetry-js#1698), but I don't think anyone is actively working on this. Even so, this would still lead to friction for new OpenTelemetry users wanting to instrument aws-sdk requests.

At the moment, when the request is retried, the signature is not recalculated for the subsequent HTTP requests even though some of the headers are now different. I think this is because of the signature caching logic. Does boto cache the signature? I don't see this in aws-sdk-js v3, which might be why we don't see the problem there.

edit: @benkehoe I think boto3 doesn't cache the token, so I would guess that when the request is retried a new signature is created.
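To make the caching problem concrete, here is a toy simulation in Node.js. It uses a plain HMAC over the request headers rather than real SigV4, and `toySign`, `serverAccepts`, `buildHeaders`, and `sendWithRetries` are hypothetical helpers, not aws-sdk functions. It only shows that reusing a signature computed on the first attempt fails once a signed header such as traceparent changes on the retry, while recomputing it per attempt succeeds.

```javascript
const crypto = require('crypto');

// Toy signer: an HMAC over the sorted "name:value" header lines.
// This is NOT SigV4; it only models "the signature covers these headers".
function toySign(secret, headers) {
  const canonical = Object.entries(headers)
    .map(([name, value]) => `${name.toLowerCase()}:${String(value).trim()}`)
    .sort()
    .join('\n');
  return crypto.createHmac('sha256', secret).update(canonical).digest('hex');
}

// The "service" recomputes the signature from the headers it actually received.
function serverAccepts(secret, headers, signature) {
  return toySign(secret, headers) === signature;
}

// Hypothetical request builder: the HTTP instrumentation injects a new span id
// into traceparent on every attempt, while the trace id stays the same.
const traceId = '4bf92f3577b34da6a3ce929d0e0e4736';
function buildHeaders(attempt) {
  const spanId = String(attempt + 1).padStart(16, '0');
  return {
    host: 'example.amazonaws.com',
    traceparent: `00-${traceId}-${spanId}-01`,
  };
}

// Send `attempts` requests; when cacheSignature is true, the signature from the
// first attempt is reused for every retry (the behaviour described above).
function sendWithRetries(secret, attempts, cacheSignature) {
  let cached = null;
  const accepted = [];
  for (let i = 0; i < attempts; i++) {
    const headers = buildHeaders(i);
    if (!cacheSignature || cached === null) {
      cached = toySign(secret, headers);
    }
    accepted.push(serverAccepts(secret, headers, cached));
  }
  return accepted;
}

const secret = 'hypothetical-secret';
console.log(sendWithRetries(secret, 3, true));  // [ true, false, false ]
console.log(sendWithRetries(secret, 3, false)); // [ true, true, true ]
```

Only the first attempt succeeds when the signature is cached, which matches the symptom in this issue: the initial request works and every retry is rejected.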
Ah, a signature caching issue makes sense.
Hi @Ankcorn,
Thanks for raising the issue. Aside from botocore, did you notice this issue with other SDKs? While we look into adding the
Thanks,
Hey @RanVaknin, thanks for the workaround, that helps a lot :) I have confirmed that it works. I have taken a good look at botocore and aws-sdk-js v3, and it helped me realise that the root cause of the JS v2 issue was not the headers but the caching of the signature, which we only noticed because the trace header was changing. The JS v2 SDK seems to be the only one that does this. Hope that update helps :)
This PR seems to fix the issue: open-telemetry/opentelemetry-js#4346
Describe the bug
Hey, we ran into a weird issue where aws-sdk request retries fail when instrumented with OpenTelemetry.
The error we spotted was
open-telemetry/opentelemetry-js#3922
I recreated the error in our public sandbox so it's clear what went wrong: https://sandbox.baselime.io/baselime/sandbox/datasets/otel/traces/dc4e6324537d09513348ab781fb318ff
The traceparent header is updated for the retried HTTP request, but the x-amz-security-token is not. This causes the retried requests to fail.
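Why does traceparent change between attempts? In the W3C Trace Context format, the header is `version-traceid-parentid-flags`, and an HTTP client instrumentation typically creates a new span for each outgoing request, so a retry carries the same trace id but a new parent (span) id. The sketch below assumes exactly that; `makeTraceparent` is a hypothetical helper, not part of OpenTelemetry's API.

```javascript
const crypto = require('crypto');

// Build a W3C Trace Context traceparent value: version-traceid-parentid-flags.
// traceId is 16 bytes (32 hex chars); spanId is 8 bytes (16 hex chars).
function makeTraceparent(traceId, spanId, sampled = true) {
  return `00-${traceId}-${spanId}-${sampled ? '01' : '00'}`;
}

// Assumption: one trace id for the whole SDK operation, and a fresh span id
// for each outgoing HTTP request, including the retry.
const traceId = crypto.randomBytes(16).toString('hex');
const firstAttempt = makeTraceparent(traceId, crypto.randomBytes(8).toString('hex'));
const retryAttempt = makeTraceparent(traceId, crypto.randomBytes(8).toString('hex'));

console.log(firstAttempt);
console.log(retryAttempt);
// Same trace id, different parent id, so the header value differs.
console.log(firstAttempt === retryAttempt);
```

If traceparent is one of the signed headers, that one changed field is enough to make a reused signature invalid.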
Expected Behavior
We would expect the request not to error.
Current Behavior
Reproduction Steps
When using @opentelemetry/instrumentation-http version 0.35.1 or higher, whenever aws-sdk v2.x.x retries a request (e.g. after a rate limit or a timeout), the request fails. Because aws-sdk adds the traceparent to the SignedHeaders, the retried request (which carries a new traceparent value) no longer matches its signature, and AWS returns an InvalidSignatureException (400).
Possible Solution
Ignore the traceparent header when signing, as is already done for x-amzn-trace-id.
(See aws-sdk-js/lib/signers/v4.js, line 198 at commit 7d80de6.)
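A minimal sketch of the proposed solution, assuming a signer that skips headers found in an exclusion set before building SigV4's SignedHeaders value (lowercase header names, sorted, joined with `;`). The `UNSIGNABLE_HEADERS` set and the `isSignable` / `signedHeaders` helpers are hypothetical and only loosely modeled on the idea; they are not copied from `lib/signers/v4.js`.

```javascript
// Illustrative exclusion list (assumption: not the SDK's actual list).
const UNSIGNABLE_HEADERS = new Set([
  'authorization',
  'user-agent',
  'x-amzn-trace-id',
  'traceparent', // proposed addition: its value changes on every retry
]);

// Header names are case-insensitive, so compare in lowercase.
function isSignable(name) {
  return !UNSIGNABLE_HEADERS.has(name.toLowerCase());
}

// SigV4 SignedHeaders: lowercase names of the signed headers, sorted, joined by ';'.
function signedHeaders(headers) {
  return Object.keys(headers)
    .filter(isSignable)
    .map((name) => name.toLowerCase())
    .sort()
    .join(';');
}

const headers = {
  Host: 'dynamodb.us-east-1.amazonaws.com',
  'X-Amz-Date': '20230501T000000Z',
  'X-Amz-Target': 'DynamoDB_20120810.GetItem',
  traceparent: '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01',
};

console.log(signedHeaders(headers)); // host;x-amz-date;x-amz-target
```

With traceparent excluded, a new value on retry no longer invalidates the signature. As the comments on this issue point out, the deeper cause is that JS v2 reuses a cached signature on retry, so this only sidesteps the symptom for this particular header.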
Additional Information/Context
No response
SDK version used
2.1374.0
Environment details (OS name and version, etc.)
AWS Lambda, node 16