Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Header signing error affecting opentelemetry instrumentation #4472

Closed
Ankcorn opened this issue Jul 27, 2023 · 8 comments · May be fixed by #4473
Closed

Header signing error affecting opentelemetry instrumentation #4472

Ankcorn opened this issue Jul 27, 2023 · 8 comments · May be fixed by #4473
Assignees
Labels
bug This issue is a bug. p2 This is a standard priority issue workaround-available

Comments

@Ankcorn
Copy link

Ankcorn commented Jul 27, 2023

Describe the bug

Hey, we ran into this weird issue with aws-sdk request retries failing when instrumented with open telemetry.

The error we spotted was

The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.

open-telemetry/opentelemetry-js#3922

I recreated the error in our public sandbox so its clear what went wrong https://sandbox.baselime.io/baselime/sandbox/datasets/otel/traces/dc4e6324537d09513348ab781fb318ff

The traceparent header is updated for the retried http request but the x-amz-security-token is not. This causes requests to fail.

Expected Behavior

Would expect the the request to not error

Current Behavior

The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.

Reproduction Steps

When using @opentelemetry/instrumentation-http version 0.35.1 or higher, whenever aws-sdk v2.x.x retries a request (e.g.: rate limit or timeout happens), the request fails, because aws-sdk adds the traceparent to the SignedHeaders, and as a result AWS throws a InvalidSignatureException 400 back.

Possible Solution

Ignore traceparent header like the x-amz-trace-id is

'x-amzn-trace-id'

Additional Information/Context

No response

SDK version used

2.1374.0

Environment details (OS name and version, etc.)

AWS Lambda, node 16

@Ankcorn Ankcorn added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jul 27, 2023
@Ankcorn Ankcorn changed the title Heade signing error effecting opentelemetry instrumentation Header signing error effecting opentelemetry instrumentation Jul 27, 2023
@Ankcorn Ankcorn changed the title Header signing error effecting opentelemetry instrumentation Header signing error affecting opentelemetry instrumentation Jul 27, 2023
@benkehoe
Copy link

benkehoe commented Jul 27, 2023

Interesting; the list of headers to exclude from signing in JSv2:

unsignableHeaders: [
'authorization',
'content-type',
'content-length',
'user-agent',
expiresHeader,
'expect',
'x-amzn-trace-id'
],

is significantly longer than in botocore:
https://github.com/boto/botocore/blob/a9397847a3150eb619f3457e14557afd12ff13ab/botocore/auth.py#L61-L65
open telemetry headers aren't in there either.

Maybe this issue belongs in the cross-SDK repo?

@Ankcorn
Copy link
Author

Ankcorn commented Jul 27, 2023

Thanks @benkehoe

It is surprising that they differ significantly. I think for now this issue only crops up with the js client because the javascript https://github.com/open-telemetry/opentelemetry-js/tree/main/experimental/packages/opentelemetry-instrumentation-http adds the traceparent header to every request.

Adding an AWS specific code path to the HTTP instrumentation felt weird so I raised this here

@benkehoe
Copy link

I'm confused. Is the OpenTelemetry JavaScript implementation doing something that is not standard for OpenTelemetry? If so, it seems like the OpenTelemetry implementation should be changed. If the OpenTelemetry JavaScript implementation is following the OpenTelemetry spec, we should expect that, for example, a Python implementation of OpenTelemetry would also cause signing issues for the Python SDK (boto3)

@Ankcorn
Copy link
Author

Ankcorn commented Jul 28, 2023

I'm always confused 😆

The reason I opened this here is that I think the unexpected behaviour is coming from the aws-sdk rather than the instrumentation-http package.

The instrumentation-http package passes the traceparent to all downstream HTTP endpoints. There are conversations about being able to allow/deny propagation for specific endpoints open-telemetry/opentelemetry-js#1698 but I don't think anyone is actively working on this. This would still lead to friction for new open telemetry users wanting to instrument aws-sdk requests.

When the request retries at the moment the signature is not recalculated for subsequent HTTP requests even though some of the headers are now different. I think this is because of the signature caching logic. Does boto cache the signature? I don't see this in the aws-sdk-js v3 which might be why we don't see the problem there

edit:

@benkehoe I think boto3 doesn't cache the token so I would guess when the request is retried a new signature is created
https://github.com/boto/botocore/blob/1fad151309322a88641571ac188af8e68ba56a59/botocore/auth.py#L444C23-L444C23

@benkehoe
Copy link

ah, a signature caching issue makes sense

@RanVaknin
Copy link
Contributor

Hi @Ankcorn,

Thanks for raising the issue. Aside from botocore did you notice this issue with other SDKs?
Newer SDKs have a more uniform list of unsignable headers as they all should follow the same type of internal RFC like paper. Older SDKs implementations might differ.

While we look into adding the traceparent header into the the list, you can workaround it by doing:

AWS.Signers.V4.prototype.unsignableHeaders.push('traceparent');

Thanks,
Ran~

@RanVaknin RanVaknin self-assigned this Aug 1, 2023
@RanVaknin RanVaknin added needs-review This issue/pr needs review from an internal developer. workaround-available p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Aug 1, 2023
@Ankcorn
Copy link
Author

Ankcorn commented Aug 2, 2023

Hey @RanVaknin

Thanks for the workaround, that helps a lot :) I have confirmed that it works.

I have taken a good look at botocore and the aws-sdk-js v3.

It helped me realise here that the root cause for the js v2 issue was not the headers but the caching of the signature, which we only noticed because of the traceheader changinf. The js v2 SDK seems to be the only one that does this.

Hope that update helps :)

@mugli
Copy link

mugli commented Feb 1, 2024

This PR seems to fix the issue: open-telemetry/opentelemetry-js#4346
which is released in @opentelemetry/instrumentation-http v0.46.0: open-telemetry/opentelemetry-js#4358

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. p2 This is a standard priority issue workaround-available
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants