-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
URL query string values should be redacted by default #961
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but it would also be great to update examples (or add new REDACTED ones)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Left a nit suggestion about using the word SHOULD for instrumentations to provide option to not redact.
# your pull request title with [chore] or use the "Skip Changelog" label. | ||
|
||
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix' | ||
change_type: bug_fix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recognize that the specification allows this as a non-breaking change because it is an attribute value, but I am very concerned about the end user experience of this change. This change will break alerts/slos/boards that expect query parameters to be present in the attribute named url.full
or url.query
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Especially painful for users who are treating the query parameters as non-sensitive data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, in practice this can be a pretty substantial breaking change for end users, most of whom likely don't have sensitive information in these query params
I would be more comfortable with keeping the current language and providing an opt-in redaction processor so as not to break existing users. |
I think it's fair to leave the registry docs the same as they were and put the opt-out redaction as an augmented description for http semconv. |
Is there any chance that you provide environmental variable name as an option to disable redaction? Without this I expect couple of different implementations in various languages. |
docs/url/url.md
Outdated
**[3]:** Query string values SHOULD be redacted by default and replaced by the value `REDACTED`, e.g. `q=REDACTED&v=REDACTED` (the query string keys SHOULD be preserved). | ||
Instrumentation MAY provide a configuration option to capture the full query string without any redaction. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make specific suggestions, I'd prefer this state that query string values MAY be redacted instead of SHOULD and going as far as stating that Instrumentation MUST
provide a way to redact query string.
Based on this point from @Kielek a broader proposal: we will run into questions around sensitivity again and again, and right now it is solved selectively via inline comments in the specification (there are other ones for enduser.* and db.* properties), while this needs a broader solution. I suggest that such a solution could be designed in a way that it is decoupled from the attribute requirements and by that "solves" this problem. Here is a rough outline:
I anticipate that this comes with a set of downsides (SDKs need to implement that filtering,redaction,etc., / those processes may be resource incentive for a busy node / ...), but this is something that in my experience end-user will ask for eventually anyhow (at least from my indirect experience with APM agents). |
It is important for native instrumentations to redact sensitive data or make users opt into the collection. For example, when writing logs, OTel is just one of possible logging providers and there are no guarantees secrets (if any) will be scrubbed by something in the pipeline. I can see the world where:
This allows instrumentations to protect themselves by having safe default (if my Azure SDK instrumentation leaked a credential, I'm responsible, not the OTel). If the distro has user permission or is brave enough, it can enable collection for all instrumentations. |
I disagree: instrumentation libraries and native instrumentations should not carry the responsibility to redact sensitive data or provide users to opt into the collection from an OpenTelemetry perspective. If they use other logging providers or if they provide APIs to other tracing/metric providers, they can (and may be somtimes have to) do their own redaction, but that's not what we (OpenTelemetry) should expect them to do. I argue that filtering and redaction should be a responsibility of the otel SDK and in no way the instrumentation library or the API should be expected to do the filtering:
To double down on this: I think that sampling and redaction share a lot of requirements, and therefore redaction should live in the Tracing/Logging/Metrics/Profiling SDK. There may be different strategies for redaction as well and there is room for innovation (see probabilistic sampling) and with the development of technology (and here: changes of legislation) requirements may change independent of the semantic conventions and specification |
@svrnm are you saying that database queries should also not be redacted by default? also, this PR is about security (leaking credentials) and not PII data, so I believe it applies to both production and non-production environments |
What I am saying is that the approach we take for sensitive data redaction needs to be approached differently. Right now we have places in the spec that say "redact this data" or "do not collect this data", but no guidance on where and how this data should be treated accordingly. Right now it seems that this is a responsibility of the instrumenting library, so each HTTP instrumentation library will need to manage the redaction of
I understand, but at the end it is about both. I also understand that this issue here is urgent and might need to be merged regardless to mitigate the security bug, but it it's still not clear then who is responsible for complying with this default? |
Yes, this. The SDK needs to build this in, and we should make a best effort to redact sensitive strings (we should probably have a way for instrumentations to communicate to the SDK what is sensitive, as well), but we cannot simply make a breaking change in one of the most widely used instrumentation paths like this, especially given that I think the majority of the values passed in through query strings in the world today are not sensitive. I would suggest a staged rollout for these profiles as well; Perhaps initially we print a warning on initialization that query strings are passed without redaction and in the future this will change? Stage 2 would be an opt-in redaction processor, Stage 3 is redaction by default? |
Some rough thinking:
|
This could also be a dangerous direction - for example, the pattern keeps changing when it comes "what looks like a credential leak", doing it inside the SDK has serviceability challenges (requiring the service to update dependency and redeploy). Either users don't use it, or they use it and have to do frequent patches in order to keep up with the latest rules, or we'll have to invent a system where these rules can be updated dynamically via some configuration. |
Leaving the defaults and responsibilities aside for a moment. I would be interested to learn from vendors/instrumentation authors on
My list of anecdotal evidence:
Random links from the CNCF blog
|
I mean, it could be a regular expression (or even some simple heurestics -- the provided examples of Azure/GCP/AWS calls with tokens in query strings, for example). I would point out that Datadog offers two options - redact query string, and redact paths with digits (https://docs.datadoghq.com/tracing/configure_data_security/?tab=http#trace-obfuscation) but they are both disabled by default. In addition, as other comments have pointed out, we ship a system for redaction today, it's in the collector, and our official guidance on production deploys is to use a collector. I am not arguing that we shouldn't have a redaction option, I am arguing that we should not default it to 'on'. I would point out that this proposed new behavior was opposed by comments from the community on the .NET repo where it originated (open-telemetry/opentelemetry-dotnet#5532 (comment)). edit: another example from the datadog docs, specifically around their library: https://docs.datadoghq.com/tracing/configure_data_security/?tab=http#library. they redact 'suspicious-looking' values through a regex, the default of which is below:
|
My experience in the past ~3 years in my capacity working for a vendor (Honeycomb) has been a mix:
|
I would also add that in five years at Lightstep I don't recall anyone having an issue with this. Any time we did have a customer send PII or other sensitive data, we had ways for them to delete it, and we offered redaction as a part of our collection pipeline for known sensitive keys/strings. |
I'd love to be able to do this. Could we all agree with something like:
? |
I would support this. |
I've sent a PR to motivate further discussion of this proposal: #971 |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Commenting to unstale |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Fixes #860
Changes
Query string values are now redacted by default due to concerns around leaking sensitive data.
This is not considered a breaking change because the OTel semantic conventions definition of stability specifically allows changing attribute values: