X-Opaque-ID contains UUID causing ES deduplication to fail #120124

pgomulka · 2021-12-01T16:04:52Z

ES is originally using X-Opaque-ID to trace origin of the request, to help identify slow search request or the use of deprecated functionality.
There was an expectation that this value would contain something in a form of user-id. No specification was given on how this id will look like. We documented this in both slow log and deprecation use cases.

The value of the X-Opaque-ID is used when de-duplicating deprecated log messages (together with a key of the deprecated feature). That means if the X-Opaque-ID is always unique, then there will be no de-duplication of those messages.
This will cause deprecation log to grow and since we enabled by default deprecation log indexing, it might use significant resources of the cluster.

The problem:
Kibana is using the UUID in a form of 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' for the X-Opaque-ID to help diagnose conceptual origin of the HTTP request. From what I understand, it can also pass the value provided by a user on a request to Kibana?
But because the value is mostly unique, it generates a lot of duplicated deprecation logs.

I wonder if we could use trace-id for UUID and use something like a UserID for the X-Opaque-Id

The text was updated successfully, but these errors were encountered:

elasticmachine · 2021-12-01T16:52:42Z

Pinging @elastic/kibana-core (Team:Core)

pgomulka · 2021-12-01T17:17:21Z

we also discussed this on a core-infra sync today.
We realised that maybe two separate use cases are mixed here. The tracing of the request across the stack and the identification of the origin of the request.
For the tracing we would prefer traceparent/traceid be used. We need to expand the documentation on this. Here is the PR that added the support for this header https://github.com/elastic/elasticsearch/pull/74210/files
The traceparent can have a generated UUIDs (following the defined syntax)

For origin of the slow request/depreacated feature request we would prefer if X-Opaque-ID contain an id from a finite set so that we could deduplicate deprecatation logs. As we documented the intended use case is to set a user-id (no format enforced) so that an app/user could be found by a user.
For Kibana, maybe an X-Opaque-ID could be something like a call-site. A class name, method? and for a traceparent a UUID could be used?

droberts195 · 2021-12-01T17:41:50Z

For Kibana, maybe an X-Opaque-ID could be something like a call-site. A class name, method?

Call-site would be useful, but in some cases it would be nice to also know the object that made the call.

Recently we have had a number of support cases where something in Kibana has been submitting a search to Elasticsearch that uses enormous resources, for example SIEM detection rules. In this case it would be nice to not just find out that the call site is "detection rules", but to find out the name of the particular rule that was involved.

I think this change would be in keeping with the original intent of this issue. For both deprecation log de-duplication and identification of the source of heavy searches it's useless to have a UUID for each search done by a particular SIEM detection rule. And in both cases it's useful to know not just that the source of the search was a SIEM detection rule, but which one.

Maybe simply call-site could be used most of the time, with call-site + object name to be used in a select few cases where we anticipate needing to identify the object.

lukeelmers · 2021-12-01T19:55:52Z

For origin of the slow request/depreacated feature request we would prefer if X-Opaque-ID contain an id from a finite set so that we could deduplicate deprecate log

I completely understand the need to deduplicate the deprecation logs and the challenges that unique opaque IDs cause, but TBH I'm a little confused on this -- don't we call the ID "opaque" exactly because Elasticsearch shouldn't be making any assumptions about its contents?

What makes this challenging for Kibana is that we are doing more with x-opaque-id than just adding request uuid. We now have an execution context service which will be enabled by default starting in 8.1. The service allows plugins to create a context object that is propagated to ES via the x-opaque-id header.

This means that x-opaque-id can now look something like this (quoting an example from #119932):

Then, in Elasticsearch slow logs you might find a hint that a request was initiated by my-plugin-ui-feature feature of my-app application and proxied through my-plugin-server-feature features of my-app-server app:
x-opaque-id: ${requestId};kibana:my-app:my-plugin-ui-feature:42;my-app-server:my-plugin-server-feature:42

All this is to say, if we wanted to change the value Kibana passes via x-opaque-id, it would also affect our implementation of the execution context service. But I'll defer to @mshustov for his thoughts on that.

rudolf · 2021-12-02T10:31:48Z

I think it's worth discussing what we want to do long term. But we need to find a solution for 7.16/7.17

I think we have the following options:

Remove x-opaque-id from Kibana. Either without any replacement tracing header or by switching to the traceparent header. The Kibana Core can explore the effort and any potential problems that might come out of this solution.
When receiving the x-elastic-product-origin: 'kibana' header, Elasticsearch ignores the x-opaque-id header when deduplicating logs. Would this be a viable alternative?

mshustov · 2021-12-02T10:48:13Z

There was an expectation that this value would contain something in a form of user-id. No specification was given on how this id will look like. We documented this in both slow log and deprecation use cases.

Since this format's been documented for a while, I agree that it's a bug. In #71019 Kibana didn't require x-opaque-id to be a user unique identifier and as it's noticed Kibana falls back to a random uuid value if x-opaque-id is not provided.

We now have an execution context service which will be enabled by default starting in 8.1. The service allows plugins to create a context object that is propagated to ES via the x-opaque-id header.
This means that x-opaque-id can now look something like this

In the long term, we are going to migrate from (mis-)used x-opaque-id header to semantically correct baggage header #101708. The draft status of the baggage header specification is the main blocker for this effort. If it's a critical problem we can consider using another header for execution context propagation.

Remove x-opaque-id from Kibana.

We can't just remove x-opaque-id support as it will be a breaking change. Kibana already documents server.requestId settings https://www.elastic.co/guide/en/kibana/7.16/settings.html However, there are no requirements for x-opaque-id content.

What I'd propose is:

document requirements to x-opaque-id content
stop generating uuid in Kibana if x-opaque-id is not provided
decide what information Kibana should provide via x-opaque-id header.
@droberts195 suggested Call-site would be useful, but in some cases it would be nice to also know the object that made the call. . We can provide a static kibana value if execution_context is disabled, or a more specific kibana enitity value if execution_context is enabled. execution_context can hint whether a request is originated from a Kibana visualization or Kibana background task/alert. In either way, we get rid of uuid value in x-opaque-id header which should stabilize the deprecation logging. WDYT? @droberts195 @pgomulka

As a side note: we need to document x-opaque-id content format requirement for other Elastic products. For example, other elasticsearch-js client doesn't document this requirement either https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/observability.html#_x_opaque_id_support

pgomulka · 2021-12-03T11:32:10Z

document requirements to x-opaque-id content
stop generating uuid in Kibana if x-opaque-id is not provided
decide what information Kibana should provide via x-opaque-id header.
@droberts195 suggested Call-site would be useful, but in some cases it would be nice to also know the object that made the call. . We can provide a static kibana value if execution_context is disabled, or a more specific kibana enitity value if execution_context is enabled. execution_context can hint whether a request is originated from a Kibana visualization or Kibana background task/alert. In either way, we get rid of uuid value in x-opaque-id header which should stabilize the deprecation logging. WDYT?

this sounds great to me, because this is exactly what ES would expect on that header at the moment.
I will work on improving the documentation for x-opaque-id and traceparent/trace.id fields

droberts195 · 2021-12-06T10:49:00Z

We can provide a static kibana value if execution_context is disabled, or a more specific kibana enitity value if execution_context is enabled.

I guess this relates to #101587 right? (Sorry, I don't know much about prior work in this area.)

In that case I think what I was asking for is covered, but an important point is that all the sub-teams whose Kibana apps might submit heavy searches to Elasticsearch actually take advantage of the execution_context functionality. It probably needs a publicity drive when it's ready for use to ensure that all the people who work on Kibana apps for security, APM, alerts, ML, etc, are all aware that heavy searches should be submitted with an execution_context that clearly identifies where they came from.

Anyway, I've subscribed to that other issue and I don't think it needs covering any more on this issue.

pgayvallet · 2022-01-12T15:45:48Z

Some technical pointers for the implementation

The requestId is set to the executionContext in the HTTP server, here:

kibana/src/core/server/http/http_server.ts

Lines 338 to 343 in 755950a

    
           const requestId = getRequestId(request, config.requestId); 
        
           const parentContext = executionContext?.getParentContextFrom(request.headers); 
        
           if (parentContext) executionContext?.set(parentContext); 
        
           executionContext?.setRequestId(requestId);

We'll probably have to modify getRequestId to return undefined instead of a random uuid when the opaqueId can't be retrieved from the request (either not present or because of the configuration)

This ID is also accessible from the KibanaRequest instance, via the id property

kibana/src/core/server/http/router/request.ts

Lines 130 to 138 in 338fe1a

    
             /** 
        
              * A identifier to identify this request. 
        
              * 
        
              * @remarks 
        
              * Depending on the user's configuration, this value may be sourced from the 
        
              * incoming request's `X-Opaque-Id` header which is not guaranteed to be unique 
        
              * per request. 
        
              */ 
        
             public readonly id: string;

This is then used to set the header of the request against ES in two places:

In the cluster client

kibana/src/core/server/elasticsearch/client/cluster_client.ts

Lines 97 to 99 in 3c8fa52

    
           if (isRealRequest(request)) { 
        
             const requestHeaders = ensureRawRequest(request).headers; 
        
             const requestIdHeaders = isKibanaRequest(request) ? { 'x-opaque-id': request.id } : {};

This is used to set the headers parameter when instantiating a child Client instance, in ClusterClient#asScoped.

In our custom transport

kibana/src/core/server/elasticsearch/client/configure_client.ts

Lines 41 to 45 in 1a6a9ae

    
           const opaqueId = getExecutionContext(); 
        
           if (opaqueId && !opts.opaqueId) { 
        
             // rewrites headers['x-opaque-id'] if it presents 
        
             opts.opaqueId = opaqueId; 
        
           }

which is retrieving the value from the ExecutionContextService:

kibana/src/core/server/elasticsearch/elasticsearch_service.ts

Line 161 in d2bd7f8

() => this.executionContextClient?.getAsHeader()

kibana/src/core/server/execution_context/execution_context_service.ts

Lines 155 to 163 in 4681a80

    
           private getAsHeader(): string | undefined { 
        
             if (!this.enabled) return; 
        
             // requestId may not be present in the case of FakeRequest 
        
             const requestId = this.requestIdStore.getStore()?.requestId ?? 'unknownId'; 
        
             const executionContext = this.contextStore.getStore()?.toString(); 
        
             const executionContextStr = executionContext ? `;kibana:${executionContext}` : ''; 
        
             return `${requestId}${executionContextStr}`; 
        
           }

As the usage from the ClusterClient is directly accessing the request, we'll likely need a way to differentiate a request having an ID retrieved from the x-opaque-id header coming from the client from a request that have a generated ID.

From a quick grep, I couldn't really find any usage of KibanaRequest.id apart from the ones used in the ES service's code.

We will probably need to either make this KibanaRequest.id value optional, and only populate it when coming from a proper x-opaque-id header, or add a boolean flag to identify if the value was autogenerated or not.

It also makes me wonder why exactly we decided to set a default value to id, given we also have KibanaRequest.uuid which is supposed to be the real technical identifier of the object.

I would personally make KibanaRequest.id optional AND rename it to opaqueId, to reflect more precisely what exactly this value is about.

Then, we'll also need to modify ExecutionContextService.getAsHeader to better handle the case where the requestId / opaqueId is undefined (it actually use a unknownId constant in that case, and we can probably do slightly better now that undefined will be a 'normal' value)

@lukeelmers @mshustov you probably have a better understanding of the whole problematic than I do. WDYT?

mshustov · 2022-01-13T19:13:40Z

we'll likely need a way to differentiate a request having an ID retrieved from the x-opaque-id header coming from the client from a request that have a generated ID.

Why is it necessary if we aren't going to generate request id anymore? This value from ClusterClient will be overwritten by getExecutionContext, I'm not even sure we need this logic in getScopedHeaders.

It also makes me wonder why exactly we decided to set a default value to id, given we also have KibanaRequest.uuid which is supposed to be the real technical identifier of the object.

I suppose it's because an end-user can set any x-opaque-id value, so we can't rely on the format of the value.

I would personally make KibanaRequest.id optional AND rename it to opaqueId, to reflect more precisely what exactly this value is about.

We can do it since elasticsearch-js uses this naming as well. And there are no other consumers in the Kibana codebase anyway.

Then, we'll also need to modify ExecutionContextService.getAsHeader to better handle the case where the requestId / opaqueId is undefined (it actually use a unknownId constant in that case, and we can probably do slightly better now that undefined will be a 'normal' value)

Not sure we can drop it completely, but we can use something more compact, like -. Note, that in the long term we will move execution_context from x-opaque-id to baggage header with proper semantic value. We are blocked by elastic/apm-agent-nodejs#2102

pgayvallet · 2022-01-17T08:28:44Z

This value from ClusterClient will be overwritten by getExecutionContext, I'm not even sure we need this logic in getScopedHeaders.

if execution_context.enabled is false, getAsHeader returns undefined,

kibana/src/core/server/execution_context/execution_context_service.ts

Lines 155 to 156 in 4681a80

    
           private getAsHeader(): string | undefined { 
        
             if (!this.enabled) return;

which makes the ES client use the header value specified by getScopedHeaders.

If we modify the execution context service to return a static value when it's disabled, then we wouldn't need to update the logic in getScopedHeaders. but in that case, if we accept that the x-opaque-id header that will be sent to ES is always managed by the EC service, shouldn't we remove the x-opaque-id handling from ClusterClient#getScopedHeaders altogether?

lukeelmers · 2022-01-19T21:39:06Z

I don't think having these logs that are anyway hidden by upgrade assistant adds much value, so I think we could consider not creating deprecation logs for elastic stack components at all. I'm not sure what implications made us decide against this approach initially.

I tend to agree, it feels like they aren't all that useful, but there could be some context I am missing. @jakelandis & team, what do y'all think?

jakelandis · 2022-01-19T22:30:50Z

What do you have in mind with "hiding" them? I don't think having these logs that are anyway hidden by upgrade assistant adds much value, so I think we could consider not creating deprecation logs for elastic stack components at all. I'm not sure what implications made us decide against this approach initially.

Apologies... I was mis-understanding how x-elastic-product-origin was implemented. I see now that Kibana does indeed "hide" these, but only after they have been written to the index and hidden only via the UI. My intent is not re-open those decisions and hiding via the UI (mostly) satisfies my concerns around the 60 depredations expected to be shipped (~~and will voice those lesser concerns via email~~ edit submitted elastic/elasticsearch#82833 to handle unbounded growth of the log).

...back on topic ...

Change the Elasticsearch deduplication logic to ignore x-opaque-id when performing deduplication only when the x-elastic-product-origin: kibana header is present.

If x-elastic-product-origin from Kibana is always hidden from the users view, then why only ignore x-opaque-id for duplication. Wouldn't a simpler option be skip logging the deprecations entirely ? Wouldn't that have the same effect per the user's perspective ? (again, not trying to reopen old decisions, just trying to understand).

pgomulka · 2022-01-20T09:24:46Z

If x-elastic-product-origin from Kibana is always hidden from the users view, then why only ignore x-opaque-id for duplication. Wouldn't a simpler option be skip logging the deprecations entirely ? Wouldn't that have the same effect per the user's perspective ? (again, not trying to reopen old decisions, just trying to understand).

I don't think we considered skipping logging the deprecations originating from elastic entirely last time (when the issue was originally created). I personally would be worried about this idea. How else would we track what deprecated features are still used within the stack (kibana for instance?) We could obviously push that responsibility on the clients to store the warning headers with a deprecations. It sound to me, like this would mean clients (like kibana) would have to effectively implement and maintain their own deprecation log.

back on the issue and the discussion summary.
I understand that fixing 60deprecations in short timeframe is not possible
and option 1 is also less favourable because it requires more changes.
I am fine with option 2 (elasticsearch not using x-opaque-id for throttling deprecations from kibana only) if the 7.17 timing is the most important.

However, I don't think we considered pushing back with releasing 7.17 and 8.0. Would that even be possible to get more time and implement option 1. If I understand it right, the only downside of this option is the short deadline for 7.17. I don't think traceparent could affect cloud proxy. We already support traceparent in elasticsearch.
The only missing bits for option 1 are:

in ES is traceparent in audit logs which we estimate as low effort
the documentation changes in kibana (regarding correlations of ES-kibana audit logs).
and kibana not using generated UUID in x-opaque-id (already in progress for 7.17)

@igore-kupczynski Just to confirm. Do you think there could be any problem with cloud proxy if kibana start sending traceparent to elasticsearch?

pgayvallet · 2022-01-20T10:03:05Z

However, I don't think we considered pushing back with releasing 7.17 and 8.0. Would that even be possible to get more time and implement option 1. If I understand it right, the only downside of this option is the short deadline for 7.17.

Unfortunately it's not. There are some significant technical implications for option 1 that we discovered yesterday while discussing with @mshustov and @jportner

During the investigations of #123197, we discovered a few things that would be problematic if we'd switch to using the traceId value from the traceparent header for audit logging correlation:

At the moment, Kibana is already sending a traceparent header to elasticsearch, then, as your said, this traceparent header is used by ES to retrieve the traceId and other tracing values from Kibana since elastic/elasticsearch#74210

However, this header is sent by APM, meaning that this is done only if APM is enabled on Kibana. When APM is disabled, there's no traceparent header circulating.

Initially the traceparent support was supposed to be ONLY for log correlation. As Audit Logging was using the x-opaque-id header that was (is atm) always populated with a unique value, having traceparent be optional depending on the configuration was not an issue. If we switch AL's implementation to use traceId instead of xOpaqueId, this header being optional becomes a problem.

We can't really just change our ClusterClient implementation to send a traceparent header, as this would (potentially?) conflict with the one APM is sending. We can't also just let APM handle this, as when APM is disabled, this header would not be sent, therefor breaking the whole audit logging correlation. Our only options would be to either manually generate / send this header when APM is disabled, or (if that's possible) disable APM from handling this header, and always generating it manually to make sure that it is send regardless of Kibana's configuration. Both are non-trivial.

Note that we also got a divergence between 8.1 and older versions, as #118466, which is doing some work in this area, was not backported to either 8.0 or 7.17 (as it was not covering a bug, but a feature), which would also increase the complexity of doing works in the 7.17 branch (even it's that not blocking, just pointing this out).

and kibana not using generated UUID in x-opaque-id (already in progress for 7.17)

This PR is on hold until we agree on a technical solution as it currently breaks audit logging.

rudolf · 2022-01-20T10:18:44Z

Before we dismiss dropping Kibana deprecation logs completely as an option. Kibana is already relying only on deprecation headers for development/testing. When instrumenting CI to detect deprecations we needed a way to trace deprecations back to the calling site and this was impossible using the logs. Looking for the returned header meant we could find out the exact query and stack trace where the query comes from and log all the context. The only shortcoming is that the deprecation headers don't include a level such as WARNING/CRITICAL.

pgayvallet · 2022-01-20T11:36:50Z

@jportner given that the default configuration for the APM agent on Kibana is enabled: true, contextPropagationOnly: true, therefor correctly sending the traceparent header, do you think it would be acceptable to document that audit logging correlation (which would now be based on trace-id) would require Kibana to have APM context propagation enabled to function (meaning that APM's context propagation should not be explicitly disabled by the customer)?

If this is acceptable, it would allow us to always delegate the traceparent handling to APM, therefor removing most of the complexity described in #120124 (comment). In that case, the steps described by @pgomulka in #120124 (comment) would be sufficient, in addition to :

modifying x-pack/plugins/security/server/audit/audit_service.ts to stop manually specifying trace.id to avoid overriding the value already retrieved from APM)
(not sure if necessary) backporting propagate trace.id to http response log manually #118466 or at least the changes to json_layout.ts

Note that if we go that way, we would kinda be forced to keep this enabled: true, contextPropagationOnly: true configuration for APM as the default.

rudolf · 2022-01-20T12:36:34Z

What's the risk of not making APM agent configurable at all? Do users have a good reason to disable it?

pgayvallet · 2022-01-20T12:42:39Z

Do users have a good reason to disable it?

Performances. APM has some impact on performances, even if the contextPropagationOnly mode is supposed to be negligible, given the data collected in #112973.

So no, I don't see any good reason to disable APM totally and not just enable it in contextPropagationOnly mode (which is the default anyway). But maybe @mshustov has a better idea.

What's the risk of not making APM agent configurable at all

Introducing a breaking change in a minor, and a X.last, version. I feel like we may want to avoid that.

mshustov · 2022-01-20T14:08:15Z

given that the default configuration for the APM agent on Kibana is enabled: true, contextPropagationOnly: true, therefor correctly sending the traceparent header, do you think it would be acceptable to document that audit logging correlation (which would now be based on trace-id) would require Kibana to have APM context propagation enabled to function (meaning that APM's context propagation should not be explicitly disabled by the customer)?

Worth noting that contextPropagationOnly support was added in #112973 which is not backported to 8.0, 7.17.

Introducing a breaking change in a minor, and a X.last, version. I feel like we may want to avoid that.

AFAIK We've never documented elastic.apm settings of kibana.yml see #99328, so changing the settings format is not a breaking change.

Do users have a good reason to disable it?

Users may want to disable the APM agent to reduce runtime overhead, the overhead of sending additional tracing headers. @trentm do you see any other potential problems?

jportner · 2022-01-20T15:14:43Z

@jportner given that the default configuration for the APM agent on Kibana is enabled: true, contextPropagationOnly: true, therefor correctly sending the traceparent header, do you think it would be acceptable to document that audit logging correlation (which would now be based on trace-id) would require Kibana to have APM context propagation enabled to function (meaning that APM's context propagation should not be explicitly disabled by the customer)?

If this is acceptable, it would allow us to always delegate the traceparent handling to APM, therefor removing most of the complexity described in #120124 (comment).

Agreed, I think this makes sense, we just need to make sure audit logging docs are updated accordingly.

Note that if we go that way, we would kinda be forced to keep this enabled: true, contextPropagationOnly: true configuration for APM as the default.

Yes.

pgayvallet · 2022-01-20T16:00:19Z

So, to summarize, if we want to go ahead with option 1:

ES side

Add traceparent support on audit logs to include the trace-id field to those logs
- on 8.1, backported to 8.0 and 7.17

Kibana side

Stop generating UUIDs for x-opaque-id (started in No longer use generated IDs for X-Opaque-ID / requestId #123197)
Modify the audit logging to stop overriding trace.id with request.id
Update audit logging documentation
- to specify that the correlation is done via the trace-id field
- to specify that APM agent must be enabled (on contextPropagationOnly mode or not) for correlation to work
Backport Include tracing information in the Kibana logs #112973 to 8.0 and 7.17
Optional but while we're at it, backport propagate trace.id to http response log manually #118466 to 8.0 and 7.17

@pgomulka @jakelandis if we go that way, we gonna need the changes on ES to be available for testing to make sure that audit log correlation is still working after our changes before merging/backporting #123197

@mshustov @jportner @rudolf before we confirm this plan as being an option, does it seem correct to you, maybe I missed anything?

mshustov · 2022-01-20T16:25:34Z

@pgayvallet Kibana's side looks correct. However, this to specify that APM agent must be enabled (on contextPropagationOnly mode or not) for correlation to work looks like a significant change for Kibana. Don't we need Product approval for this? Maybe we should consider the fact Kibana uses the apm agent as an implementation detail and do not allow customers to disable it?

pgayvallet · 2022-01-20T16:30:56Z

Maybe we should consider the fact Kibana uses the apm agent as an implementation detail and do not allow customers to disable it?

Given the minimal impact of the agent enabled in contextPropagationOnly, I think it would be reasonable. Now, that's additional changes to perform (and to backport) for 7.17. Do we think this is a necessity, or are we fine with just documenting audit logs as requiring the agent to be enabled to function?

As a sidenote, I see that @pgomulka started working on elastic/elasticsearch#82855, so if we're going with option 2 for 7.17, we would have more time to properly address option 1 in 8.1, which is why I would really like to have @pgomulka and @jakelandis 's opinion on the direction we want to take.

jportner · 2022-01-20T16:51:36Z

before we confirm this plan as being an option, does it seem correct to you, maybe I missed anything?

Seems correct to me, though I'm unclear if audit logs would be missing the trace.id if we did not backport #118466. If that's the case, I think that we need to backport, it's not optional.

Now, that's additional changes to perform (and to backport) for 7.17. Do we think this is a necessity, or are we fine with just documenting audit logs as requiring the agent to be enabled to function?

I think that it's a necessity. Otherwise someone could have audit logging enabled (with correlation) in 7.16, then upgrade to 7.17 and lose correlation without any warning. We can't realistically expect our users to read through all the docs / release notes to catch something like this, and we don't have the upgrade assistant to surface this info in 7.16.

Another question: are we certain that the APM agent supports contextPropagationOnly in 7.17/8.0? Or is that only supported in 8.1+? The PR where the spec was added is not clear.

pgayvallet · 2022-01-20T17:06:51Z

Otherwise someone could have audit logging enabled (with correlation) in 7.16, then upgrade to 7.17 and lose correlation without any warning

I get that. To be honest my opinion is that given that APM instrumentation isn't even documented anywhere, such risks are null, but I can't argue that this is technically possible.

Now, forcing the agent to be enabled could also be considered a breaking change, because in the same scenario, a user with APM disabled in 7.16 would upgrade to 7.17 to discover that the agent is now enabled. So in that case, I'm unsure we can safely do that in 7.17.

are we certain that the APM agent supports contextPropagationOnly in 7.17/8.0?

Not atm. This is why we need to backport #112973

jakelandis · 2022-01-20T19:14:17Z

Now that I understand that users will never see these deprecation warnings, my initial reservations are dampened.

The issue really comes down to writing more bytes to an index than are expected and/or necessary. Those bytes will likely never be seen by users and are of minimal size (small messages with much repetition). So the problem we are trying to solve is how to avoid trivial levels of overhead writing these messages and minimal bloat in clusters that remain on 7.last for long periods of time.

I think we should solve the problem, but not sure if it is really a blocker to the release.

My initial reservations with the proposed fix :

x-opaque-id is used in an unexpected way, it requires ES to behave specially for Kibana, and it allows Kibana to call into deprecated APIs with fewer consequences

"it allows Kibana to call into deprecated APIs with fewer consequences" - this ship has already sailed and this de-duplication issue should focus on just the de-duplication aspect (sorry if forked the conversation and didn't pick up what was being said)

"x-opaque-id is used in an unexpected way" - There is an argument to be made that ES should not have ntroduced caching semantics to what is intended to be an opaque-id. In this case (and others) we tend to use the x-opaque-id as more of a client-id but it really isn't.

"it requires ES to behave specially for Kibana" - reservation holds

We considered ignoring the x-opaque-id header for all requests with the x-elastic-product-origin header, e.g. requests coming from all Elastic stack components. However, this could make it harder to trace the origin of deprecation logs from stack components which are correctly using the x-opaque-id header.

For me this option (or the like) is back on the table. If we already hide all stack component deprecation for the users, then making it harder to trace the deprecation from stack components is moot point. I would vote for this option... if x-elastic-product-origin use that for caching else use x-opaque-id.

However, @rjernst and @pgomulka are the code owners and have the final say for this part of ES.

trentm · 2022-01-20T19:53:58Z

[@mshustov]

Users may want to disable the APM agent to reduce runtime overhead, the overhead of sending additional tracing headers. @trentm do you see any other potential problems?

No other potential problems that I see. Just performance overhead of the APM agent.

pgomulka · 2022-01-24T09:33:16Z

We considered ignoring the x-opaque-id header for all requests with the x-elastic-product-origin header, e.g. requests coming from all Elastic stack components. However, this could make it harder to trace the origin of deprecation logs from stack components which are correctly using the x-opaque-id header.

For me this option (or the like) is back on the table. If we already hide all stack component deprecation for the users, then making it harder to trace the deprecation from stack components is moot point. I would vote for this option... if x-elastic-product-origin use that for caching else use x-opaque-id.

I like that. This is a variation of option 1 - where x-opaque-id would not be used for kibana's requests. Given that all requests with elastic-product-origin are hidden in upgrade assistant, I don't mind using elastic-product-origin for all values when throttling deprecation logs.
I will update elastic/elasticsearch#82855 accordingly.

At the same time I will merge #82849 emit trace.id into audit logs and update misleading information in audit log doc about no semantic meaning of x-opaque-id (#82868)

cjcenizal · 2022-01-24T18:32:08Z

CC @elastic/platform-deployment-management

pgomulka · 2022-01-25T13:27:10Z

I have merged [7.17] Do no use x-opaque-id for deduplicating elastic originating requests backport (#82855) #83023 into 7.17 should we consider this issue done?

Or maybe we should keep it open until kibana stop sending UUID on x-opaque-id?

I also merged Emit trace.id into audit logs #82849
into 7.17 and updated the audit log documentation.

pgayvallet · 2022-01-25T13:59:07Z

should we consider this issue done? Or maybe we should keep it open until kibana stop sending UUID on x-opaque-id?

We do still need to address this properly on our side on 8.1 or 8.2. But there is a lot of noise on this issue. I will open a new one summarizing all this, then close this one.

In the meantime, @pgomulka, do you know if an issue was opened to track adding trace-id support on the log output/format on the elasticsearch side? We'll still need that to properly address the issue.

pgomulka · 2022-01-25T14:04:32Z

In the meantime, @pgomulka, do you know if an issue was opened to track adding trace-id support on the log output/format on the elasticsearch side? We'll still need that to properly address the issue.

There was no issue created. It was added a while back to ES 7.15 We only missed adding this to audit logs.

pgayvallet · 2022-01-25T14:08:02Z

We only missed adding this to audit logs.

Yea, sorry, specifically for the audit logs was what I meant

pgomulka · 2022-01-25T14:09:06Z

also I did not create an issue. I considered this more like a follow up elastic/elasticsearch#82849

pgayvallet · 2022-01-25T14:13:12Z

Oh, that's already done and merged, perfect then.

pgayvallet · 2022-01-25T16:13:55Z

Closing in favor of #123737

pgomulka added bug Fixes for quality problems that affect the customer experience v8.0.0 v7.10.0 labels Dec 1, 2021

botelastic bot added the needs-team Issues missing a team label label Dec 1, 2021

lukeelmers added Feature:elasticsearch Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc labels Dec 1, 2021

botelastic bot removed the needs-team Issues missing a team label label Dec 1, 2021

exalate-issue-sync bot added impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. loe:small Small Level of Effort labels Dec 7, 2021

exalate-issue-sync bot added loe:small Small Level of Effort and removed loe:medium Medium Level of Effort labels Dec 14, 2021

pgayvallet self-assigned this Jan 18, 2022

pgayvallet added v7.17.0 and removed v7.10.0 labels Jan 18, 2022

pgayvallet mentioned this issue Jan 18, 2022

No longer use generated IDs for X-Opaque-ID / requestId #123197

Closed

2 tasks

This comment has been minimized.

Sign in to view

rudolf added the blocker label Jan 20, 2022

pgayvallet mentioned this issue Jan 25, 2022

Stop Kibana from generating unique X-Opaque-ID values when not received from the client. #123737

Open

pgayvallet closed this as completed Jan 25, 2022

X-Opaque-ID contains UUID causing ES deduplication to fail #120124

X-Opaque-ID contains UUID causing ES deduplication to fail #120124

Comments

pgomulka commented Dec 1, 2021

elasticmachine commented Dec 1, 2021

pgomulka commented Dec 1, 2021 • edited

droberts195 commented Dec 1, 2021

lukeelmers commented Dec 1, 2021

rudolf commented Dec 2, 2021 • edited

mshustov commented Dec 2, 2021 • edited

pgomulka commented Dec 3, 2021

droberts195 commented Dec 6, 2021

pgayvallet commented Jan 12, 2022

mshustov commented Jan 13, 2022 • edited

pgayvallet commented Jan 17, 2022

lukeelmers commented Jan 19, 2022

This comment has been minimized.

jakelandis commented Jan 19, 2022 • edited

pgomulka commented Jan 20, 2022

pgayvallet commented Jan 20, 2022 • edited

rudolf commented Jan 20, 2022

pgayvallet commented Jan 20, 2022 • edited

rudolf commented Jan 20, 2022

pgayvallet commented Jan 20, 2022

mshustov commented Jan 20, 2022

jportner commented Jan 20, 2022

pgayvallet commented Jan 20, 2022 • edited

ES side

Kibana side

mshustov commented Jan 20, 2022

pgayvallet commented Jan 20, 2022

jportner commented Jan 20, 2022 • edited

pgayvallet commented Jan 20, 2022

jakelandis commented Jan 20, 2022

trentm commented Jan 20, 2022

pgomulka commented Jan 24, 2022 • edited

cjcenizal commented Jan 24, 2022

pgomulka commented Jan 25, 2022

pgayvallet commented Jan 25, 2022

pgomulka commented Jan 25, 2022

pgayvallet commented Jan 25, 2022

pgomulka commented Jan 25, 2022

pgayvallet commented Jan 25, 2022

pgayvallet commented Jan 25, 2022

pgomulka commented Dec 1, 2021 •

edited

rudolf commented Dec 2, 2021 •

edited

mshustov commented Dec 2, 2021 •

edited

mshustov commented Jan 13, 2022 •

edited

jakelandis commented Jan 19, 2022 •

edited

pgayvallet commented Jan 20, 2022 •

edited

pgayvallet commented Jan 20, 2022 •

edited

pgayvallet commented Jan 20, 2022 •

edited

jportner commented Jan 20, 2022 •

edited

pgomulka commented Jan 24, 2022 •

edited