[out_loki] A lot of warning 'Tenant ID is overwritten A -> B' if tenant_id_key is used #8809

Open
YevhenLodovyi opened this issue May 9, 2024 · 2 comments

@YevhenLodovyi

Hello,

Your Environment

  • Version used: 3.0.2
  • Environment name and version (e.g. Kubernetes? What version?): eks

I am using Fluent Bit (flb) to send logs to Loki. I want to separate the logs, so I am using multi-tenancy. A Lua script generates the tenant_id, so in the output I have:

      [OUTPUT]
          name        loki
          match       kube.*
          host        loki.internal
          port        3100
          tls         on
          tls.verify  off
          tenant_id_key tenant_id
          labels      source=eks,namespace=$kubernetes['namespace_name'],container=$kubernetes['container_name'],app=$kubernetes['labels']['app']
          remove_keys stream,_p,$kubernetes['labels']
          compress    gzip
          Retry_Limit False

As far as I can see, the logs are distributed properly, but I get a lot of warnings:

│ [2024/05/09 13:19:22] [ warn] [output:loki:loki.0] Tenant ID is overwritten infra -> apps                                                                                                                                                │
│ [2024/05/09 13:19:23] [ warn] [output:loki:loki.0] Tenant ID is overwritten apps -> infra                                                                                                                                                │
│ [2024/05/09 13:19:23] [ warn] [output:loki:loki.0] Tenant ID is overwritten infra -> apps                                                                                                                                                │
│ [2024/05/09 13:19:24] [ warn] [output:loki:loki.0] Tenant ID is overwritten apps -> infra                                                                                                                                                │
│ [2024/05/09 13:19:24] [ warn] [output:loki:loki.0] Tenant ID is overwritten infra -> apps                                                                                                                                                │
│ [2024/05/09 13:19:25] [ warn] [output:loki:loki.0] Tenant ID is overwritten apps -> infra                                                                                                                                                │
│ [2024/05/09 13:19:25] [ warn] [output:loki:loki.0] Tenant ID is overwritten infra -> apps                                                                                                                                                │
│ [2024/05/09 13:19:26] [ warn] [output:loki:loki.0] Tenant ID is overwritten apps -> infra                                                                                                                                                │
│ [2024/05/09 13:19:26] [ warn] [output:loki:loki.0] Tenant ID is overwritten infra -> apps                                                                                                                                                │
│ [2024/05/09 13:19:27] [ warn] [output:loki:loki.0] Tenant ID is overwritten apps -> infra                                                                                                                                                │
│ [2024/05/09 13:19:27] [ warn] [output:loki:loki.0] Tenant ID is overwritten infra -> apps                                                                                                                                                │
│ [2024/05/09 13:19:28] [ warn] [output:loki:loki.0] Tenant ID is overwritten apps -> infra                                                                                                                                                │
│ [2024/05/09 13:19:30] [ warn] [output:loki:loki.0] Tenant ID is overwritten infra -> apps                                                                                                                                                │
│ [2024/05/09 13:19:31] [ warn] [output:loki:loki.0] Tenant ID is overwritten apps -> infra                                                                                                                                                │
│ [2024/05/09 13:19:32] [ warn] [output:loki:loki.0] Tenant ID is overwritten infra -> apps                                                                                                                                                │
│ [2024/05/09 13:19:33] [ warn] [output:loki:loki.0] Tenant ID is overwritten apps -> infra                                                                                                                                                │
│ [2024/05/09 13:19:34] [ warn] [output:loki:loki.0] Tenant ID is overwritten infra -> apps                                                                                                                                                │
│ [2024/05/09 13:19:35] [ warn] [output:loki:loki.0] Tenant ID is overwritten apps -> infra                                                                                                                                                │
│ [2024/05/09 13:19:36] [ warn] [output:loki:loki.0] Tenant ID is overwritten infra -> apps                                                                                                                                                │
│ [2024/05/09 13:19:37] [ warn] [output:loki:loki.0] Tenant ID is overwritten apps -> infra                                                                                                                                                │
│ [2024/05/09 13:19:41] [ warn] [output:loki:loki.0] Tenant ID is overwritten infra -> apps                                                                                                                                                │
│ [2024/05/09 13:19:42] [ warn] [output:loki:loki.0] Tenant ID is overwritten apps -> infra                                                                                                                                                │
│ [2024/05/09 13:19:43] [ warn] [output:loki:loki.0] Tenant ID is overwritten infra -> apps                                                                                                                                                │
│ [2024/05/09 13:19:44] [ warn] [output:loki:loki.0] Tenant ID is overwritten apps -> infra                                                                                                                                                │
│ [2024/05/09 13:19:44] [ warn] [output:loki:loki.0] Tenant ID is overwritten infra -> apps                                                                                                                                                │
│ [2024/05/09 13:19:45] [ warn] [output:loki:loki.0] Tenant ID is overwritten apps -> infra                                                                                                                                                │
│ [2024/05/09 13:19:45] [ warn] [output:loki:loki.0] Tenant ID is overwritten infra -> apps                                                                                                                                                │
│ [2024/05/09 13:19:46] [ warn] [output:loki:loki.0] Tenant ID is overwritten apps -> infra

The warning is emitted here: https://github.com/fluent/fluent-bit/blob/master/plugins/out_loki/loki.c#L1152

@zhangzx1996

Why did you use Fluent Bit rather than Promtail to send data to Loki? I think Promtail is a good fit when working with Loki.

@cm-rudolph
cm-rudolph commented May 14, 2024

@YevhenLodovyi, are your log entries separated into different chunks, e.g. by making sure that they get re-emitted with a tag that contains the actual value of the tenant_id? Have a look at #2935 (comment), which describes the issue quite well. It would be helpful if you provided the Lua script and the rest of the fluent-bit config.
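
For illustration, a minimal sketch of such a re-emit step in classic Fluent Bit config. It assumes the Lua filter runs earlier in the pipeline and has already put tenant_id at the top level of each record. The tenant. tag prefix and the emitter name are placeholders I made up, not something taken from the reporter's setup:

      # Hypothetical sketch: re-emit each record under a tag that carries its tenant.
      [FILTER]
          name          rewrite_tag
          match         kube.*
          # KEY REGEX NEW_TAG KEEP: records with a non-empty tenant_id get the
          # tag tenant.<tenant_id>.<original tag>; the original copy is dropped.
          rule          $tenant_id ^(.+)$ tenant.$tenant_id.$TAG false
          emitter_name  re_emitted_by_tenant

      # The output then has to match the new tag prefix instead of kube.*
      [OUTPUT]
          name          loki
          match         tenant.*
          tenant_id_key tenant_id

Since chunks are grouped by tag, records of different tenants should then end up in different chunks, so each flush carries a single tenant_id. The remaining output options would stay as in the original config.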

If your chunks are well aligned:

We are possibly encountering the same issue, where a race condition seems to be involved. I guess it was introduced (or at least not fixed) by PR #6931 where dynamic_tenant_id gets shared within the thread. I guess due to usage of coroutines, the value changes between loki_compose_payload and flb_http_add_header.

But I have to admit that I don't see a context switch in between. Maybe @leonardo-albertovich could shed some light on it?

We are quite sure, though, that the value changes in between: we captured the traffic using tcpdump, and the chunks are well aligned and contain log messages from (in our case) distinct namespaces only, yet they are sent with the wrong X-Scope-OrgID header.

We have set Tenant_id_key to customer, and the request looks like this (note the customer1 header on a payload that belongs to customer2):

POST /loki/api/v1/push HTTP/1.1
Host: loki-gateway.logging.svc:80
Content-Length: 944
User-Agent: Fluent-Bit
Content-Type: application/json
X-Scope-OrgID: customer1
Connection: keep-alive

{"streams":[{"stream":{"job":"fluent-bit","node":"worker11","namespace":"customer2-namespace"},"values":[["1715604287815933056","{\"stream\":\"stdout\",\"logtag\":\"F\",\"message\":\"INFO      trace_generator_slow - generate_slow_traces - SlowOperation created in customer2-namespace - Trace ID: 95f6274c96752cf944547abda39e24b1, Span ID: 5eaddc539554b1ec - 13/05/2024 Monday 12:44:47\",\"namespace_name\":\"customer2-namespace\",\"host\":\"worker11\",\"kubernetes\":{\"pod_name\":\"trace-generator-slow-job-swzcs\",\"namespace_name\":\"customer2-namespace\",\"container_name\":\"trace-generator-slow\",\"labels\":{\"batch.kubernetes.io/controller-uid\":\"f935d0ea-6e29-4be8-a379-0126cb2d2b8d\",\"batch.kubernetes.io/job-name\":\"trace-generator-slow-job\",\"controller-uid\":\"f935d0ea-6e29-4be8-a379-0126cb2d2b8d\",\"job-name\":\"trace-generator-slow-job\"}},\"customer\":\"customer2\",\"cluster\":\"cluster-name\"}"]]}]}

If this is a separate topic, I'll file an additional issue.

Update: We recompiled fluent-bit with FLB_OUTPUT_SYNCHRONOUS flag set in out_loki.c. The issue still persists. Either the cause is something else, or we misinterpreted the meaning of the flag.
