Rejected by Elasticsearch [error type]: document_parsing_exception [reason]: '[1:660] failed to parse field [kubernetes.labels.app] of type [text] in document with id #1041

Open

eli-gc opened this issue Dec 11, 2023 · 17 comments

eli-gc commented Dec 11, 2023

(check apply)

  • read the contribution guideline
  • (optional) already reported 3rd party upstream repository or mailing list if you use k8s addon or helm charts.

Problem

I cannot upgrade past 1.15.0 or else I get this error. There is no error in 1.15.0, but it appears in 1.15.1 and later. I did not see any breaking changes in the release notes of fluentd 1.15.1.

#0 dump an error event: error_class=Fluent::Plugin::ElasticsearchErrorHandler::ElasticsearchError error="400 - Rejected by Elasticsearch [error type]: document_parsing_exception [reason]: '[1:660] failed to parse field [kubernetes.labels.app] of type [text] in document with id

Steps to replicate

Either clone and modify https://gist.github.com/pitr/9a518e840db58f435911

OR

Provide example config and message

Expected Behavior or What you need to ask

fluentd.conf: |
    # set system level configurations
    <system>
      log_level debug
    </system>
    # input plugin that collects metrics from MonitorAgent
    <source>
      @type prometheus_monitor
      <labels>
        host ${hostname}
      </labels>
    </source>

    # input plugin that collects metrics for output plugin
    <source>
      @type prometheus_output_monitor
      <labels>
        host ${hostname}
      </labels>
    </source>

    # Monitors Fluentd with Datadog
    <source>
      @type monitor_agent
      bind 0.0.0.0
      port 24220
    </source>

    # input plugin to concatenate long log messages
    <filter **>
      @id containerd_concat
      @type concat
      key log
      use_first_timestamp true
      partial_key logtag
      partial_value P
      separator ""
    </filter>

    # Ignore fluentd own events
    <match fluent.**>
      @type null
    </match>


    # HTTP input for the liveness and readiness probes
    <source>
      @type http
      port 9880
    </source>

    # Throw the healthcheck to the standard output instead of forwarding it
    <match fluentd.healthcheck>
      @type stdout
    </match>

    <filter **>
      @type parser
      key_name log
      <parse>
        @type multi_format
        <pattern>
          format json
        </pattern>
        <pattern>
          format none
        </pattern>
      </parse>
    </filter>

    <source>
      @type tail
      @id in_tail_container_logs
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-docker.pos
      ignore_older 1h
      tag kubernetes.*
      read_from_head true
      <parse>
        @type multi_format
        <pattern>
          # for docker cri
          format json
          time_key time
          time_format %Y-%m-%dT%H:%M:%S.%NZ
          keep_time_key true
        </pattern>
        <pattern>
          # for containerd cri
          # format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
          format /^(?<time>.+) (?<stream>stdout|stderr) (?<logtag>[FP]) (?<log>.*)$/
          time_format %Y-%m-%dT%H:%M:%S.%N%:z
          keep_time_key true
        </pattern>
      </parse>
    </source>

    <filter kubernetes.var.log.containers.**.log>
      @type kubernetes_metadata
    </filter>

    # Filter for core namespace
    <filter kubernetes.var.log.containers.**.log>
      @type grep
      <exclude>
        key $.kubernetes.namespace_name
        pattern /^kube.+/
      </exclude>
    </filter>

    # Send the logs to Elasticsearch
    <match **>
      @type elasticsearch
      log_es_400_reason true
      ssl_verify false
      reload_connections false
      reconnect_on_error true
      reload_on_failure true
      request_timeout 300s
      suppress_type_name true
      bulk_message 30
      include_tag_key true
      host "***"
      user "***"
      password "***"
      port 9243
      scheme https
      index_name "logs-aks"
      logstash_format true
      logstash_prefix "logs-aks"
      logstash_dateformat %Y%m%d
      <buffer>
        @type file
        path /opt/bitnami/fluentd/logs/buffers/logs.buffer
        flush_thread_count 1
        flush_interval 5s
        flush_mode interval
        slow_flush_log_threshold 140s
      </buffer>
    </match>

Using Fluentd and ES plugin versions

  • OS version: Debian 11.4
  • Bare Metal or within Docker or Kubernetes or others? Kubernetes
  • Fluentd v0.12 or v0.14/v1.0
    • fluentd 1.15.1
  • ES plugin 3.x.y/2.x.y or 1.x.y
    • fluent-plugin-elasticsearch (5.2.3)
  • ES version (optional)
    • 8.11.1
  • ES template(s) (optional)

@eli-gc (Author) commented Dec 14, 2023

I think it's related to my Kubernetes metadata labels. I have two labels: app and app.kubernetes.io/name. I believe it is being rejected because one is a text type and the other is a nested object, so Elasticsearch doesn't know how to handle it. Was there a change in how types or dots are handled past 1.15.0? I didn't see anything in the changelog. It works just by rolling back to 1.15.0 from 1.15.1+, so I know it isn't the Elasticsearch version.
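
For illustration (a sketch; the label values here are made up), these are the two document shapes once Elasticsearch expands the dotted key into nested objects:

{"kubernetes": {"labels": {"app": "my-service"}}}

{"kubernetes": {"labels": {"app": {"kubernetes": {"io/name": "my-service"}}}}}

The first shape maps app as text; the second needs app to be an object, so whichever shape arrives second gets rejected.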

@cosmo0920 (Collaborator) commented:

I think it's related to my Kubernetes metadata labels. I have two labels: app and app.kubernetes.io/name. I believe it is being rejected because one is a text type and the other is a nested object, so Elasticsearch doesn't know how to handle it. Was there a change in how types or dots are handled past 1.15.0?

This is the root cause of this issue. To handle it, you need to install an ES template that defines the field type.
see: https://www.elastic.co/guide/en/elasticsearch/reference/8.11/mapping.html
see also: https://www.elastic.co/jp/blog/antidote-index-mapping-exceptions-ignore_malformed
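
For example, a minimal index template along these lines (a sketch only, not verified against this setup; it assumes the logstash_prefix "logs-aks" from the config above and uses the flattened type so that dotted label keys stay on a single leaf field):

PUT _index_template/logs-aks
{
  "index_patterns": ["logs-aks-*"],
  "template": {
    "mappings": {
      "properties": {
        "kubernetes": {
          "properties": {
            "labels": { "type": "flattened" }
          }
        }
      }
    }
  }
}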

@danieltaub96 commented:

Same issue for me; it's preventing me from upgrading to the latest version. It used to work and just stopped.

@eli-gc (Author) commented Jan 3, 2024

@cosmo0920 Is the ES template a plugin? Or are you saying I need to make a template myself?

@cosmo0920 (Collaborator) commented Jan 8, 2024

You need to create and install Elasticsearch mappings yourself.
Automatic mapping sometimes assigns the wrong types to the documents it handles.

@eli-gc (Author) commented Jan 8, 2024

Thanks, I'll give it a shot and report back.

@eli-gc (Author) commented Jan 12, 2024

I wasn't able to get the mapping to work. It says app cannot be changed from text to ObjectMapper.

PUT /mapping-test-index
{
  "mappings": {
    "properties": {
      "app": {
        "type": "text"
      },
      "app.kubernetes.io/name": {
        "type": "text"
      }
    }
  }
}
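
For reference, this failure is expected: in a mapping, app.kubernetes.io/name expands into nested objects, so app would have to be both text and an object at once. On Elasticsearch 8.3+, one possible workaround (a sketch, not verified here) is subobjects: false, which keeps dotted keys as flat leaf fields:

PUT /mapping-test-index
{
  "mappings": {
    "subobjects": false,
    "properties": {
      "app": {
        "type": "text"
      },
      "app.kubernetes.io/name": {
        "type": "text"
      }
    }
  }
}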

@xdubois commented Jan 30, 2024

@eli-gc Did you find a working configuration?
I've been experiencing the same conflict issue with an "app" string label.

@eli-gc (Author) commented Jan 30, 2024

@xdubois I did not. We decided to move away from Fluentd, but you could try adding the de_dot filter manually, or possibly use the flattened type. De_dot got removed from Fluentd, which was the root of my issue. Check out these issues for more info:
de_dot removal
elasticsearch#63530

@cosmo0920 (Collaborator) commented:

Not sure if this is one of the solution candidates, but Fluentd has a dedot filter plugin: https://github.com/lunardial/fluent-plugin-dedot_filter
It should replace dots (.) with a specified character.
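
A minimal filter along these lines might work (a sketch; de_dot_nested is assumed to be required so that nested keys such as kubernetes.labels are rewritten too):

<filter kubernetes.**>
  @type dedot
  de_dot true
  de_dot_separator _
  de_dot_nested true
</filter>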

@xdubois commented Feb 7, 2024

Thanks for the responses, guys.
Couldn't make it work with the dedot plugin.
We switched to Filebeat for the ease of configuration.

@Rohlik commented Apr 10, 2024

I have the same issue with the logging-operator version 1.6.0, which uses Fluentd with this plugin. I get the error below:

"reason"=>"[1:1018] failed to parse field [kubernetes.labels.app] of type [keyword] in document with id 'piLbx44BBp6m9YwiO00W'. Preview of field's value: '{kubernetes={io/component=controller}}'", "caused_by"=>{"type"=>"illegal_state_exception", "reason"=>"Can't get text on a START_OBJECT at 1:988"}

Even with the dedot filter it doesn't seem to work.

  <filter **>
    @type dedot
    @id clusterflow:logging:nginx:2
    de_dot_separator _
  </filter>

@cosmo0920 (Collaborator) commented:

failed to parse field [kubernetes.labels.app] of type [keyword] in document

This is because the pointed field is not a keyword, i.e. just a text value within 256 characters. Relying on the automatic mapping caused this issue.
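
For reference, Elasticsearch's default dynamic mapping for a string field looks roughly like this, which is where the 256-character limit comes from; an object value can no longer fit such a field:

"app": {
  "type": "text",
  "fields": {
    "keyword": { "type": "keyword", "ignore_above": 256 }
  }
}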

@Rohlik commented Apr 11, 2024

@cosmo0920 Well, Filebeat, which was used before in our deployment, provides the label field as kubernetes.labels.app_kubernetes_io/component with the value controller; however, fluent-bit/fluentd, judging from the log above, messed it up, and the result is kubernetes.labels.app with the value kubernetes={io/component=controller}.
The original format in Kubernetes looks like: app.kubernetes.io/component: controller

Of course, I can tweak the mapping on the Elasticsearch side, but that won't solve the weird parsing.
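
For comparison, reconstructed from the messages above (a sketch):

Filebeat: {"kubernetes": {"labels": {"app_kubernetes_io/component": "controller"}}}
Fluentd:  {"kubernetes": {"labels": {"app": {"kubernetes": {"io/component": "controller"}}}}}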

@cosmo0920 (Collaborator) commented:

Hmm, that's weird. Just out of curiosity, is it solved by using fluent-bit instead of Fluentd with this plugin?

@Rohlik commented Apr 15, 2024

@cosmo0920 I deleted the mapping for my index to allow all data to come in.
And I noticed that the problematic label is displayed in Kibana as kubernetes.labels.app.kubernetes.io/component with the value controller.
So, I guess we can work with that and rewrite our mapping.
However, I am still wondering about that log I posted, as the error message is quite confusing: based on it, the value should look like kubernetes={io/component=controller}.

@cosmo0920 (Collaborator) commented:

The error message comes from Elasticsearch itself, so unfortunately we can't display it more clearly.
