
Elasticsearch is rejecting logs with a 400 error mapper_parsing_exception but fluentd is not showing the details of why #1016

Open
danfinn opened this issue Apr 25, 2023 · 3 comments

Comments


danfinn commented Apr 25, 2023

(check apply)

- [x] read the contribution guideline
- [x] (optional) already reported to the 3rd party upstream repository or mailing list if you use the k8s addon or helm charts.

Problem

Elasticsearch is rejecting logs with a 400 error and we are unable to get fluentd to show us why. We have set the following:

@log_level trace
log_es_400_reason true

With those set we are still not seeing the reason for the 400. Here is a sample rejection:
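For reference, the relevant part of our output configuration looks roughly like this (a minimal sketch; the match pattern, host, and port are placeholders, not our real values):

```
<match kubernetes.**>
  @type elasticsearch
  @log_level trace
  log_es_400_reason true
  host elasticsearch.example.com
  port 9200
  logstash_format true
</match>
```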

2023-04-25 19:10:59 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::ElasticsearchErrorHandler::ElasticsearchError error="400 - Rejected by Elasticsearch [error type]: mapper_parsing_exception [reason]: 'failed to parse'" location=nil tag="kubernetes.var.log.containers.taskservice-t4v-qz2c4_task-9414f12fa0a0ac226643b96df054c2d8e18f6ad63b63389.log" time=2023-04-25 19:10:47.184163300 +0000 record={"stream"=>"stdout", "logtag"=>"F", "message"=>"[2023-04-25T19:10:47.000Z][LOGMONITOR] INFO: Entrypoint processs exit code: 0", "time"=>"2023-04-25T19:10:47.1841633Z", "docker"=>{"container_id"=>"941425583bae6b61945e4f6643b96df054c2d8e18f6ad63b63389"}, "kubernetes"=>{"container_name"=>"taskserv", "namespace_name"=>"o", "pod_name"=>"taskserv-74v-qz2c4", "container_image"=>"registry.com/service:2693", "container_image_id"=>"registry.com/service@sha256:815ea13ad16c2f81b0182d3bdc466a763fac53cafc20e28e2e454", "pod_id"=>"b7817-4bb0-8575-65bdb0977892", "pod_ip"=>"10.19.248.114", "host"=>"aks00", "labels"=>{"app_kubernetes_io/instance"=>"taskser", "app_kubernetes_io/name"=>"skservice", "controller-uid"=>"fb379ac5-292f-8fd1-a765e72f1650", "job-name"=>"service-754v", "scaledjob_sh/name"=>"service"}, "master_url"=>"https://10.0.0.1:443/api", "namespace_id"=>"c6d-224a-4aa8-8dee-239ad39ef437", "namespace_labels"=>{"azure-key-vault-env-injection"=>"enabled", "kubernetes_io/metadata_name"=>"o", "purpose"=>"o"}}, "@timestamp"=>"2023-04-25T19:10:47.184163300+00:00", "tag"=>"kubernetes.var.log.containers.service-75-qz2c4_-941425583bae6b61945e4f12fa0a0ac226643b96df054c2d8e18f6ad63b63389.log"}

It would be much easier to troubleshoot if we could see why we are getting this 400 mapping error.

When we enable this on our Linux-based fluentd pods, we get log entries like the following, which do include the error from ES:

2023-04-25 22:03:04 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::ElasticsearchErrorHandler::ElasticsearchError error="400 - Rejected by Elasticsearch [error type]: mapper_parsing_exception [reason]: 'failed to parse field [transaction.response.body] of type [text] in document with id '_f5xuocBnDsL_'. Preview of field's value: '''"

Steps to replicate

run pod, generate logs and watch for 400 errors

...

Using Fluentd and ES plugin versions

This is on Kubernetes, on Windows nodes:
fluent/fluentd:v1.15-windows-ltsc2019-1
fluentd --version
fluentd 1.15.2
gem 'fluent-plugin-elasticsearch' version '5.2.4'

PS C:\fluentd\etc> fluent-gem list

*** LOCAL GEMS ***

abbrev (default: 0.1.0)
addressable (2.8.1)
azure-storage-blob (2.0.3)
azure-storage-common (2.0.4)
base64 (default: 0.1.1)
benchmark (default: 0.2.0)
bigdecimal (default: 3.1.1)
bundler (default: 2.3.7)
certstore_c (0.1.7)
cgi (default: 0.3.1)
concurrent-ruby (1.1.10)
connection_pool (2.3.0)
cool.io (1.7.1)
csv (default: 3.2.2)
date (default: 3.2.2)
debug (1.4.0)
delegate (default: 0.2.0)
did_you_mean (default: 1.6.1)
digest (default: 3.1.0)
domain_name (0.5.20190701)
drb (default: 2.1.0)
elasticsearch (7.15.0)
elasticsearch-api (7.15.0)
elasticsearch-transport (7.15.0)
english (default: 0.7.1)
erb (default: 2.2.3)
error_highlight (default: 0.3.0)
etc (default: 1.3.0)
excon (0.99.0)
faraday (1.10.3)
faraday-em_http (1.0.0)
faraday-em_synchrony (1.0.0)
faraday-excon (1.1.0)
faraday-httpclient (1.0.1)
faraday-multipart (1.0.4)
faraday-net_http (1.0.1)
faraday-net_http_persistent (1.2.0)
faraday-patron (1.0.0)
faraday-rack (1.0.0)
faraday-retry (1.0.3)
faraday_middleware (1.2.0)
fcntl (default: 1.0.1)
ffi (1.15.5 x64-mingw-ucrt)
ffi-compiler (1.0.1)
ffi-win32-extensions (1.0.4)
fiddle (default: 1.1.0)
fileutils (default: 1.6.0)
find (default: 0.1.1)
fluent-config-regexp-type (1.0.0)
fluent-plugin-azure-storage-append-blob (0.2.1)
fluent-plugin-concat (2.5.0)
fluent-plugin-dedot_filter (1.0.0)
fluent-plugin-detect-exceptions (0.0.14)
fluent-plugin-elasticsearch (5.2.4)
fluent-plugin-grok-parser (2.6.2)
fluent-plugin-json-in-json-2 (1.0.2)
fluent-plugin-kubernetes_metadata_filter (3.1.3)
fluent-plugin-multi-format-parser (1.0.0)
fluent-plugin-parser-cri (0.1.1)
fluent-plugin-prometheus (2.0.3)
fluent-plugin-record-modifier (2.1.1)
fluent-plugin-rewrite-tag-filter (2.4.0)
fluent-plugin-windows-eventlog (0.8.3)
fluentd (1.15.2 x64-mingw-ucrt)
forwardable (default: 1.3.2)
getoptlong (default: 0.1.1)
http (5.1.1)
http-accept (1.7.0)
http-cookie (1.0.5)
http-form_data (2.3.0)
http_parser.rb (0.8.0)
io-console (default: 0.5.11)
io-nonblock (default: 0.1.0)
io-wait (default: 0.2.1)
ipaddr (default: 1.2.4)
irb (default: 1.4.1)
json (2.6.2, default: 2.6.1)
jsonpath (1.1.2)
kubeclient (4.11.0)
llhttp-ffi (0.4.0)
logger (default: 1.5.0)
lru_redux (1.1.0)
matrix (0.4.2)
mime-types (3.4.1)
mime-types-data (3.2023.0218.1)
minitest (5.15.0)
msgpack (1.5.6)
multi_json (1.15.0)
multipart-post (2.3.0)
mutex_m (default: 0.1.1)
net-ftp (0.1.3)
net-http (default: 0.2.0)
net-http-persistent (4.0.1)
net-imap (0.2.3)
net-pop (0.1.1)
net-protocol (default: 0.1.2)
net-smtp (0.3.1)
netrc (0.11.0)
nkf (default: 0.1.1)
nokogiri (1.14.2 x64-mingw-ucrt)
observer (default: 0.1.1)
oj (3.13.19)
open-uri (default: 0.2.0)
open3 (default: 0.1.1)
openssl (default: 3.0.0)
optparse (default: 0.2.0)
ostruct (default: 0.5.2)
pathname (default: 0.2.0)
power_assert (2.0.1)
pp (default: 0.3.0)
prettyprint (default: 0.1.1)
prime (0.1.2)
prometheus-client (4.0.0)
pstore (default: 0.1.1)
psych (default: 4.0.3)
public_suffix (5.0.1)
racc (default: 1.6.0)
rake (13.0.6)
rbs (2.1.0)
rdoc (default: 6.4.0)
readline (default: 0.0.3)
recursive-open-struct (1.1.3)
reline (default: 0.3.0)
resolv (default: 0.2.1)
resolv-replace (default: 0.1.0)
rest-client (2.1.0)
rexml (3.2.5)
rinda (default: 0.1.1)
rss (0.2.9)
ruby2_keywords (default: 0.0.5)
securerandom (default: 0.1.1)
serverengine (2.3.0)
set (default: 1.0.2)
shellwords (default: 0.1.0)
sigdump (0.2.4)
singleton (default: 0.1.1)
stringio (default: 3.0.1)
strptime (0.2.5)
strscan (default: 3.0.1)
tempfile (default: 0.1.2)
test-unit (3.5.3)
time (default: 0.2.0)
timeout (default: 0.2.0)
tmpdir (default: 0.1.2)
tsort (default: 0.1.0)
typeprof (0.21.2)
tzinfo (2.0.5)
tzinfo-data (1.2022.3)
un (default: 0.2.0)
unf (0.1.4)
unf_ext (0.0.8.2 x64-mingw-ucrt)
uri (default: 0.11.0)
weakref (default: 0.1.1)
webrick (1.7.0)
win32-event (0.6.3)
win32-eventlog (0.6.7)
win32-ipc (0.7.0)
win32-service (2.3.2)
win32ole (default: 1.8.8)
winevt_c (0.10.1)
yajl-ruby (1.4.3)
yaml (default: 0.2.0)
zlib (default: 2.1.1)

Elasticsearch 7.15.0, although we are seeing the same thing in dev, which runs Elasticsearch 8.3.3.


danfinn commented Apr 26, 2023

Unfortunately, it looks like we are seeing this on our Linux fluentd pods as well:

2023-04-26 16:01:49 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::ElasticsearchErrorHandler::ElasticsearchError error="400 - Rejected by Elasticsearch [error type]: mapper_parsing_exception [reason]: 'failed to parse'" location=nil tag="kubernetes.var.log.containers.security-75_security-508c707824517ae44d7bb6a623179ca8e7afc7b8a4f97.log" time=2023-04-26 16:01:42.503536069 +0000 record={"@timestamp"=>"2023-04-26T16:01:42.0937685Z", "message"=>"User: 67b1953c-df23-4e50-bd84-a46e01233d41 has been: 95047ca19 for: dcc31c08-610-804ffac37722.", "level"=>"Information", "AuthG"=>"67b1953c-df23-4e50-", "AuthX"=>"95047ca1-c58c-", "Subject"=>"6109-40ce-8034-804ffac37722", "SourceContext"=>"Services.Security.Services", "Operation"=>"Group", "Id"=>"47bc-4b36-b6f6-bccacb87274b", "ActionName"=>"Controllers.UsersController.InsertUserAccesProjects (Stuff)", "Id"=>"67b1953c-df23-bd84-a46e01233d41", "UId"=>"10c6725c-145a-47d2-a2f1-", "Req"=>"5UURSBMN:002", "RequestPath"=>"/api/v1.0/users/software/ag", "CorrelationId"=>"0HO64PK:010", "stream"=>"stdout", "logtag"=>"F", "docker"=>{"container_id"=>"508c707824517ae44d7f98623179ca8e7afc7b8a4f97"}, "kubernetes"=>{"container_name"=>"sec", "namespace_name"=>"place", "pod_name"=>"secur6-sfbnr", "container_image"=>"container.registry.com/image:8055", "container_image_id"=>"container.registry.com/image@sha256:76d23a3eba8ac16240291537c0a83b611843b339bedfaa50a7b2e37", "pod_id"=>"d880afea-0b5b48a968d2474", "pod_ip"=>"10.19.249.6", "host"=>"nodepool1-1660-v0003", "labels"=>{"app_kubernetes_io/instance"=>"app", "app_kubernetes_io/name"=>"app", "pod-template-hash"=>"756f786"}, "master_url"=>"https://10.0.0.1:443/api", "namespace_id"=>"c5ad7c6d-224a-4aa8-239ad39ef437", "namespace_labels"=>{"azure-key-vault-env-injection"=>"enabled", "kubernetes_io/metadata_name"=>"app", "purpose"=>"app"}}, "tag"=>"kubernetes.var.log.containers.security-756f7d8486-sfbnr_app_508c707824517ae44d7bb6ab13a117954f7f0fff98623179ca8e7afc7b8a4f97.log"}

Sometimes we are shown the reason for the 400, but sometimes not; it's inconsistent.


danfinn commented May 1, 2023

Not sure whether this should matter, but we found the reason for these errors by querying the Elasticsearch API directly. We are getting the following error from ES, which is causing the 400:

"Limit of total fields [1000] has been exceeded while adding new fields [1]"

Here is the full error response:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "failed to parse"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "failed to parse",
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "Limit of total fields [1000] has been exceeded while adding new fields [1]"
    }
  },
  "status" : 400
}
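As a side note for anyone hitting the same thing: this particular cause is the per-index field-mapping limit, which can be raised with the `index.mapping.total_fields.limit` index setting. A sketch of the call (index name and host below are placeholders):

```
curl -X PUT "http://localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index.mapping.total_fields.limit": 2000}'
```

Raising the limit is only a workaround, of course; hitting it often means something upstream (e.g. dynamic mapping of free-form JSON) is generating too many fields.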

Maybe the issue here is that ES is returning two reason fields: the top-level one is generic ("failed to parse"), while the nested one under caused_by carries the specific information. If a caused_by field is present, perhaps its type and reason should be included in the logged error as well?
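To illustrate the suggestion, here is a hypothetical sketch (not the plugin's actual code; the method name is made up) of how an error handler could walk the nested caused_by chain and append each nested reason to the top-level one:

```ruby
# Given a parsed ES error hash, follow the "caused_by" chain and build
# one string containing every nested type and reason.
def detailed_reason(error)
  reason = error['reason']
  cause = error['caused_by']
  while cause
    reason += "; caused by #{cause['type']}: #{cause['reason']}"
    cause = cause['caused_by']
  end
  reason
end

# The error object from the response above, as a Ruby hash:
es_error = {
  'type'      => 'mapper_parsing_exception',
  'reason'    => 'failed to parse',
  'caused_by' => {
    'type'   => 'illegal_argument_exception',
    'reason' => 'Limit of total fields [1000] has been exceeded while adding new fields [1]'
  }
}

puts detailed_reason(es_error)
# failed to parse; caused by illegal_argument_exception: Limit of total fields [1000] has been exceeded while adding new fields [1]
```

With something like this, the warn line would have shown the field-limit message directly instead of just 'failed to parse'.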

@cosmo0920 (Collaborator) commented

@kenhys Could you take a look at this?
