nil:nilclass error connecting to opensearch #42

Open
LeonD9 opened this issue Mar 28, 2022 · 20 comments
Labels
bug Something isn't working

Comments


LeonD9 commented Mar 28, 2022

Steps to replicate

OpenSearch output and buffer configuration:

    <label @opensearch>
      <match **>
        @type opensearch
        @id logs
        include_tag_key true
        time_key time
        scheme "#{ENV['FLUENT_OPENSEARCH_SCHEME'] || 'https'}"
        ssl_verify "#{ENV['FLUENT_OPENSEARCH_SSL_VERIFY'] || 'true'}"
        ssl_version "#{ENV['FLUENT_OPENSEARCH_SSL_VERSION'] || 'TLSv1_2'}"
        reload_connections "#{ENV['FLUENT_OPENSEARCH_RELOAD_CONNECTIONS'] || 'false'}"
        reconnect_on_error "#{ENV['FLUENT_OPENSEARCH_RECONNECT_ON_ERROR'] || 'true'}"
        reload_on_failure "#{ENV['FLUENT_OPENSEARCH_RELOAD_ON_FAILURE'] || 'true'}"
        log_os_400_reason "#{ENV['FLUENT_OPENSEARCH_LOG_ES_400_REASON'] || 'true'}"
        with_transporter_log false
        logstash_prefix 'k8s'
        logstash_dateformat "#{ENV['FLUENT_OPENSEARCH_LOGSTASH_DATEFORMAT'] || '%Y.%m.%d'}"
        logstash_format "#{ENV['FLUENT_OPENSEARCH_LOGSTASH_FORMAT'] || 'true'}"
        index_name 'k8s'
        target_index_key "#{ENV['FLUENT_OPENSEARCH_TARGET_INDEX_KEY'] || use_nil}"
        type_name 'fluentd'
        include_timestamp "#{ENV['FLUENT_OPENSEARCH_INCLUDE_TIMESTAMP'] || 'false'}"
        template_name "#{ENV['FLUENT_OPENSEARCH_TEMPLATE_NAME'] || use_nil}"
        template_file "#{ENV['FLUENT_OPENSEARCH_TEMPLATE_FILE'] || use_nil}"
        template_overwrite "#{ENV['FLUENT_OPENSEARCH_TEMPLATE_OVERWRITE'] || use_default}"
        request_timeout "#{ENV['FLUENT_OPENSEARCH_REQUEST_TIMEOUT'] || '20s'}"
        suppress_type_name "#{ENV['FLUENT_OPENSEARCH_SUPPRESS_TYPE_NAME'] || 'true'}"
        <endpoint>
          url "#{ENV['FLUENT_MAIN_OPENSEARCH_HOST']}"
          region "#{ENV['OPENSEARCH_REGION']}"
          assume_role_arn "#{ENV['ASSUMED_IAM_ROLE']}"
        </endpoint>
        <buffer>
          @type file
          flush_thread_count '16'
          flush_interval '10s'
          chunk_limit_size '10MB'
          chunk_limit_records '3000'
          total_limit_size '65GB'
          retry_max_interval '30'
          flush_mode 'interval'
          overflow_action 'drop_oldest_chunk'
        </buffer>
      </match>
    </label>

I receive the following error when trying to send logs to OpenSearch:

2022-03-28 14:28:44 +0000 [warn]: #0 [opensearch] failed to flush the buffer. retry_times=5 next_retry_time=2022-03-28 14:29:00 +0000 chunk="5db481db7caf87bc9550c195959bd79b" error_class=Fluent::Plugin::OpenSearchOutput::RecoverableRequestFailure error="could not push logs to OpenSearch cluster ({:host=>\"****\", :port=>443, :scheme=>\"https\"}): undefined method `call' for nil:NilClass"

What could be the reason for this error?

Expected Behavior or What you need to ask

Access to OpenSearch is open and logs do arrive from time to time, but I keep receiving this error most of the time. What could be causing it?

Using Fluentd and OpenSearch plugin versions

  • OS version
  • Bare Metal or within Docker or Kubernetes or others?:
    kubernetes
  • Fluentd v1.0 or later
    • paste result of fluentd --version or td-agent --version:
      v1.14.3-debian-elasticsearch7-1.0
  • OpenSearch plugin version
    • paste boot log of fluentd or td-agent
    • paste result of fluent-gem list, td-agent-gem list or your Gemfile.lock:
gem list:
addressable (2.8.0)
aws-eventstream (1.2.0)
aws-partitions (1.570.0)
aws-sdk-cloudwatchlogs (1.52.0)
aws-sdk-core (3.130.0)
aws-sigv4 (1.4.0)
bigdecimal (default: 1.4.1)
bundler (2.2.24, default: 1.17.2)
cmath (default: 1.0.0)
concurrent-ruby (1.1.9)
cool.io (1.7.1)
csv (default: 3.0.9)
date (default: 2.0.2)
dbm (default: 1.0.0)
domain_name (0.5.20190701)
e2mmap (default: 0.1.0)
elasticsearch (7.15.0)
elasticsearch-api (7.15.0)
elasticsearch-transport (7.15.0)
elasticsearch-xpack (7.15.0)
etc (default: 1.0.1)
excon (0.88.0)
faraday (1.8.0)
faraday-em_http (1.0.0)
faraday-em_synchrony (1.0.0)
faraday-excon (1.1.0)
faraday-httpclient (1.0.1)
faraday-net_http (1.0.1)
faraday-net_http_persistent (1.2.0)
faraday-patron (1.0.0)
faraday-rack (1.0.0)
faraday_middleware-aws-sigv4 (0.6.1)
fcntl (default: 1.0.0)
ffi (1.15.4)
ffi-compiler (1.0.1)
fiddle (default: 1.0.0)
fileutils (default: 1.1.0)
fluent-config-regexp-type (1.0.0)
fluent-plugin-cloudwatch-logs (0.14.2)
fluent-plugin-concat (2.5.0)
fluent-plugin-dedot_filter (1.0.0)
fluent-plugin-detect-exceptions (0.0.14)
fluent-plugin-elasticsearch (5.1.4)
fluent-plugin-grok-parser (2.6.2)
fluent-plugin-http-healthcheck (0.1.0)
fluent-plugin-json-in-json-2 (1.0.2)
fluent-plugin-kubernetes_metadata_filter (2.9.2)
fluent-plugin-multi-format-parser (1.0.0)
fluent-plugin-opensearch (1.0.3)
fluent-plugin-parser-cri (0.1.1)
fluent-plugin-prometheus (2.0.2)
fluent-plugin-record-modifier (2.1.0)
fluent-plugin-rewrite-tag-filter (2.4.0)
fluent-plugin-systemd (1.0.5)
fluentd (1.14.3)
forwardable (default: 1.2.0)
gdbm (default: 2.0.0)
http (4.4.1)
http-accept (1.7.0)
http-cookie (1.0.4)
http-form_data (2.3.0)
http-parser (1.2.3)
http_parser.rb (0.8.0)
io-console (default: 0.4.7)
ipaddr (default: 1.2.2)
irb (default: 1.0.0)
jmespath (1.6.1)
json (default: 2.1.0)
jsonpath (1.1.0)
kubeclient (4.9.2)
logger (default: 1.3.0)
lru_redux (1.1.0)
matrix (default: 0.1.0)
mime-types (3.4.1)
mime-types-data (3.2021.1115)
msgpack (1.4.2)
multi_json (1.15.0)
multipart-post (2.1.1)
mutex_m (default: 0.1.0)
netrc (0.11.0)
oj (3.11.0)
opensearch-api (1.0.0)
opensearch-ruby (1.0.0)
opensearch-transport (1.0.0)
openssl (default: 2.1.2)
ostruct (default: 0.1.0)
prime (default: 0.1.0)
prometheus-client (2.1.0)
psych (default: 3.1.0)
public_suffix (4.0.6)
rake (13.0.6)
rdoc (default: 6.1.2.1)
recursive-open-struct (1.1.3)
rest-client (2.1.0)
rexml (default: 3.1.9.1)
rss (default: 0.2.7)
ruby2_keywords (0.0.5)
scanf (default: 1.0.0)
sdbm (default: 1.0.0)
serverengine (2.2.4)
shell (default: 0.7)
sigdump (0.2.4)
stringio (default: 0.0.2)
strptime (0.2.5)
strscan (default: 1.0.0)
sync (default: 0.5.0)
systemd-journal (1.4.2)
thwait (default: 0.1.0)
tracer (default: 0.1.0)
tzinfo (2.0.4)
tzinfo-data (1.2021.5)
unf (0.1.4)
unf_ext (0.0.8)
webrick (1.7.0, default: 1.4.4)
yajl-ruby (1.4.1)
zlib (default: 1.0.0)
  • OpenSearch version (optional): 1.1

ryn9 commented Mar 28, 2022

Please try the following and report back your results:

Remove the 'type_name' entry.

Then either:
- remove 'include_tag_key true', or
- add a 'tag_key' entry.


LeonD9 commented Mar 29, 2022

@ryn9 Removing type_name and include_tag_key didn't help; I still get the following error:

2022-03-28 14:28:44 +0000 [warn]: #0 [opensearch] failed to flush the buffer. retry_times=5 next_retry_time=2022-03-28 14:29:00 +0000 chunk="5db481db7caf87bc9550c195959bd79b" error_class=Fluent::Plugin::OpenSearchOutput::RecoverableRequestFailure error="could not push logs to OpenSearch cluster ({:host=>\"****\", :port=>443, :scheme=>\"https\"}): undefined method `call' for nil:NilClass"


LeonD9 commented Mar 29, 2022

It seems to be an issue with the latest fluent-plugin-opensearch version; reverting to 1.0.2 fixed my issue.

LeonD9 closed this as completed Mar 29, 2022

ryn9 commented Mar 29, 2022

@cosmo0920 - perhaps a bug in 1.0.3?


ryn9 commented Mar 29, 2022

@cosmo0920 - I hit the same bug when using 1.0.3

@cosmo0920 (Collaborator)

Sure, I'll take a look.

cosmo0920 reopened this Mar 30, 2022
@cosmo0920 (Collaborator)

Could you paste more of the log around the error? Ruby is a scripting language, so we need to see where the error occurred and how it was caused.

cosmo0920 added the bug label Mar 30, 2022

ryn9 commented Mar 30, 2022

@cosmo0920

Here is a 1.0.3 error:

2022-03-30 20:33:16 +0000 [debug]: #0 [out_opensearch] Created new chunk chunk_id="5db75748cfce7588a2607967c0bca145" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="test", variables=nil, seq=0>
2022-03-30 20:33:20 +0000 [debug]: #0 [out_opensearch] taking back chunk for errors. chunk="5db75748cfce7588a2607967c0bca145"
2022-03-30 20:33:20 +0000 [warn]: #0 [out_opensearch] failed to flush the buffer. retry_times=0 next_retry_time=2022-03-30 20:33:50 +0000 chunk="5db75748cfce7588a2607967c0bca145" error_class=Fluent::Plugin::OpenSearchOutput::RecoverableRequestFailure error="could not push logs to OpenSearch cluster ({:host=>\"REDACTED\", :port=>443, :scheme=>\"https\", :user=>\"REDACTED\", :password=>\"obfuscated\"}): undefined method `call' for nil:NilClass"
  2022-03-30 20:33:20 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:1119:in `rescue in send_bulk'
  2022-03-30 20:33:20 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:1081:in `send_bulk'
  2022-03-30 20:33:20 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:892:in `block in write'
  2022-03-30 20:33:20 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:891:in `each'
  2022-03-30 20:33:20 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:891:in `write'
  2022-03-30 20:33:20 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.14.5/lib/fluent/plugin/output.rb:1179:in `try_flush'
  2022-03-30 20:33:20 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.14.5/lib/fluent/plugin/output.rb:1495:in `flush_thread_run'
  2022-03-30 20:33:20 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.14.5/lib/fluent/plugin/output.rb:499:in `block (2 levels) in start'
  2022-03-30 20:33:20 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.14.5/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2022-03-30 20:33:50 +0000 [debug]: #0 [out_opensearch] taking back chunk for errors. chunk="5db75748cfce7588a2607967c0bca145"
2022-03-30 20:33:50 +0000 [warn]: #0 [out_opensearch] failed to flush the buffer. retry_times=1 next_retry_time=2022-03-30 20:34:24 +0000 chunk="5db75748cfce7588a2607967c0bca145" error_class=Fluent::Plugin::OpenSearchOutput::RecoverableRequestFailure error="could not push logs to OpenSearch cluster ({:host=>\"REDACTED\", :port=>443, :scheme=>\"https\", :user=>\"REDACTED\", :password=>\"obfuscated\"}): undefined method `call' for nil:NilClass"
  2022-03-30 20:33:50 +0000 [warn]: #0 suppressed same stacktrace

Compared to the same config and the same test messages sent through 1.0.2:
2022-03-30 20:39:57 +0000 [debug]: #0 [out_opensearch] Created new chunk chunk_id="5db758c75713c0982038aed5cf73d4cd" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="test", variables=nil, seq=0>
And then it flushed successfully, without any logged error messages.


LeonD9 commented Apr 4, 2022

@cosmo0920 @ryn9 Did you also see this warning message with the 1.0.2 gem:
2022-04-04 07:15:53 +0000 [warn]: #0 emit transaction failed: error_class=NoMethodError error="undefined method `+' for nil:NilClass" location="/fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:1341:in `submit_flush_once'" tag="kubernetes.var.log.containers.kube-metrics-adapter-6f75fbbbc4-s5smw_kube-system_kube-metrics-adapter-ee0fb2300dec03ae655ae7ad6e094e98dcebb26b61ac9e0213647a9cb015f763.log" 2022-04-04 07:15:53 +0000 [warn]: #0 suppressed same stacktrace

Any idea what could cause this error?

@cosmo0920 (Collaborator)

@LeonD9 Could you paste more of the log around the error? Ruby's backtrace is also important for digging into the cause.


ryn9 commented Apr 5, 2022

@cosmo0920 - I ran this again with tracing, and it looks like this occurs in the bulk error condition. If I send a payload with no errors, it appears to work okay. When I send a payload with an error, it fails as above.

Here is a slightly modified trace:

2022-04-05 15:49:21 +0000 [trace]: #0 [out_opensearch] writing events into buffer instance=3000 metadata_size=1
2022-04-05 15:49:21 +0000 [debug]: #0 [out_opensearch] Created new chunk chunk_id="5dbea303f4f77964612f171a1b631b04" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="test", variables=nil, seq=0>
2022-04-05 15:49:21 +0000 [trace]: #0 [out_opensearch] chunk /fluentd/root_dir/worker0/out_opensearch/buffer/buffer.b5dbea303f4f77964612f171a1b631b04.log size_added: 206 new_size: 206
2022-04-05 15:49:22 +0000 [trace]: #2 [out_opensearch] enqueueing all chunks in buffer instance=3000
2022-04-05 15:49:22 +0000 [trace]: #1 [out_opensearch] enqueueing all chunks in buffer instance=3000
2022-04-05 15:49:22 +0000 [trace]: #0 [out_opensearch] enqueueing all chunks in buffer instance=3000
2022-04-05 15:49:22 +0000 [trace]: #0 [out_opensearch] enqueueing chunk instance=3000 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="test", variables=nil, seq=0>
2022-04-05 15:49:22 +0000 [trace]: #3 [out_opensearch] enqueueing all chunks in buffer instance=3000
2022-04-05 15:49:22 +0000 [trace]: #0 [out_opensearch] dequeueing a chunk instance=3000
2022-04-05 15:49:22 +0000 [trace]: #0 [out_opensearch] chunk dequeued instance=3000 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="test", variables=nil, seq=0>
2022-04-05 15:49:22 +0000 [trace]: #0 [out_opensearch] trying flush for a chunk chunk="5dbea303f4f77964612f171a1b631b04"
2022-04-05 15:49:22 +0000 [trace]: #0 [out_opensearch] adding write count instance=3440
2022-04-05 15:49:22 +0000 [trace]: #0 [out_opensearch] executing sync write chunk="5dbea303f4f77964612f171a1b631b04"
2022-04-05 15:49:22 +0000 [trace]: #0 [out_opensearch] bulk request: {"index":{"_index":"demo-field1-field2-2022.13","_type":"_doc"}}
{"foo.bar":"test","message":"test","@timestamp":"2022-03-31T13:52:51.315315315+00:00"}

2022-04-05 15:49:22 +0000 [trace]: #0 [out_opensearch] bulk response: {"took"=>7, "errors"=>true, "items"=>[{"index"=>{"_index"=>"demo-field1-field2-2022.13", "_type"=>"_doc", "_id"=>"0Xxq-n8Br-TZ0qY9lARI", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"Could not dynamically add mapping for field [foo.bar]. Existing mapping for [foo] must be of type object but found [text]."}}}]}
2022-04-05 15:49:22 +0000 [debug]: #0 [out_opensearch] taking back chunk for errors. chunk="5dbea303f4f77964612f171a1b631b04"
2022-04-05 15:49:22 +0000 [trace]: #0 [out_opensearch] taking back a chunk instance=3000 chunk_id="5dbea303f4f77964612f171a1b631b04"
2022-04-05 15:49:22 +0000 [trace]: #0 [out_opensearch] chunk taken back instance=3000 chunk_id="5dbea303f4f77964612f171a1b631b04" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="test", variables=nil, seq=0>
2022-04-05 15:49:22 +0000 [warn]: #0 [out_opensearch] failed to flush the buffer. retry_times=0 next_retry_time=2022-04-05 15:49:27 +0000 chunk="5dbea303f4f77964612f171a1b631b04" error_class=Fluent::Plugin::OpenSearchOutput::RecoverableRequestFailure error="could not push logs to OpenSearch cluster ({:host=>\"REDACTED\", :port=>443, :scheme=>\"https\", :user=>\"REDACTED\", :password=>\"obfuscated\"}): undefined method `call' for nil:NilClass"
  2022-04-05 15:49:22 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:1119:in `rescue in send_bulk'
  2022-04-05 15:49:22 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:1081:in `send_bulk'
  2022-04-05 15:49:22 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:892:in `block in write'
  2022-04-05 15:49:22 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:891:in `each'
  2022-04-05 15:49:22 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:891:in `write'
  2022-04-05 15:49:22 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.14.5/lib/fluent/plugin/output.rb:1179:in `try_flush'
  2022-04-05 15:49:22 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.14.5/lib/fluent/plugin/output.rb:1495:in `flush_thread_run'
  2022-04-05 15:49:22 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.14.5/lib/fluent/plugin/output.rb:499:in `block (2 levels) in start'
  2022-04-05 15:49:22 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.14.5/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2022-04-05 15:49:23 +0000 [trace]: #2 [out_opensearch] enqueueing all chunks in buffer instance=3000
2022-04-05 15:49:23 +0000 [trace]: #1 [out_opensearch] enqueueing all chunks in buffer instance=3000
2022-04-05 15:49:23 +0000 [trace]: #0 [out_opensearch] enqueueing all chunks in buffer instance=3000
2022-04-05 15:49:23 +0000 [trace]: #3 [out_opensearch] enqueueing all chunks in buffer instance=3000
2022-04-05 15:49:24 +0000 [trace]: #2 [out_opensearch] enqueueing all chunks in buffer instance=3000



LeonD9 commented Apr 16, 2022

@cosmo0920 It is the same as this issue: fluent/fluentd#1973.
There are no logs besides this in debug/trace mode:

2022-04-16 19:25:26 +0000 [trace]: #0 [main-logs] writing events into buffer instance=458700 metadata_size=1
2022-04-16 19:25:26 +0000 [debug]: #0 [main-logs] Created new chunk chunk_id="5dcd50bbcfeaf9b7d6fc14f983017cbb" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="kubernetes.var.log.containers.test.log", variables=nil, seq=0>
2022-04-16 19:25:26 +0000 [trace]: #0 [main-logs] chunk /var/log/fluentd/worker0/main-logs/buffer/buffer.b5dcd50bbcfeaf9b7d6fc14f983017cbb.log size_added: 3499 new_size: 3499
2022-04-16 19:25:26 +0000 [warn]: #0 emit transaction failed: error_class=NoMethodError error="undefined method `+' for nil:NilClass" location="/fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:1341:in `submit_flush_once'" tag="kubernetes.var.log.containers.test.log"
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:1341:in `submit_flush_once'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:853:in `emit_buffered'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:830:in `emit_events'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/event_router.rb:115:in `emit_stream'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/out_relabel.rb:29:in `process'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:839:in `emit_sync'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/event_router.rb:115:in `emit_stream'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/in_forward.rb:338:in `on_message'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/in_forward.rb:226:in `block in handle_connection'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/in_forward.rb:263:in `block (3 levels) in read_messages'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/in_forward.rb:262:in `feed_each'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/in_forward.rb:262:in `block (2 levels) in read_messages'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/in_forward.rb:271:in `block in read_messages'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin_helper/server.rb:613:in `on_read_without_connection'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/cool.io-1.7.1/lib/cool.io/io.rb:123:in `on_readable'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/cool.io-1.7.1/lib/cool.io/io.rb:186:in `on_readable'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/cool.io-1.7.1/lib/cool.io/loop.rb:88:in `run_once'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/cool.io-1.7.1/lib/cool.io/loop.rb:88:in `run'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2022-04-16 19:25:26 +0000 [error]: #0 unexpected error on reading data host="10.1.7.177" port=42756 error_class=NoMethodError error="undefined method `get' for nil:NilClass"


ryn9 commented Apr 27, 2022

@cosmo0920 Is there anything more we can provide in order to get this resolved?

@cosmo0920 (Collaborator)

Ordinary records will not cause this issue. Could you provide a sample record that triggers this error?
Reproduction steps would be even better, but at minimum a sample invalid record is needed to resolve this issue.


ghost commented May 19, 2022

I've encountered this issue too. Interestingly, I never observed the problem before adding filters that retag records with JSON in the message and re-parse them. fluentd.conf now looks like:

      <label @FLUENT_LOG>
        <match fluent.**>
          @type null
          @id ignore_fluent_logs
        </match>
      </label>

      <source>
        @type tail
        @id in_tail_container_logs
        path "/var/log/containers/*.log"
        pos_file "/var/log/fluentd-containers.log.pos"
        tag "kubernetes.*"
        exclude_path []
        read_from_head true
        <parse>
          @type "cri"
          time_format "%Y-%m-%dT%H:%M:%S.%N%z"
        </parse>
      </source>

      <filter kubernetes.**>
        @type kubernetes_metadata
        @id filter_kube_metadata
        kubernetes_url "https://172.30.0.1:443/api"
        verify_ssl true
        ca_file ""
        skip_labels false
        skip_container_metadata false
        skip_master_url false
        skip_namespace_metadata false
        watch true
      </filter>

      <match kubernetes.**>
        @type rewrite_tag_filter
        <rule>
          key     message
          pattern /^\{.*\}$/
          tag     application.json
        </rule>
        <rule>
          key     kubernetes.container_name
          pattern /^known-container$/
          tag     application.freetext
        </rule>
        <rule>
          key     message
          pattern /.*$/
          tag     unidentified
        </rule>
      </match>

      <filter application.json>
        @type parser
        key_name "message"
        reserve_data true
        remove_key_name_field true
        <parse>
          @type json
        </parse> 
      </filter>

      <match application.json>
        @type opensearch
        @id out_os_app_json
        include_tag_key true
        host "opensearch-cluster-master"
        port 9200
        path ""
        scheme https
        ssl_verify false
        ssl_version TLSv1_2
        user "admin"
        password "xxxxxxx"
        index_name "fluentd_application"
        request_timeout 30s # defaults to 5s
        suppress_type_name true
      </match>
      <match application.**>
        @type opensearch
        @id out_os_app
        include_tag_key true
        host "opensearch-cluster-master"
        port 9200
        path ""
        scheme https
        ssl_verify false
        ssl_version TLSv1_2
        user "admin"
        password "xxxxxxx"
        index_name "fluentd_application"
        request_timeout 30s # defaults to 5s
        suppress_type_name true
      </match>      
      # # Catchall...
      <match **>
        @type stdout
        @id unidentified_log
      </match>

Error output from fluentd:

2022-05-19 09:14:41 +0000 [warn]: #0 [out_os_app_json] failed to flush the buffer. retry_times=0 next_retry_time=2022-05-19 09:14:43 +0000 chunk="5df59ca32e546139145bdc1c15cf2986" error_class=Fluent::Plugin::OpenSearchOutput::RecoverableRequestFailure error="could not push logs to OpenSearch cluster ({:host=>\"opensearch-cluster-master\", :port=>9200, :scheme=>\"https\", :user=>\"admin\", :password=>\"obfuscated\", :path=>\"\"}): undefined method `call' for nil:NilClass"
  2022-05-19 09:14:41 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:1119:in `rescue in send_bulk'
  2022-05-19 09:14:41 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:1081:in `send_bulk'
  2022-05-19 09:14:41 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:892:in `block in write'
  2022-05-19 09:14:41 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:891:in `each'
  2022-05-19 09:14:41 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:891:in `write'
  2022-05-19 09:14:41 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:1179:in `try_flush'
  2022-05-19 09:14:41 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:1500:in `flush_thread_run'
  2022-05-19 09:14:41 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:499:in `block (2 levels) in start'
  2022-05-19 09:14:41 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'

I can't tell what record(s) are causing the problem, but when I switch the out_os_app_json output to stdout, a sample record looks like this.

{
	"stream"=>"stdout", 
	"logtag"=>"F", 
	"message"=>"{\"remote\":\"127.0.0.1:52400\",\"client\":\"conn216652\",\"doc\":{\"driver\":{\"name\":\"mongo-go-driver\",\"version\":\"v1.7.0\"},\"os\":{\"type\":\"linux\",\"architecture\":\"amd64\"},\"platform\":\"go1.17.5\",\"mongos\":{\"host\":\"shared-mongodb-mongos-5d65d4667c-n2fwv:27017\",\"client\":\"127.0.0.1:52400\",\"version\":\"4.4.10-11\"}}}}", 
	"time"=>"2022-05-19T09:00:57.120674031+00:00", 
	"docker"=>{"container_id"=>"1e31fb4a258736fc6eaeb35d5b5ed45c12c21963f64a97e917e87bd79734384b"}, 
	"kubernetes"=>{"container_name"=>"fluentd", 
		"namespace_name"=>"xxxxxxxx", 
		"pod_name"=>"fluentd-2n9kd", 
		"container_image"=>"docker.io/fluent/fluentd-kubernetes-daemonset:v1.14.6-debian-opensearch-1.0", 
		"container_image_id"=>"docker.io/fluent/fluentd-kubernetes-daemonset@sha256:054d75bef7ca91b6434d2bc86d387b7de919c408dac7e0ee88322cc5926d8798", 
		"pod_id"=>"3519ab44-1b54-4346-97f6-fdb0254d41cd", 
		"pod_ip"=>"10.129.1.7", 
		"host"=>"compute-0", 
		"labels"=>{"controller-revision-hash"=>"68458f67f6", 
			"k8s-app"=>"fluentd-logging", 
			"pod-template-generation"=>"1", 
			"version"=>"v1"},
		"master_url"=>"https://172.30.0.1:443/api", 
		"namespace_id"=>"ec79e7e2-526a-4213-9f43-fcb8ef8cafc7", 
		"namespace_labels"=>{"kubernetes_io/metadata_name"=>"xxxxxxxx"}
	}
}

Note that, whilst I can't tell if this is the offending record, I've selected it because the JSON in the message hash contains a spurious }. This invalid JSON log appears to have been generated by the fluentd container itself, presumably triggered by some condition in the original log, but I've yet to find the root cause.

Whether this invalid JSON is related to this issue or not, I don't know.


ghost commented May 19, 2022

Running with trace logs, I found illegal_state_exception Can't get text on a START_OBJECT at 1:1320:

2022-05-19 14:49:29 +0000 [trace]: #0 [out_os_app_json] bulk response: {"took"=>925, "errors"=>true, "items"=>[
...
...
{"index"=>{"_index"=>"dc4h-goal-1-application", "_id"=>"yKfL3IAB4pmfZooFjGdO", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [attr.command] of type [text] in document with id 'yKfL3IAB4pmfZooFjGdO'. Preview of field's value: '{ordered=true, $replData=1, $configServerState={opTime={t=1, ts={$timestamp={t=1652971704, i=3}}}}, $db=config, $clusterTime={clusterTime={$timestamp={t=1652971704, i=4}}, signature={keyId=7099418590524211201, hash={$binary={base64=ww8hldh66Ok8b+I4w+Gt7OUgUnA=, subType=0}}}}, maxTimeMS=30000, update=mongos, bypassDocumentValidation=false, writeConcern={wtimeout=60000, w=majority}, updates=[{q={_id=shared-mongodb-mongos-6bd459598f-9bwvp:27017}, u={$set={mongoVersion=4.4.10-11, waiting=true, ping={$date=2022-05-19T14:48:25.183Z}, _id=shared-mongodb-mongos-6bd459598f-9bwvp:27017, up=9474, advisoryHostFQDNs=[]}}, upsert=true, multi=false}]}'", "caused_by"=>{"type"=>"illegal_state_exception", "reason"=>"Can't get text on a START_OBJECT at 1:1320"}}}}
...
...
2022-05-19 14:49:29 +0000 [debug]: #0 [out_os_app_json] taking back chunk for errors. chunk="5df5e77819c10486f117a8ce77af0598"
2022-05-19 14:49:29 +0000 [trace]: #0 [out_os_app_json] taking back a chunk instance=50740 chunk_id="5df5e77819c10486f117a8ce77af0598"
2022-05-19 14:49:29 +0000 [trace]: #0 [out_os_app_json] chunk taken back instance=50740 chunk_id="5df5e77819c10486f117a8ce77af0598" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="application.json", variables=nil, seq=0>
2022-05-19 14:49:30 +0000 [warn]: #0 [out_os_app_json] failed to flush the buffer. retry_times=0 next_retry_time=2022-05-19 14:49:31 +0000 chunk="5df5e77819c10486f117a8ce77af0598" error_class=Fluent::Plugin::OpenSearchOutput::RecoverableRequestFailure error="could not push logs to OpenSearch cluster ({:host=>\"opensearch-cluster-master.dc4h-goal-1.svc\", :port=>9200, :scheme=>\"https\", :user=>\"admin\", :password=>\"obfuscated\", :path=>\"\"}): undefined method `call' for nil:NilClass"

The related record for the bulk insert appears to be:

{"stream":"stdout","logtag":"F","time":"2022-05-19T14:48:25.332505099+00:00","docker":{"container_id":"2699a109e1c84b45c3636b03460ba99ec70ea88b5066ca17c427ccd0c97e29b8"},"kubernetes":{"container_name":"mongod","namespace_name":"redacted","pod_name":"shared-mongodb-cfg-0","container_image":"docker.io/percona/percona-server-mongodb:4.4.10-11","container_image_id":"docker.io/percona/percona-server-mongodb@sha256:ea73a506fa02604660e3ef7d452d142a89587bb5daca15d3cc1b539a9b1000c5","pod_id":"a05dc0e0-d49b-41e6-a2cf-47e1d5366c1e","pod_ip":"10.129.1.26","host":"compute-0","labels":{"controller-revision-hash":"shared-mongodb-cfg-67b7fc579","app_kubernetes_io/component":"cfg","app_kubernetes_io/instance":"shared-mongodb","app_kubernetes_io/managed-by":"percona-server-mongodb-operator","app_kubernetes_io/name":"percona-server-mongodb","app_kubernetes_io/part-of":"percona-server-mongodb","app_kubernetes_io/replset":"cfg","statefulset_kubernetes_io/pod-name":"shared-mongodb-cfg-0"},"master_url":"https://172.30.0.1:443/api","namespace_id":"d83f45cc-dc13-4846-88f2-e9991388c94d","namespace_labels":{"kubernetes_io/metadata_name":"redacted"}},"t":{"$date":"2022-05-19T14:48:25.332+00:00"},"s":"I","c":"COMMAND","id":51803,"ctx":"conn11951","msg":"Slow query","attr":{"type":"command","ns":"config.$cmd","command":{"update":"mongos","bypassDocumentValidation":false,"ordered":true,"updates":[{"q":{"_id":"shared-mongodb-mongos-6bd459598f-9bwvp:27017"},"u":{"$set":{"_id":"shared-mongodb-mongos-6bd459598f-9bwvp:27017","ping":{"$date":"2022-05-19T14:48:25.183Z"},"up":9474,"waiting":true,"mongoVersion":"4.4.10-11","advisoryHostFQDNs":[]}},"multi":false,"upsert":true}],"writeConcern":{"w":"majority","wtimeout":60000},"maxTimeMS":30000,"$replData":1,"$clusterTime":{"clusterTime":{"$timestamp":{"t":1652971704,"i":4}},"signature":{"hash":{"$binary":{"base64":"ww8hldh66Ok8b+I4w+Gt7OUgUnA=","subType":"0"}},"keyId":7099418590524211201}},"$configServerState":{"opTime":{"ts":{"$timestamp":{"t":1652971704,"i":3}},"t":1}},"$db":"config"},"numYields":0,"reslen":619,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":1}},"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"w":1}},"Database":{"acquireCount":{"w":1}},"Collection":{"acquireCount":{"w":1}},"Mutex":{"acquireCount":{"r":1}}},"flowControl":{"acquireCount":1,"timeAcquiringMicros":3},"writeConcern":{"w":"majority","wtimeout":60000,"provenance":"clientSupplied"},"storage":{},"protocol":"op_msg","durationMillis":103},"tag":"application.json"}

The JSON looks valid as far as I can tell, so my previous comment may be a red herring. However, adding replace_invalid_sequence true to the json <filter> did at first seem to stop the errors occurring in the opensearch output plugin... UPDATE: not true. replace_invalid_sequence true does not work around the problem.

hth

@cosmo0920 (Collaborator)

Thanks! Candidate records for reproducing this are very helpful. I'll dig into the root cause of this issue.


sdwerwed commented May 23, 2022

@cosmo0920 I can reproduce it. I think it happens when the bulk request gets a 400 error back from OpenSearch, for example because of a datatype mismatch (#53). I started getting this error after upgrading to the latest version. Previously, the error was very clear when log_os_400_reason true was set; now it fails to flush the chunk and gives undefined method `call' for nil:NilClass.

I downgraded to 1.0.2 with fluent-gem install 'fluent-plugin-opensearch' -v 1.0.2 and it works as expected.
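
For a bundler-managed install, the equivalent pin would look roughly like the sketch below (the Gemfile layout is assumed, not taken from this thread; adjust to your own setup):

    # Gemfile sketch: pin the plugin to the last version reported to work here.
    source 'https://rubygems.org'

    gem 'fluentd', '~> 1.14'
    gem 'fluent-plugin-opensearch', '1.0.2'  # avoid the regression seen in 1.0.3/1.0.4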


thoro commented May 23, 2022

Same issue with 1.0.4; it also fails to flush the items from the buffer, which leads to a buffer overrun and then a total system halt (i.e. no logs at all).

Downgrading to 1.0.2 solved it, but frankly that's not really a solution.

@cosmo0920 (Collaborator)

I'm not sure why the block becomes nil, so I created a patch that uses an if clause instead of the block operation.
How about this patch?
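
As an illustration of that idea, the general shape of such a guard might look like the sketch below (hypothetical names, not the actual patch):

    # Sketch only: guard a possibly-nil callback with an if clause instead of
    # calling it unconditionally.
    def invoke_error_handler(handler, item)
      if handler
        handler.call(item)  # safe: handler is known to be non-nil here
      else
        warn "bulk item failed and no error handler is configured: #{item.inspect}"
      end
    end

    # Unconditional `handler.call(item)` with handler == nil would raise:
    #   undefined method `call' for nil:NilClass
    invoke_error_handler(nil, { "index" => { "status" => 400 } })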
