opensearch auth times out when using opensearch plugin #68

Open
ngamber opened this issue Jul 11, 2022 · 8 comments
Labels
bug Something isn't working

Comments

ngamber commented Jul 11, 2022

(check apply)

  • [ X] read the contribution guideline
  • [] (optional) already reported 3rd party upstream repository or mailing list if you use k8s addon or helm charts.

Steps to replicate

Provide example config and message

<label "@#{ENV['ENDPOINT_NAME']}">
  <match **>
    @type "opensearch_data_stream"
    @id "out-aws-es-#{worker_id}"
    @log_level "#{ENV['OUTPUT_LOG_LEVEL']}"
    log_es_400_reason true
    logstash_format false
    data_stream_name "ds-#{ENV['ENDPOINT_NAME']}"
    include_timestamp true
    include_tag_key true
    time_key timestamp
    flush_interval 5s
    slow_flush_log_threshold 135.0
    reconnect_on_error true
    reload_on_failure true
    reload_connections false
    request_timeout 300s

    <buffer>
      @type memory

      chunk_limit_size 20MB
      flush_mode interval
      flush_interval 5s
      flush_thread_count 12
      flush_at_shutdown true
      retry_max_times 2
      retry_wait 60s
      retry_type exponential_backoff
      retry_exponential_backoff_base 3
      retry_timeout 30m
      overflow_action drop_oldest_chunk
      disable_chunk_backup true
      total_limit_size "#{ENV['TOTAL_BUFFER_SIZE']}MB"
    </buffer>

    <endpoint>
      url "https://#{ENV['ES_ENDPOINT']}"
      region us-east-2
      assume_role_arn "#{ENV['COLLECTOR_SVC_ROLE']}"
    </endpoint>
  </match>
</label>

When using the opensearch plugin, we now get lots of errors like this on our fluentd collectors:

"error": "#<Fluent::Plugin::OpenSearchOutput::RecoverableRequestFailure: could not push logs to OpenSearch cluster (ds-janus): [400] {"Message":"You have exceeded the number of permissible concurrent requests with unique IAM Identities. Please retry."}>"

Expected Behavior or What you need to ask

We're wondering if this is due to fb04e91

Prior to implementing this plugin within our collectors we did not have this problem.

Using Fluentd and OpenSearch plugin versions

Fluentd v1.14.4-1.0
AWS Opensearch 1.2
fluent-plugin-opensearch 1.0.7

ngamber commented Jul 12, 2022

We've also noticed that the auth request doesn't seem to pass in a maximum session duration. It would make sense to set this to the same as the refresh_credentials_interval so that it doesn't expire before then.

RecoverableRequestFailure error=\"could not push logs to OpenSearch cluster (datastream-test): [403] {\\\"message\\\":\\\"The security token included in the request is expired

This is after changing refresh_credentials_interval to 10h and the maximum session duration on the role has been set to 12h.

Per AWS:

To learn how to view the maximum value for your role, see View the maximum session duration setting for a role. If you do not pass this parameter, the temporary credentials expire in one hour.
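To make the timing problem concrete, here is a minimal Ruby sketch. It is not taken from the plugin: the helper name `credentials_outlive_refresh?` and the constant are hypothetical, and the one-hour default comes from the AWS note quoted above. It only checks whether credentials would still be valid when the next refresh is scheduled.

```ruby
# Hypothetical helper, not part of fluent-plugin-opensearch.
# STS issues one-hour credentials when no duration is passed (per the AWS note above).
STS_DEFAULT_DURATION = 3600 # seconds

# Returns true when the credentials stay valid until the next scheduled refresh.
# Both arguments are in seconds; session_duration is nil when none is requested.
def credentials_outlive_refresh?(refresh_interval, session_duration = nil)
  effective = session_duration || STS_DEFAULT_DURATION
  effective >= refresh_interval
end

ten_hours = 10 * 3600
puts credentials_outlive_refresh?(ten_hours)            # false: token expires after 1h
puts credentials_outlive_refresh?(ten_hours, ten_hours) # true: duration matches interval
```

Under these assumptions, a 10h refresh interval with no explicit session duration leaves roughly nine hours in which requests would be signed with an expired token, which would be consistent with the 403 above.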

ngamber commented Jul 12, 2022

Not too good with Ruby but I assume adding a line like this here might help?

https://github.com/Barracuda-CloudOps/fluent-plugin-opensearch/blob/main/lib/fluent/plugin/out_opensearch.rb#L239

duration_seconds: conf[:refresh_credentials_interval].to_i
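A slightly fuller sketch of that idea, under stated assumptions: the function `assume_role_options` and its keyword names are hypothetical and do not reflect the plugin's actual code, and the ARN is a placeholder. It only builds the options hash, converting the interval to an Integer and keeping it between the 900-second STS minimum and the role's configured maximum.

```ruby
# Hypothetical sketch, not the plugin's implementation.
# 900 seconds is the smallest session duration STS accepts.
STS_MIN_DURATION = 900

# Builds the options for an assume-role call so the session lasts as long as
# the refresh interval, bounded by the role's maximum session duration.
def assume_role_options(role_arn:, session_name:, refresh_interval:, role_max_duration:)
  # The interval may arrive as a Float number of seconds; STS expects an Integer.
  duration = refresh_interval.to_i.clamp(STS_MIN_DURATION, role_max_duration)
  {
    role_arn: role_arn,
    role_session_name: session_name,
    duration_seconds: duration
  }
end

opts = assume_role_options(
  role_arn: "arn:aws:iam::123456789012:role/example-collector", # placeholder ARN
  session_name: "fluentd",
  refresh_interval: 10 * 3600.0, # 10h refresh interval, as in the report above
  role_max_duration: 12 * 3600   # 12h maximum session duration on the role
)
puts opts[:duration_seconds] # => 36000
```

With the aws-sdk-core gem loaded, a hash like this could presumably be passed to `Aws::AssumeRoleCredentials.new`, which forwards `duration_seconds` to the STS AssumeRole call.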

@cosmo0920
Collaborator

Does #78 work for you?

cosmo0920 added the bug label Sep 1, 2022
@antoniocascais

> We've also noticed that the auth request doesn't seem to pass in a maximum session duration. It would make sense to set this to the same as the refresh_credentials_interval so that it doesn't expire before then.
>
> RecoverableRequestFailure error=\"could not push logs to OpenSearch cluster (datastream-test): [403] {\\\"message\\\":\\\"The security token included in the request is expired
>
> This is after changing refresh_credentials_interval to 10h and the maximum session duration on the role has been set to 12h.
>
> Per AWS:
>
> To learn how to view the maximum value for your role, see View the maximum session duration setting for a role. If you do not pass this parameter, the temporary credentials expire in one hour.

Is there a solution for this? We are also having the same issue 🤔

@kaiohenricunha

Same issue here. This is critical. Any workaround? I thought that by setting refresh_credentials_interval it would work.

@Jonniedev

What version are you on?

Doesn't #74 solve the problem?
It has been merged and released in v1.1.1.

Do you think the problem still exists in v1.1.1 or above?

kaiohenricunha commented Jun 29, 2023

> What version are you on?
>
> Doesn't #74 solve the problem? It has been merged and released in v1.1.1.
>
> Do you think the problem still exists in v1.1.1 or above?

The fluent-operator automatically updates the plugin's version. I only noticed this update you mentioned because fluentd started throwing errors as discussed here:
fluent/fluent-operator#814

I tried setting the maximum session duration of my IAM role to match the plugin's default session duration: 5h.

It works for a few errors, then gets stuck again.

@kaiohenricunha

Issue persists and is even worse now:
#107
