Could not communicate to OpenSearch, resetting connection and trying again. [404] #126

Open
kentan88 opened this issue Feb 20, 2024 · 4 comments


@kentan88

kentan88 commented Feb 20, 2024

Steps to replicate

Provide example config and message
Dockerfile

# Use the fluentd base image
FROM fluent/fluentd:v1.15-debian-1

USER root

# Install necessary dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    build-essential \
    curl \
    && rm -rf /var/lib/apt/lists/*

RUN gem install faraday-net_http multi_json aws-eventstream faraday aws-sigv4 opensearch-ruby faraday_middleware-aws-sigv4 fluent-plugin-opensearch excon faraday-excon jmespath aws-partitions aws-sdk-core

# Switch back to fluent user
USER fluent

# Copy the configuration file to the Fluentd configuration directory
COPY ./config/fluent-opensearch.conf /fluentd/etc/fluent.conf

# Expose port for Fluentd
EXPOSE 24224

# Run Fluentd with the configuration file copied above
# (on non-Docker installs it is often located at /etc/fluent/fluent.conf or /etc/td-agent/td-agent.conf).
CMD ["fluentd", "-c", "/fluentd/etc/fluent.conf"]

fluent.conf

<match es.**>
  @type opensearch
  logstash_format true
  include_tag_key true
  flush_interval 1s

  <endpoint>
    url https://xxxxx.ap-southeast-1.aoss.amazonaws.com
    region ap-southeast-1
    access_key_id XXXXXXXXXXXX
    secret_access_key XXXXXXXXXXXX
    aws_service_name aoss
  </endpoint>
</match>

Expected Behavior or What you need to ask

I'm running a local Docker container that uses fluent/fluentd:v1.15-debian-1 as the base image.
When I run the container, I get the following messages:

2024-02-20 07:52:23 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2024-02-20 07:52:23 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2024-02-20 07:52:23 +0000 [info]: gem 'fluentd' version '1.15.3'
2024-02-20 07:52:23 +0000 [info]: gem 'fluent-plugin-opensearch' version '1.1.4'
2024-02-20 07:52:23 +0000 [info]: using configuration file: <ROOT>
  <match es.**>
    @type opensearch
    <endpoint>
      url https://XXXXXXXXXXXX.ap-southeast-1.aoss.amazonaws.com/
      region "ap-southeast-1"
      access_key_id "XXXXXXXXXXXX"
      secret_access_key xxxxxx
      aws_service_name aoss
    </endpoint>
  </match>
</ROOT>
2024-02-20 07:52:23 +0000 [info]: starting fluentd-1.15.3 pid=7 ruby="3.1.3"
2024-02-20 07:52:23 +0000 [info]: spawn command to main:  cmdline=["/usr/local/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/local/bundle/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "--plugin", "/fluentd/plugins", "--under-supervisor"]
2024-02-20 07:52:23 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2024-02-20 07:52:24 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2024-02-20 07:52:24 +0000 [info]: adding match pattern="es.**" type="opensearch"
2024-02-20 07:52:26 +0000 [warn]: #0 Could not communicate to OpenSearch, resetting connection and trying again. [404]
2024-02-20 07:52:26 +0000 [warn]: #0 Remaining retry: 14. Retry to communicate after 2 second(s).
2024-02-20 07:52:30 +0000 [warn]: #0 Could not communicate to OpenSearch, resetting connection and trying again. [404]
2024-02-20 07:52:30 +0000 [warn]: #0 Remaining retry: 13. Retry to communicate after 4 second(s).
2024-02-20 07:52:38 +0000 [warn]: #0 Could not communicate to OpenSearch, resetting connection and trying again. [404]
2024-02-20 07:52:38 +0000 [warn]: #0 Remaining retry: 12. Retry to communicate after 8 second(s).
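
For reference, the waits in the three warnings above double on each attempt (2, 4, 8 seconds). A minimal sketch of that schedule follows, assuming the doubling simply continues. The log does not show whether the plugin caps the wait later, and the function name `retry_schedule` is made up for illustration.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the wait before each retry, assuming the wait
# starts at first_delay seconds and doubles on every attempt, as in the three
# log lines above. Any cap on the wait is NOT visible in the log.
retry_schedule() {
  local retries=$1 first_delay=${2:-2}
  local delay=$first_delay total=0 attempt
  for ((attempt = 1; attempt <= retries; attempt++)); do
    total=$((total + delay))
    printf 'attempt %d: wait %ds (cumulative %ds)\n' "$attempt" "$delay" "$total"
    delay=$((delay * 2))
  done
}
```

Calling `retry_schedule 3` reproduces the 2, 4 and 8 second waits seen in the log.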

I can confirm that the AWS credentials and the AWS OpenSearch Serverless endpoint are correct and that the endpoint is reachable, since I was able to send data using a Ruby OpenSearch client.

Any help would be much appreciated.
...

Using Fluentd and OpenSearch plugin versions

  • OS version fluent/fluentd:v1.15-debian-1
  • Docker
  • Fluentd v1.15.3
  • OpenSearch plugin version 1.1.4
@mhkarimi1383

I'm having the same problem with the OpenSearch K8s operator, and I have to restart the Fluentd DaemonSet every time to fix it.

@mhkarimi1383

@kentan88

Have you tried setting reload_on_failure to true?
I saw this option in the README. I will test it; I think it will resolve the issue :)

@mhkarimi1383

Setting reload_on_failure to true did not fix the problem.

@mhkarimi1383

mhkarimi1383 commented May 1, 2024

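livenessProbe:
  httpGet: null
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 5
  # Fail the probe when any retry/error metric of an output plugin is non-zero.
  # Values are matched at the end of the line so that e.g. 10.0 is not read as zero.
  exec:
    command:
      - bash
      - -c
      - >
        set -ex;
        curl -s http://localhost:24231/metrics
        | grep -E "fluentd_output_status_retry_wait|fluentd_output_status_num_errors|fluentd_output_status_retry_count"
        | grep -Ev "# HELP|# TYPE"
        | grep -Ev ' 0(\.0)?$'
        | wc -l | grep -qx 0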

I have added these values to the DaemonSet Helm chart; it should restart the containers when a retry or an error occurs.

(Do not forget to install curl in your Docker image.)
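
In case it helps, here is a minimal sketch of the same check as a reusable shell function. It reads the Prometheus text from stdin and counts samples of the three metrics above whose value is non-zero, comparing numerically so that a value such as 10.0 is not mistaken for zero. The function names `count_failing_samples` and `metrics_healthy` are made up for illustration, and the sketch assumes every sample line ends with its value (no trailing timestamp).

```bash
#!/usr/bin/env bash

# Hypothetical helper: count samples of the watched Fluentd output metrics
# whose value is non-zero. Reads Prometheus text format on stdin.
# Assumes the value is the last field of each sample line (no timestamp).
count_failing_samples() {
  awk '
    /^#/ { next }  # skip "# HELP" and "# TYPE" lines
    /^fluentd_output_status_(retry_wait|num_errors|retry_count)[{ ]/ && ($NF + 0) != 0 { n++ }
    END { print n + 0 }
  '
}

# Hypothetical helper: succeed only when no watched sample is non-zero.
metrics_healthy() {
  [ "$(count_failing_samples)" -eq 0 ]
}
```

It could be used as `curl -s http://localhost:24231/metrics | metrics_healthy`, which exits with status 0 when everything is zero. Like the probe above, it also reports healthy when the metrics endpoint returns nothing, so it only catches retries and errors, not a dead metrics endpoint.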
