Configuration reload or SIGINT does not interrupt a flushing output plugin with retry_limit=False, potentially looping forever #8792

Open
shaohme opened this issue May 3, 2024 · 0 comments

Bug Report

Describe the bug
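When an output plugin is configured with "retry_limit False" and cannot deliver its chunks, fluent-bit keeps trying to flush them, seemingly forever, even after a configuration reload or SIGINT has been issued.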

To Reproduce

  1. Configure fluent-bit with a working OTEL input plugin and a wrongly configured OTEL output plugin, so that it cannot flush its chunks.
  2. Send some OTEL data to fluent-bit.
  3. Correct the values in the configuration, then issue a reload via SIGHUP or the HTTP API.

The log below shows what happens after fluent-bit has received input data and a reload is issued.

[2024/05/03 11:02:17] [engine] caught signal (SIGHUP)
[2024/05/03 11:02:17] [ info] reloading instance pid=19707 tid=0x7fa4aef010
[2024/05/03 11:02:17] [ info] [reload] stop everything of the old context
[2024/05/03 11:02:17] [ warn] [engine] service will shutdown when all remaining tasks are flushed
[2024/05/03 11:02:17] [debug] [engine] re-scheduled retry=0x7f9c066090 for task 0
[2024/05/03 11:02:17] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:17] [debug] [output:opentelemetry:opentelemetry.0] ctraces msgpack size: 602
[2024/05/03 11:02:17] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 286
[2024/05/03 11:02:17] [debug] [upstream] KA connection #29 to xxx.xxx:443 has been assigned (recycled)
[2024/05/03 11:02:17] [debug] [http_client] not using http_proxy for header
[2024/05/03 11:02:17] [ info] [task] opentelemetry/opentelemetry.0 has 1 pending task(s):
[2024/05/03 11:02:17] [ info] [task]   task_id=0 still running on route(s): opentelemetry/opentelemetry.0 
[2024/05/03 11:02:17] [ info] [task] storage_backlog/storage_backlog.1 has 0 pending task(s):
[2024/05/03 11:02:17] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:17] [error] [output:opentelemetry:opentelemetry.0] xxx.xxx:443, HTTP status=401
[2024/05/03 11:02:17] [debug] [upstream] KA connection #29 to xxx.xxx:443 is now available
[2024/05/03 11:02:17] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_RETRY
[2024/05/03 11:02:17] [debug] [out flush] cb_destroy coro_id=3
[2024/05/03 11:02:17] [debug] [retry] re-using retry for task_id=0 attempts=4
[2024/05/03 11:02:17] [ warn] [engine] failed to flush chunk '19707-1714734107.667106355.flb', retry in 1 seconds: task_id=0, input=opentelemetry.0 > output=opentelemetry.0 (out_id=0)
[2024/05/03 11:02:18] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:18] [debug] [output:opentelemetry:opentelemetry.0] ctraces msgpack size: 602
[2024/05/03 11:02:18] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 286
[2024/05/03 11:02:18] [debug] [upstream] KA connection #29 to xxx.xxx:443 has been assigned (recycled)
[2024/05/03 11:02:18] [debug] [http_client] not using http_proxy for header
[2024/05/03 11:02:18] [error] [output:opentelemetry:opentelemetry.0] xxx.xxx:443, HTTP status=401
[2024/05/03 11:02:18] [debug] [upstream] KA connection #29 to xxx.xxx:443 is now available
[2024/05/03 11:02:18] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_RETRY
[2024/05/03 11:02:18] [debug] [out flush] cb_destroy coro_id=4
[2024/05/03 11:02:18] [debug] [retry] re-using retry for task_id=0 attempts=5
[2024/05/03 11:02:18] [ warn] [engine] failed to flush chunk '19707-1714734107.667106355.flb', retry in 1 seconds: task_id=0, input=opentelemetry.0 > output=opentelemetry.0 (out_id=0)
[2024/05/03 11:02:19] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:19] [debug] [output:opentelemetry:opentelemetry.0] ctraces msgpack size: 602
[2024/05/03 11:02:19] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 286
[2024/05/03 11:02:19] [debug] [upstream] KA connection #29 to xxx.xxx:443 has been assigned (recycled)
[2024/05/03 11:02:19] [debug] [http_client] not using http_proxy for header
[2024/05/03 11:02:19] [error] [output:opentelemetry:opentelemetry.0] xxx.xxx:443, HTTP status=401
[2024/05/03 11:02:19] [debug] [upstream] KA connection #29 to xxx.xxx:443 is now available
[2024/05/03 11:02:19] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_RETRY
[2024/05/03 11:02:19] [debug] [out flush] cb_destroy coro_id=5
[2024/05/03 11:02:19] [debug] [retry] re-using retry for task_id=0 attempts=6
[2024/05/03 11:02:19] [ warn] [engine] failed to flush chunk '19707-1714734107.667106355.flb', retry in 1 seconds: task_id=0, input=opentelemetry.0 > output=opentelemetry.0 (out_id=0)
[2024/05/03 11:02:20] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:20] [debug] [output:opentelemetry:opentelemetry.0] ctraces msgpack size: 602
[2024/05/03 11:02:20] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 286
[2024/05/03 11:02:20] [debug] [upstream] KA connection #29 to xxx.xxx:443 has been assigned (recycled)
[2024/05/03 11:02:20] [debug] [http_client] not using http_proxy for header
[2024/05/03 11:02:20] [error] [output:opentelemetry:opentelemetry.0] xxx.xxx:443, HTTP status=401
[2024/05/03 11:02:20] [debug] [upstream] KA connection #29 to xxx.xxx:443 is now available
[2024/05/03 11:02:20] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_RETRY
[2024/05/03 11:02:20] [debug] [out flush] cb_destroy coro_id=6
[2024/05/03 11:02:20] [debug] [retry] re-using retry for task_id=0 attempts=7
[2024/05/03 11:02:20] [ warn] [engine] failed to flush chunk '19707-1714734107.667106355.flb', retry in 1 seconds: task_id=0, input=opentelemetry.0 > output=opentelemetry.0 (out_id=0)
[2024/05/03 11:02:21] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:21] [debug] [output:opentelemetry:opentelemetry.0] ctraces msgpack size: 602
[2024/05/03 11:02:21] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 286
[2024/05/03 11:02:21] [debug] [upstream] KA connection #29 to xxx.xxx:443 has been assigned (recycled)
[2024/05/03 11:02:21] [debug] [http_client] not using http_proxy for header
[2024/05/03 11:02:21] [error] [output:opentelemetry:opentelemetry.0] xxx.xxx:443, HTTP status=401
[2024/05/03 11:02:21] [debug] [upstream] KA connection #29 to xxx.xxx:443 is now available
[2024/05/03 11:02:21] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_RETRY
[2024/05/03 11:02:21] [debug] [out flush] cb_destroy coro_id=7
[2024/05/03 11:02:21] [debug] [retry] re-using retry for task_id=0 attempts=8
[2024/05/03 11:02:21] [ warn] [engine] failed to flush chunk '19707-1714734107.667106355.flb', retry in 1 seconds: task_id=0, input=opentelemetry.0 > output=opentelemetry.0 (out_id=0)
[2024/05/03 11:02:22] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:22] [debug] [output:opentelemetry:opentelemetry.0] ctraces msgpack size: 602
[2024/05/03 11:02:22] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 286
[2024/05/03 11:02:22] [debug] [upstream] KA connection #29 to xxx.xxx:443 has been assigned (recycled)
[2024/05/03 11:02:22] [debug] [http_client] not using http_proxy for header
[2024/05/03 11:02:22] [error] [output:opentelemetry:opentelemetry.0] xxx.xxx:443, HTTP status=401
[2024/05/03 11:02:22] [debug] [upstream] KA connection #29 to xxx.xxx:443 is now available
[2024/05/03 11:02:22] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_RETRY
[2024/05/03 11:02:22] [debug] [out flush] cb_destroy coro_id=8
[2024/05/03 11:02:22] [debug] [retry] re-using retry for task_id=0 attempts=9
[2024/05/03 11:02:22] [ warn] [engine] failed to flush chunk '19707-1714734107.667106355.flb', retry in 1 seconds: task_id=0, input=opentelemetry.0 > output=opentelemetry.0 (out_id=0)
[2024/05/03 11:02:23] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:23] [debug] [output:opentelemetry:opentelemetry.0] ctraces msgpack size: 602
[2024/05/03 11:02:23] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 286
[2024/05/03 11:02:23] [debug] [upstream] KA connection #29 to xxx.xxx:443 has been assigned (recycled)
[2024/05/03 11:02:23] [debug] [http_client] not using http_proxy for header
[2024/05/03 11:02:23] [error] [output:opentelemetry:opentelemetry.0] xxx.xxx:443, HTTP status=401
[2024/05/03 11:02:23] [debug] [upstream] KA connection #29 to xxx.xxx:443 is now available
[2024/05/03 11:02:23] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_RETRY
[2024/05/03 11:02:23] [debug] [out flush] cb_destroy coro_id=9
[2024/05/03 11:02:23] [debug] [retry] re-using retry for task_id=0 attempts=10
[2024/05/03 11:02:23] [ warn] [engine] failed to flush chunk '19707-1714734107.667106355.flb', retry in 1 seconds: task_id=0, input=opentelemetry.0 > output=opentelemetry.0 (out_id=0)
[2024/05/03 11:02:24] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:24] [debug] [output:opentelemetry:opentelemetry.0] ctraces msgpack size: 602
[2024/05/03 11:02:24] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 286
[2024/05/03 11:02:24] [debug] [upstream] KA connection #29 to xxx.xxx:443 has been assigned (recycled)
[2024/05/03 11:02:24] [debug] [http_client] not using http_proxy for header
[2024/05/03 11:02:24] [error] [output:opentelemetry:opentelemetry.0] xxx.xxx:443, HTTP status=401
[2024/05/03 11:02:24] [debug] [upstream] KA connection #29 to xxx.xxx:443 is now available
[2024/05/03 11:02:24] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_RETRY
[2024/05/03 11:02:24] [debug] [out flush] cb_destroy coro_id=10
[2024/05/03 11:02:24] [debug] [retry] re-using retry for task_id=0 attempts=11
[2024/05/03 11:02:24] [ warn] [engine] failed to flush chunk '19707-1714734107.667106355.flb', retry in 1 seconds: task_id=0, input=opentelemetry.0 > output=opentelemetry.0 (out_id=0)
[2024/05/03 11:02:25] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:25] [debug] [output:opentelemetry:opentelemetry.0] ctraces msgpack size: 602
[2024/05/03 11:02:25] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 286
[2024/05/03 11:02:25] [debug] [upstream] KA connection #29 to xxx.xxx:443 has been assigned (recycled)
[2024/05/03 11:02:25] [debug] [http_client] not using http_proxy for header
[2024/05/03 11:02:25] [error] [output:opentelemetry:opentelemetry.0] xxx.xxx:443, HTTP status=401
[2024/05/03 11:02:25] [debug] [upstream] KA connection #29 to xxx.xxx:443 is now available
[2024/05/03 11:02:25] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_RETRY
[2024/05/03 11:02:25] [debug] [out flush] cb_destroy coro_id=11
[2024/05/03 11:02:25] [debug] [retry] re-using retry for task_id=0 attempts=12
[2024/05/03 11:02:25] [ warn] [engine] failed to flush chunk '19707-1714734107.667106355.flb', retry in 1 seconds: task_id=0, input=opentelemetry.0 > output=opentelemetry.0 (out_id=0)
[2024/05/03 11:02:26] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:26] [debug] [output:opentelemetry:opentelemetry.0] ctraces msgpack size: 602
[2024/05/03 11:02:26] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 286
[2024/05/03 11:02:26] [debug] [upstream] KA connection #29 to xxx.xxx:443 has been assigned (recycled)
[2024/05/03 11:02:26] [debug] [http_client] not using http_proxy for header
[2024/05/03 11:02:26] [error] [output:opentelemetry:opentelemetry.0] xxx.xxx:443, HTTP status=401
[2024/05/03 11:02:26] [debug] [upstream] KA connection #29 to xxx.xxx:443 is now available
[2024/05/03 11:02:26] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_RETRY
[2024/05/03 11:02:26] [debug] [out flush] cb_destroy coro_id=12
[2024/05/03 11:02:26] [debug] [retry] re-using retry for task_id=0 attempts=13
[2024/05/03 11:02:26] [ warn] [engine] failed to flush chunk '19707-1714734107.667106355.flb', retry in 1 seconds: task_id=0, input=opentelemetry.0 > output=opentelemetry.0 (out_id=0)
[2024/05/03 11:02:27] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:27] [debug] [output:opentelemetry:opentelemetry.0] ctraces msgpack size: 602
[2024/05/03 11:02:27] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 286
[2024/05/03 11:02:27] [debug] [upstream] KA connection #29 to xxx.xxx:443 has been assigned (recycled)
[2024/05/03 11:02:27] [debug] [http_client] not using http_proxy for header
[2024/05/03 11:02:27] [error] [output:opentelemetry:opentelemetry.0] xxx.xxx:443, HTTP status=401
[2024/05/03 11:02:27] [debug] [upstream] KA connection #29 to xxx.xxx:443 is now available
[2024/05/03 11:02:27] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_RETRY
[2024/05/03 11:02:27] [debug] [out flush] cb_destroy coro_id=13
[2024/05/03 11:02:27] [debug] [retry] re-using retry for task_id=0 attempts=14
[2024/05/03 11:02:27] [ warn] [engine] failed to flush chunk '19707-1714734107.667106355.flb', retry in 1 seconds: task_id=0, input=opentelemetry.0 > output=opentelemetry.0 (out_id=0)
[2024/05/03 11:02:28] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:28] [debug] [output:opentelemetry:opentelemetry.0] ctraces msgpack size: 602
[2024/05/03 11:02:28] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 286
[2024/05/03 11:02:28] [debug] [upstream] KA connection #29 to xxx.xxx:443 has been assigned (recycled)
[2024/05/03 11:02:28] [debug] [http_client] not using http_proxy for header
[2024/05/03 11:02:28] [error] [output:opentelemetry:opentelemetry.0] xxx.xxx:443, HTTP status=401
[2024/05/03 11:02:28] [debug] [upstream] KA connection #29 to xxx.xxx:443 is now available
[2024/05/03 11:02:28] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_RETRY
[2024/05/03 11:02:28] [debug] [out flush] cb_destroy coro_id=14
[2024/05/03 11:02:28] [debug] [retry] re-using retry for task_id=0 attempts=15
[2024/05/03 11:02:28] [ warn] [engine] failed to flush chunk '19707-1714734107.667106355.flb', retry in 1 seconds: task_id=0, input=opentelemetry.0 > output=opentelemetry.0 (out_id=0)
[2024/05/03 11:02:29] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:29] [debug] [output:opentelemetry:opentelemetry.0] ctraces msgpack size: 602
[2024/05/03 11:02:29] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 286
[2024/05/03 11:02:29] [debug] [upstream] KA connection #29 to xxx.xxx:443 has been assigned (recycled)
[2024/05/03 11:02:29] [debug] [http_client] not using http_proxy for header
[2024/05/03 11:02:29] [error] [output:opentelemetry:opentelemetry.0] xxx.xxx:443, HTTP status=401
[2024/05/03 11:02:29] [debug] [upstream] KA connection #29 to xxx.xxx:443 is now available
[2024/05/03 11:02:29] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_RETRY
[2024/05/03 11:02:29] [debug] [out flush] cb_destroy coro_id=15
[2024/05/03 11:02:29] [debug] [retry] re-using retry for task_id=0 attempts=16
[2024/05/03 11:02:29] [ warn] [engine] failed to flush chunk '19707-1714734107.667106355.flb', retry in 1 seconds: task_id=0, input=opentelemetry.0 > output=opentelemetry.0 (out_id=0)
[2024/05/03 11:02:30] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:30] [debug] [output:opentelemetry:opentelemetry.0] ctraces msgpack size: 602
[2024/05/03 11:02:30] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 286
[2024/05/03 11:02:30] [debug] [upstream] KA connection #29 to xxx.xxx:443 has been assigned (recycled)
[2024/05/03 11:02:30] [debug] [http_client] not using http_proxy for header
[2024/05/03 11:02:30] [error] [output:opentelemetry:opentelemetry.0] xxx.xxx:443, HTTP status=401
[2024/05/03 11:02:30] [debug] [upstream] KA connection #29 to xxx.xxx:443 is now available
[2024/05/03 11:02:30] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_RETRY
[2024/05/03 11:02:30] [debug] [out flush] cb_destroy coro_id=16
[2024/05/03 11:02:30] [debug] [retry] re-using retry for task_id=0 attempts=17
[2024/05/03 11:02:30] [ warn] [engine] failed to flush chunk '19707-1714734107.667106355.flb', retry in 1 seconds: task_id=0, input=opentelemetry.0 > output=opentelemetry.0 (out_id=0)
[2024/05/03 11:02:31] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:31] [debug] [output:opentelemetry:opentelemetry.0] ctraces msgpack size: 602
[2024/05/03 11:02:31] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 286
[2024/05/03 11:02:31] [debug] [upstream] KA connection #29 to xxx.xxx:443 has been assigned (recycled)
[2024/05/03 11:02:31] [debug] [http_client] not using http_proxy for header
[2024/05/03 11:02:31] [error] [output:opentelemetry:opentelemetry.0] xxx.xxx:443, HTTP status=401
[2024/05/03 11:02:31] [debug] [upstream] KA connection #29 to xxx.xxx:443 is now available
[2024/05/03 11:02:31] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_RETRY
[2024/05/03 11:02:31] [debug] [out flush] cb_destroy coro_id=17
[2024/05/03 11:02:31] [debug] [retry] re-using retry for task_id=0 attempts=18
[2024/05/03 11:02:31] [ warn] [engine] failed to flush chunk '19707-1714734107.667106355.flb', retry in 1 seconds: task_id=0, input=opentelemetry.0 > output=opentelemetry.0 (out_id=0)
[2024/05/03 11:02:32] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:32] [debug] [output:opentelemetry:opentelemetry.0] ctraces msgpack size: 602
[2024/05/03 11:02:32] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 286
[2024/05/03 11:02:32] [debug] [upstream] KA connection #29 to xxx.xxx:443 has been assigned (recycled)
[2024/05/03 11:02:32] [debug] [http_client] not using http_proxy for header
[2024/05/03 11:02:32] [error] [output:opentelemetry:opentelemetry.0] xxx.xxx:443, HTTP status=401
[2024/05/03 11:02:32] [debug] [upstream] KA connection #29 to xxx.xxx:443 is now available
[2024/05/03 11:02:32] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_RETRY
[2024/05/03 11:02:32] [debug] [out flush] cb_destroy coro_id=18
[2024/05/03 11:02:32] [debug] [retry] re-using retry for task_id=0 attempts=19
[2024/05/03 11:02:32] [ warn] [engine] failed to flush chunk '19707-1714734107.667106355.flb', retry in 1 seconds: task_id=0, input=opentelemetry.0 > output=opentelemetry.0 (out_id=0)
[2024/05/03 11:02:33] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:33] [debug] [output:opentelemetry:opentelemetry.0] ctraces msgpack size: 602
[2024/05/03 11:02:33] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 286
[2024/05/03 11:02:33] [debug] [upstream] KA connection #29 to xxx.xxx:443 has been assigned (recycled)
[2024/05/03 11:02:33] [debug] [http_client] not using http_proxy for header
[2024/05/03 11:02:33] [error] [output:opentelemetry:opentelemetry.0] xxx.xxx:443, HTTP status=401
[2024/05/03 11:02:33] [debug] [upstream] KA connection #29 to xxx.xxx:443 is now available
[2024/05/03 11:02:33] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_RETRY
[2024/05/03 11:02:33] [debug] [out flush] cb_destroy coro_id=19
[2024/05/03 11:02:33] [debug] [retry] re-using retry for task_id=0 attempts=20
[2024/05/03 11:02:33] [ warn] [engine] failed to flush chunk '19707-1714734107.667106355.flb', retry in 1 seconds: task_id=0, input=opentelemetry.0 > output=opentelemetry.0 (out_id=0)
[2024/05/03 11:02:34] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:34] [debug] [output:opentelemetry:opentelemetry.0] ctraces msgpack size: 602
[2024/05/03 11:02:34] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 286
[2024/05/03 11:02:34] [debug] [upstream] KA connection #29 to xxx.xxx:443 has been assigned (recycled)
[2024/05/03 11:02:34] [debug] [http_client] not using http_proxy for header
[2024/05/03 11:02:34] [error] [output:opentelemetry:opentelemetry.0] xxx.xxx:443, HTTP status=401
[2024/05/03 11:02:34] [debug] [upstream] KA connection #29 to xxx.xxx:443 is now available
[2024/05/03 11:02:34] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_RETRY
[2024/05/03 11:02:34] [debug] [out flush] cb_destroy coro_id=20
[2024/05/03 11:02:34] [debug] [retry] re-using retry for task_id=0 attempts=21
[2024/05/03 11:02:34] [ warn] [engine] failed to flush chunk '19707-1714734107.667106355.flb', retry in 1 seconds: task_id=0, input=opentelemetry.0 > output=opentelemetry.0 (out_id=0)
[2024/05/03 11:02:35] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:35] [debug] [output:opentelemetry:opentelemetry.0] ctraces msgpack size: 602
[2024/05/03 11:02:35] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 286
[2024/05/03 11:02:35] [debug] [upstream] KA connection #29 to xxx.xxx:443 has been assigned (recycled)
[2024/05/03 11:02:35] [debug] [http_client] not using http_proxy for header
[2024/05/03 11:02:35] [error] [output:opentelemetry:opentelemetry.0] xxx.xxx:443, HTTP status=401
[2024/05/03 11:02:35] [debug] [upstream] KA connection #29 to xxx.xxx:443 is now available
[2024/05/03 11:02:35] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_RETRY
[2024/05/03 11:02:35] [debug] [out flush] cb_destroy coro_id=21
[2024/05/03 11:02:35] [debug] [retry] re-using retry for task_id=0 attempts=22
[2024/05/03 11:02:35] [ warn] [engine] failed to flush chunk '19707-1714734107.667106355.flb', retry in 1 seconds: task_id=0, input=opentelemetry.0 > output=opentelemetry.0 (out_id=0)
[2024/05/03 11:02:36] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:36] [debug] [output:opentelemetry:opentelemetry.0] ctraces msgpack size: 602
[2024/05/03 11:02:36] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 286
[2024/05/03 11:02:36] [debug] [upstream] KA connection #29 to xxx.xxx:443 has been assigned (recycled)
[2024/05/03 11:02:36] [debug] [http_client] not using http_proxy for header
[2024/05/03 11:02:36] [error] [output:opentelemetry:opentelemetry.0] xxx.xxx:443, HTTP status=401
[2024/05/03 11:02:36] [debug] [upstream] KA connection #29 to xxx.xxx:443 is now available
[2024/05/03 11:02:36] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_RETRY
[2024/05/03 11:02:36] [debug] [out flush] cb_destroy coro_id=22
[2024/05/03 11:02:36] [debug] [retry] re-using retry for task_id=0 attempts=23
[2024/05/03 11:02:36] [ warn] [engine] failed to flush chunk '19707-1714734107.667106355.flb', retry in 1 seconds: task_id=0, input=opentelemetry.0 > output=opentelemetry.0 (out_id=0)
[2024/05/03 11:02:37] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:37] [debug] [output:opentelemetry:opentelemetry.0] ctraces msgpack size: 602
[2024/05/03 11:02:37] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 286
[2024/05/03 11:02:37] [debug] [upstream] KA connection #29 to xxx.xxx:443 has been assigned (recycled)
[2024/05/03 11:02:37] [debug] [http_client] not using http_proxy for header
[2024/05/03 11:02:37] [error] [output:opentelemetry:opentelemetry.0] xxx.xxx:443, HTTP status=401
[2024/05/03 11:02:37] [debug] [upstream] KA connection #29 to xxx.xxx:443 is now available
[2024/05/03 11:02:37] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_RETRY
[2024/05/03 11:02:37] [debug] [out flush] cb_destroy coro_id=23
[2024/05/03 11:02:37] [debug] [retry] re-using retry for task_id=0 attempts=24
[2024/05/03 11:02:37] [ warn] [engine] failed to flush chunk '19707-1714734107.667106355.flb', retry in 1 seconds: task_id=0, input=opentelemetry.0 > output=opentelemetry.0 (out_id=0)
[2024/05/03 11:02:38] [ info] [input] pausing storage_backlog.1
[2024/05/03 11:02:38] [debug] [output:opentelemetry:opentelemetry.0] ctraces msgpack size: 602
[2024/05/03 11:02:38] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 286
[2024/05/03 11:02:38] [debug] [upstream] KA connection #29 to xxx.xxx:443 has been assigned (recycled)
[2024/05/03 11:02:38] [debug] [http_client] not using http_proxy for header
[2024/05/03 11:02:38] [error] [output:opentelemetry:opentelemetry.0] xxx.xxx:443, HTTP status=401
[2024/05/03 11:02:38] [debug] [upstream] KA connection #29 to xxx.xxx:443 is now available
[2024/05/03 11:02:38] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_RETRY
[2024/05/03 11:02:38] [debug] [out flush] cb_destroy coro_id=24
[2024/05/03 11:02:38] [debug] [retry] re-using retry for task_id=0 attempts=25
[2024/05/03 11:02:38] [ warn] [engine] failed to flush chunk '19707-1714734107.667106355.flb', retry in 1 seconds: task_id=0, input=opentelemetry.0 > output=opentelemetry.0 (out_id=0)
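
To make the pattern in the log easier to see, here is a minimal Python sketch that lists every retry logged after the shutdown warning. The regular expressions are derived only from the log lines quoted above and are an assumption about the format; the helper name `retries_after_shutdown` is hypothetical and not part of fluent-bit.

```python
import re
from datetime import datetime

# Patterns copied from the log lines quoted in this report; other
# fluent-bit versions may word these messages differently (assumption).
TIMESTAMP = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\]")
RETRY = re.compile(r"\[retry\] re-using retry for task_id=(\d+) attempts=(\d+)")
SHUTDOWN = "service will shutdown when all remaining tasks are flushed"


def retries_after_shutdown(lines):
    """Return (seconds_since_shutdown, task_id, attempts) for each retry
    that was logged after the shutdown warning appeared."""
    shutdown_at = None
    found = []
    for line in lines:
        stamp = TIMESTAMP.match(line)
        if not stamp:
            continue  # skip lines without a leading timestamp
        when = datetime.strptime(stamp.group(1), "%Y/%m/%d %H:%M:%S")
        if SHUTDOWN in line:
            shutdown_at = when
            continue
        retry = RETRY.search(line)
        if retry and shutdown_at is not None:
            elapsed = int((when - shutdown_at).total_seconds())
            found.append((elapsed, int(retry.group(1)), int(retry.group(2))))
    return found


# Three lines taken verbatim from the log above.
sample = [
    "[2024/05/03 11:02:17] [ warn] [engine] service will shutdown when all remaining tasks are flushed",
    "[2024/05/03 11:02:17] [debug] [retry] re-using retry for task_id=0 attempts=4",
    "[2024/05/03 11:02:18] [debug] [retry] re-using retry for task_id=0 attempts=5",
]
result = retries_after_shutdown(sample)
```

Run against the full log, the attempts counter keeps increasing once per second after the reload was requested, which is the behavior this report is about.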

Expected behavior
fluent-bit should interrupt output plugins that are still flushing (retrying), so that a reload can take effect when the configuration was wrong or its values need to be updated.
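
The expected behavior can be illustrated with a small Python model. This is only a sketch of the idea, not fluent-bit's scheduler: `flush_with_retries`, `stop_event`, `retry_delay` and `max_attempts` are hypothetical names, and `max_attempts=None` stands in for "retry_limit False".

```python
import threading


def flush_with_retries(flush_once, stop_event, retry_delay=1.0, max_attempts=None):
    """Retry flush_once() until it succeeds, the limit is reached,
    or stop_event is set.

    max_attempts=None models an unlimited retry count.
    Returns "flushed", "gave_up" or "interrupted".
    """
    attempts = 0
    while True:
        if stop_event.is_set():
            # The chunk is left in the buffer so a new instance can pick it up.
            return "interrupted"
        attempts += 1
        if flush_once():
            return "flushed"
        if max_attempts is not None and attempts >= max_attempts:
            return "gave_up"
        # Event.wait() returns True as soon as the event is set, so a
        # reload request also cuts the back-off delay short.
        if stop_event.wait(retry_delay):
            return "interrupted"


stop = threading.Event()
calls = []


def always_rejected():
    """Stand-in for an output that always answers HTTP 401."""
    calls.append(1)
    if len(calls) == 3:
        stop.set()  # simulate a reload request arriving during attempt 3
    return False


result = flush_with_retries(always_rejected, stop, retry_delay=0.01)
```

In this model an unlimited retry count only governs how long delivery is attempted while the instance is running; a reload or shutdown request always wins, and the undelivered chunk stays buffered instead of being dropped.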

Your Environment

  • Version used: 3.0.3
  • Configuration:
[SERVICE]
	HTTP_Server Off
	Hot_Reload On
	Log_Level debug
	flush 1
	storage.path /var/fluent_bit
	storage.sync normal
	storage.checksum off
	storage.max_chunks_up 32
	storage.backlog.mem_limit 5M
	storage.delete_irrecoverable_chunks off

[FILTER]
	Name throttle
	Match *
	Rate 1
	Window 3
	Interval 3s

[INPUT]
	name opentelemetry
	storage.type filesystem
	listen 127.0.0.1
	port 4318
	raw_traces false
	successful_response_code 200
	storage.type filesystem
	# storage.pause_on_chunks_overlimit on
	Mem_Buf_Limit 1M

[OUTPUT]
	name opentelemetry
	match *
	storage.total_limit_size 64M
	host xxx.xxx
	port 443
	Metrics_uri /opentelemetry/v1/metrics
	Logs_uri /opentelemetry/v1/logs
	Traces_uri /opentelemetry/v1/traces
	Log_response_payload False
	Retry_Limit False
	tls True
	tls.ca_file /etc/ssl/host.pem
	tls.crt_file /etc/ssl/cert.pem
	tls.key_file /etc/ssl/private/priv.key

  • Environment name and version (e.g. Kubernetes? What version?):
    Native on operating system
  • Server type and version:
    Specialized ARM hardware running fluent-bit along with other daemons.
  • Operating System and version:
    ptxdist Linux 5.4
  • Filters and plugins:
    see config above
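
As a quick way to check which outputs in a configuration like the one above are affected, the following Python sketch scans classic-mode configuration text for outputs with an unlimited retry count. The parser is deliberately simplified (it ignores includes and variables), the function names are hypothetical, and treating `no_limits` as equivalent to `False` follows the fluent-bit documentation for Retry_Limit as I understand it.

```python
def parse_classic_config(text):
    """Return [(SECTION, [(key, value), ...]), ...] from classic-mode text."""
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].upper(), []))
        elif sections:
            parts = line.split(None, 1)  # key, then the rest as the value
            value = parts[1] if len(parts) > 1 else ""
            sections[-1][1].append((parts[0].lower(), value))
    return sections


def unlimited_retry_outputs(sections):
    """Names of OUTPUT sections whose retry_limit disables the limit."""
    names = []
    for section, pairs in sections:
        if section != "OUTPUT":
            continue
        props = dict(pairs)
        if props.get("retry_limit", "").lower() in ("false", "no_limits"):
            names.append(props.get("name", "?"))
    return names


def duplicate_keys(sections):
    """(SECTION, key) pairs for keys that appear more than once in a section."""
    dupes = []
    for section, pairs in sections:
        seen = set()
        for key, _ in pairs:
            if key in seen:
                dupes.append((section, key))
            seen.add(key)
    return dupes


# Shortened excerpt of the configuration above.
sample = """
[INPUT]
    name opentelemetry
    storage.type filesystem
    port 4318
    storage.type filesystem

[OUTPUT]
    name opentelemetry
    match *
    Retry_Limit False
"""
parsed = parse_classic_config(sample)
affected = unlimited_retry_outputs(parsed)
repeated = duplicate_keys(parsed)
```

On this excerpt the check reports the opentelemetry output as having unlimited retries, and it also flags that `storage.type` is listed twice in the input section, which looks harmless but redundant.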

Additional context
I have chosen retry_limit False because I need to keep OTEL data for a potentially long time, maybe 3-4 weeks. File system buffering helps with this, but without retry_limit False, telemetry data could be deleted merely because the connection is unstable or missing altogether. The vessel carrying the ARM device could be without an ISP connection for a long time.
