Fluent Bit occasionally corrupts/truncates log entries when processing multiple log files. #8798

Open
avizen-j opened this issue May 6, 2024 · 1 comment
avizen-j commented May 6, 2024

Bug Report

Describe the bug
Fluent Bit occasionally corrupts/truncates log entries when processing multiple log files. To rule out a problem on the Elasticsearch side, where the logs end up, two different outputs were set up; both show the same corruption, which suggests the problem is in Fluent Bit itself. Moreover, when the same log is resent locally, it is shipped properly, so it seems Fluent Bit does not cope well when there are multiple log files.

Original log from the file:

{"@timestamp":"2024-05-06T11:05:59.5862062+00:00","level":"Debug","messageTemplate":"The request is insecure. Skipping HSTS header.","message":"The request is insecure. Skipping HSTS header.","fields":{"EventId":{"Id":1,"Name":"NotSecure"},"SourceContext":"Microsoft.AspNetCore.HttpsPolicy.HstsMiddleware","RequestId":"HIDDEN","RequestPath":"HIDDEN","ConnectionId":"HIDDEN","LoggingEnvironment":"test","ApplicationName":"HIDDEN","ServerName":"HIDDEN","Product":"HIDDEN"}}

Log from http output (elasticsearch v 8.8.1):

{"@timestamp":"2024-05-06T11:05:59.5862062+00:00","level":"Debug","messageTemplate":"The request is insecu"Id":2},"SourceContext":"Microsoft.AspNetCore.Hosting.Diagnostics","RequestId":"HIDDEN","RequestPath":"HIDDEN","ConnectionId":"HIDDEN","LoggingEnvironment":"test","ApplicationName":"HIDDEN","ServerName":"HIDDEN","Product":"HIDDEN"}}

Log from es output (elasticsearch v 8.12.2):

{"@timestamp":"2024-05-06T11:05:59.5862062+00:00","level":"Debug","messageTemplate":"The request is insecu"Id":2},"SourceContext":"Microsoft.AspNetCore.Hosting.Diagnostics","RequestId":"HIDDEN","RequestPath":"HIDDEN","ConnectionId":"HIDDEN","LoggingEnvironment":"test","ApplicationName":"HIDDEN","ServerName":"HIDDEN","Product":"HIDDEN"}}

For some reason, the log is corrupted/truncated at this point: "messageTemplate":"The request is insecu"Id":2}. The message string is cut off mid-word, and what follows appears to belong to a different record (note the SourceContext of Microsoft.AspNetCore.Hosting.Diagnostics instead of the original HstsMiddleware).
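
For clarity, the corrupted payload reads like the prefix of one record spliced with the suffix of another. The second record does not appear in the report, so its reconstruction here is an assumption based only on the fragments above:

Record A prefix (HstsMiddleware, shown above):
  {"@timestamp":"2024-05-06T11:05:59.5862062+00:00","level":"Debug","messageTemplate":"The request is insecu   <- cut mid-string
Record B suffix (hypothetical, Hosting.Diagnostics):
  "Id":2},"SourceContext":"Microsoft.AspNetCore.Hosting.Diagnostics","RequestId":"HIDDEN",...}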

Context
Fluent Bit is currently deployed as a Deployment in a Kubernetes environment, with one Fluent Bit instance per namespace. Each namespace contains multiple apps that write their logs to files. Fluent Bit watches around 51 files (but it might be up to 150). The evidence is below:

...
[2024/05/06 11:33:03] [ info] [input:tail:tail.0] inotify_fs_add(): inode=1107501554002 watch_fd=49 name=/opt/app-root/app-logs/app1/log20240506.txt
[2024/05/06 11:33:03] [ info] [input:tail:tail.0] inotify_fs_add(): inode=1107424670288 watch_fd=50 name=/opt/app-root/app-logs/app2/log20240429.txt
[2024/05/06 11:33:03] [ info] [input:tail:tail.0] inotify_fs_add(): inode=1106717528779 watch_fd=51 name=/opt/app-root/app-logs/app3/log20240506.txt

Your Environment

  • Version used: fluent-bit:2.2.2-debug
  • Environment name and version: Kubernetes
  • Deployed as: kind: Deployment
  • Filters and plugins: Using the tail input plugin with a custom dockerjson parser, the modify filter, and both http and es output plugins.

Configuration
fluent-bit.conf:

[SERVICE]
    Parsers_File                        ./custom-parsers.conf
    Log_Level                           info
    Storage.path                        /opt/app-root/data/
    Storage.max_chunks_up               400
    Storage.pause_on_chunks_overlimit   On

# Input to read from a file.
[INPUT]
    Name                tail
    Tag                 product
    Parser              dockerjson
    Path                /opt/app-root/app-logs/*, /opt/app-root/app-logs/*/*, /opt/app-root/app-logs/*/*/*, /opt/app-root/app-logs/*/*/*/*
    Refresh_Interval    30
    Read_from_Head      On
    Skip_Long_Lines     On
    Skip_Empty_Lines    On
    Buffer_Max_Size     2M
    DB                  /opt/app-root/data/fluentbit.db
    Storage.type        filesystem

# Filter to add any additional fields with values to each log that will be sent.
[FILTER]
    Name                modify
    Match               product
    Add                 environment ${ENVIRONMENT}

# First output
[OUTPUT]
    Name                        http
    Match                       product
    Host                        ${OUTPUT_1_HOST}
    URI                         /product
    Port                        443
    Http_User                   ${OUTPUT_1_USER}
    Http_Passwd                 ${OUTPUT_1_PASSWORD}
    Tls                         On
    Tls.verify                  Off
    Format                      json
    Retry_Limit                 10
    Storage.total_limit_size    900M

# Second output.
[OUTPUT]
    Name                        es
    Match                       product
    Host                        ${OUTPUT_2_HOST}
    Port                        9200
    Http_User                   ${OUTPUT_2_USER}
    Http_Passwd                 ${OUTPUT_2_PASSWORD}
    Index                       fluent-bit
    Tls                         On
    Tls.verify                  Off
    Suppress_Type_Name          On
    Retry_Limit                 10
    Storage.total_limit_size    900M

custom-parsers.conf:

[PARSER]
    Name         dockerjson
    Format       json
    Time_Key     timestamp
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On
    # Command      |  Decoder      | Field | Optional Action
    # =============|===============|=======|=========
    Decode_Field_As  escaped    message   
    Decode_Field_As  escaped    messageTemplate
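
For reference, the escaped decoder unescapes escape sequences that remain inside the named field after JSON parsing (useful when the field's value is itself an escaped string). A minimal sketch with a hypothetical log line, not taken from the report:

# Hypothetical raw line: the parsed "message" value still contains
# the two literal characters \n (a doubly-escaped newline).
{"timestamp":"2024-05-06T11:05:59.586","message":"line one\\nline two"}

# After "Decode_Field_As escaped message", the field holds a real
# newline instead of the literal \n sequence:
#   line one
#   line two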
avizen-j changed the title from "Logs are being corrupted/truncated in Kubernetes [tail input, es & http output]." to "Fluent Bit occasionally corrupts/truncates log entries when processing multiple log files." on May 6, 2024
lecaros (Contributor) commented May 11, 2024

Can you add a stdout output and verify the format?
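
A minimal sketch of such a debug output, for anyone reproducing this (the Match pattern is taken from the issue's config; json_lines is one of the stdout plugin's standard formats):

[OUTPUT]
    Name      stdout
    Match     product
    Format    json_lines

Comparing the records printed to stdout against the original files should show whether the corruption is introduced before or at the output stage.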

lecaros added the waiting-for-user (Waiting for more information, tests or requested changes) label on May 11, 2024
Labels
status: waiting-for-triage · waiting-for-user (Waiting for more information, tests or requested changes)