FluentD log buffer not being processed properly #271
maybe related:
@linwalth as a first aim it is helpful to reduce debug logging (keystone); for the long term we must observe it. It is possibly required to restart the fluentd container regularly.
With some help from the Monitoring SIG (specifically https://github.com/nerdicbynature) I was able to figure out a config that has now run for 3 weeks without trouble. I am posting it here in case someone runs into similar problems. This issue can be closed.
Let's try to change the upstream configuration with https://review.opendev.org/c/openstack/kolla-ansible/+/856241.
Could you please provide details?
Moin, the modification addresses multiple issues:

1. request_timeout needs to match bulk_message_request_threshold. An HTTP POST takes longer for a bigger bulk_message_request_threshold, hence the timeout should be significantly higher than the usual upload time to ES. In our case 15 MB usually needs about 5 seconds, but sometimes needs 15 s.
2. retry_max_interval: Fluentd uses exponential backoff. If the target ES has been configured to enforce an incoming rate limit, a series of failed HTTP uploads (maybe due to (1)) may lead to a buffer size that always tops the rate limit, and Fluentd does not recover from it.
3. retry_forever/reload_connections/reload_after: Fluentd sometimes silently drops the connection and gets stuck without any obvious reason. These params may help to reduce that, but do not actually prevent Fluentd from getting stuck. Maybe it's a false assumption. Reloading connections may be a good idea though.

Kind regards,
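The settings named above would land in a fluent-plugin-elasticsearch output block roughly like the following. This is a sketch, not the poster's actual config: the host, the buffer path, and all values are illustrative assumptions.

```
<match **>
  @type elasticsearch
  host elasticsearch        # illustrative; use your ES endpoint
  port 9200
  # Keep the HTTP timeout well above the usual upload time for one bulk request
  # (see point 1 above: ~15 MB can occasionally take 15 s).
  request_timeout 60s
  bulk_message_request_threshold 15M
  # Periodically re-resolve/reload connections to recover silently dropped ones.
  reload_connections true
  reload_after 1000
  <buffer>
    @type file
    path /var/lib/fluentd/data/buffer   # assumption: inside the fluentd-data volume
    flush_thread_count 8
    retry_forever true
    # Cap exponential backoff so a rate-limited ES still gets retried often
    # enough for the buffer to drain (see point 2 above).
    retry_max_interval 30s
  </buffer>
</match>
```

Note that in Fluentd v1 the retry_* parameters belong in the `<buffer>` section, while request_timeout, bulk_message_request_threshold, and reload_* are parameters of the elasticsearch output plugin itself.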
Is this still being pursued?
We now have a new Fluentd version. I'm closing this because I think it's no longer relevant. |
Rolling out Fluentd via the osism-kolla common role results in Fluentd building up log buffers but not properly reducing them by sending them to ES. Instead, buffer files keep accumulating unchecked in the fluentd-data volume.
Restarting Fluentd helps for a minute, raising the transmission rate to ES, but then it gets stuck again.
Internally, the container uses 100% of a CPU on the Fluentd process. We experimented with raising the thread count for the process to 8 threads, which lowers CPU usage but does not meaningfully change the transmission rate. Instead, the buffer keeps growing and creating more files.
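A quick way to quantify the backlog described above is to count the on-disk buffer chunk files. This is a sketch; the default path is an assumption and should be adjusted to wherever the fluentd-data volume is mounted on your host.

```shell
#!/bin/sh
# Count and size the buffer chunks Fluentd has queued on disk.
# BUF_DIR is an assumption: point it at the fluentd-data volume mount.
BUF_DIR="${BUF_DIR:-/var/lib/fluentd/data}"
chunks=$(find "$BUF_DIR" -name 'buffer.*' 2>/dev/null | wc -l)
size=$(du -sh "$BUF_DIR" 2>/dev/null | cut -f1)
echo "buffer chunks: $chunks"
echo "total size:    ${size:-0}"
```

Watching these two numbers over time makes it easy to tell whether a restart actually drains the buffer or only flushes it briefly before it stalls again.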
A potential cause could be fluent/fluentd#3817 or uken/fluent-plugin-elasticsearch#909, but then again I would suspect more people than just us would be running into this problem with kolla.