
SecondaryFileOutput has race condition issues in multiple threads or workers #4080

Closed
daipom opened this issue Mar 3, 2023 · 1 comment · Fixed by #4081

daipom commented Mar 3, 2023

Describe the bug

SecondaryFileOutput sometimes appends logs even when append is false in multiple threads or workers.

Since SecondaryFileOutput does not use any lock, race conditions can occur.

In particular, the following logic causes race condition issues in multiple threads or workers when append is false.

def generate_path(path_without_suffix)
  if @append
    "#{path_without_suffix}#{@suffix}"
  else
    i = 0
    loop do
      path = "#{path_without_suffix}.#{i}#{@suffix}"
      return path unless File.exist?(path)
      i += 1
    end
  end
end
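
For illustration, here is a minimal standalone sketch of the same check-then-create race (plain Ruby, not Fluentd code; the file names and the sleep are only there to widen the race window): both threads see that out.0.log does not exist yet, so both pick the same path and their records end up in one file.

#!/usr/bin/env ruby
require "tmpdir"

# Same check-then-create pattern as generate_path with append false.
def generate_path(dir)
  i = 0
  loop do
    path = File.join(dir, "out.#{i}.log")
    return path unless File.exist?(path)   # check ...
    i += 1
  end
end

Dir.mktmpdir do |dir|
  paths = 2.times.map do |n|
    Thread.new do
      path = generate_path(dir)   # both threads can get "out.0.log" here
      sleep 0.1                   # widen the window between the check and the use
      File.open(path, "ab") { |f| f.puts "record from thread #{n}" }   # ... then use
      path
    end
  end.map(&:value)

  puts File.read(paths.first)
  puts "records of both threads were mixed into one file" if paths.uniq.size == 1
end

The same thing can happen across workers, i.e. separate processes, where an in-process Mutex alone would not help; closing the window generally needs something atomic at the file-system level, for example creating the file with File::CREAT | File::EXCL and retrying on failure. Whether #4081 takes that approach is not shown here.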

To Reproduce

Run Fluentd with the following config for about 1 minute.

Somehow make the out_file plugin's output fail. I modified the code as follows for debugging:

--- a/lib/fluent/plugin/out_file.rb
+++ b/lib/fluent/plugin/out_file.rb
@@ -200,6 +200,7 @@ module Fluent::Plugin
     end
 
     def write(chunk)
+      raise IOError, "for debug"
       path = extract_placeholders(@path_template, chunk)
       FileUtils.mkdir_p File.dirname(path), mode: @dir_perm

Then run Fluentd:

bundle exec fluentd -c fluent.conf

Then we can find files in which logs with different tags are mixed together, although the chunk key was tag and append was false.

2023-03-03T19:53:28+09:00	aaa.hoge.bbb	{"message":"hoge"}
2023-03-03T19:53:29+09:00	aaa.hoge.bbb	{"message":"hoge"}
2023-03-03T19:53:30+09:00	aaa.hoge.bbb	{"message":"hoge"}
2023-03-03T19:53:31+09:00	aaa.hoge.bbb	{"message":"hoge"}
2023-03-03T19:53:32+09:00	aaa.hoge.bbb	{"message":"hoge"}
2023-03-03T19:53:33+09:00	aaa.hoge.bbb	{"message":"hoge"}
2023-03-03T19:53:34+09:00	aaa.hoge.bbb	{"message":"hoge"}
2023-03-03T19:53:35+09:00	aaa.hoge.bbb	{"message":"hoge"}
2023-03-03T19:53:36+09:00	aaa.hoge.bbb	{"message":"hoge"}
2023-03-03T19:53:37+09:00	aaa.hoge.bbb	{"message":"hoge"}
2023-03-03T19:53:38+09:00	aaa.hoge.bbb	{"message":"hoge"}
2023-03-03T19:53:28+09:00	aaa.fuga.bbb	{"message":"fuga"}
2023-03-03T19:53:29+09:00	aaa.fuga.bbb	{"message":"fuga"}
2023-03-03T19:53:30+09:00	aaa.fuga.bbb	{"message":"fuga"}
2023-03-03T19:53:31+09:00	aaa.fuga.bbb	{"message":"fuga"}
2023-03-03T19:53:32+09:00	aaa.fuga.bbb	{"message":"fuga"}
2023-03-03T19:53:33+09:00	aaa.fuga.bbb	{"message":"fuga"}
2023-03-03T19:53:34+09:00	aaa.fuga.bbb	{"message":"fuga"}
2023-03-03T19:53:35+09:00	aaa.fuga.bbb	{"message":"fuga"}
2023-03-03T19:53:36+09:00	aaa.fuga.bbb	{"message":"fuga"}
2023-03-03T19:53:37+09:00	aaa.fuga.bbb	{"message":"fuga"}
2023-03-03T19:53:38+09:00	aaa.fuga.bbb	{"message":"fuga"}

Expected behavior

Each chunk should be written to a separate file.

File-A

2023-03-03T19:53:28+09:00	aaa.hoge.bbb	{"message":"hoge"}
2023-03-03T19:53:29+09:00	aaa.hoge.bbb	{"message":"hoge"}
2023-03-03T19:53:30+09:00	aaa.hoge.bbb	{"message":"hoge"}
2023-03-03T19:53:31+09:00	aaa.hoge.bbb	{"message":"hoge"}
2023-03-03T19:53:32+09:00	aaa.hoge.bbb	{"message":"hoge"}
2023-03-03T19:53:33+09:00	aaa.hoge.bbb	{"message":"hoge"}
2023-03-03T19:53:34+09:00	aaa.hoge.bbb	{"message":"hoge"}
2023-03-03T19:53:35+09:00	aaa.hoge.bbb	{"message":"hoge"}
2023-03-03T19:53:36+09:00	aaa.hoge.bbb	{"message":"hoge"}
2023-03-03T19:53:37+09:00	aaa.hoge.bbb	{"message":"hoge"}
2023-03-03T19:53:38+09:00	aaa.hoge.bbb	{"message":"hoge"}

File-B

2023-03-03T19:53:28+09:00	aaa.fuga.bbb	{"message":"fuga"}
2023-03-03T19:53:29+09:00	aaa.fuga.bbb	{"message":"fuga"}
2023-03-03T19:53:30+09:00	aaa.fuga.bbb	{"message":"fuga"}
2023-03-03T19:53:31+09:00	aaa.fuga.bbb	{"message":"fuga"}
2023-03-03T19:53:32+09:00	aaa.fuga.bbb	{"message":"fuga"}
2023-03-03T19:53:33+09:00	aaa.fuga.bbb	{"message":"fuga"}
2023-03-03T19:53:34+09:00	aaa.fuga.bbb	{"message":"fuga"}
2023-03-03T19:53:35+09:00	aaa.fuga.bbb	{"message":"fuga"}
2023-03-03T19:53:36+09:00	aaa.fuga.bbb	{"message":"fuga"}
2023-03-03T19:53:37+09:00	aaa.fuga.bbb	{"message":"fuga"}
2023-03-03T19:53:38+09:00	aaa.fuga.bbb	{"message":"fuga"}

Your Environment

- Fluentd version: fluentd 1.15.3
- Operating system: Ubuntu 20.04.5 LTS (Focal Fossa)
- Kernel version: 5.15.0-60-generic

Your Configuration

<system>
  workers 4
</system>

<source>
  @type sample
  tag aaa.hoge.bbb
  sample {"message": "hoge"}
</source>

<source>
  @type sample
  tag aaa.fuga.bbb
  sample {"message": "fuga"}
</source>

<source>
  @type sample
  tag aaa.foo.bbb
  sample {"message": "foo"}
</source>

<match aaa.**>
  @type file
  path /test/fluentd/log/${tag}/fluentd.log
  append true
  <buffer tag>
    @type file
    path /test/fluentd/buffer
    flush_mode interval
    flush_interval 10s
    retry_max_times 1
  </buffer>
  <secondary>
    @type secondary_file
    directory /test/fluentd/error/
    append false
  </secondary>
</match>

Your Error Log

Error logs are not related to this issue.

Additional context

No response

@daipom daipom self-assigned this Mar 3, 2023
@daipom daipom changed the title SecondaryFileOutput sometimes appends logs even when append false in multiple workers SecondaryFileOutput has race condition issues in multiple workers Mar 3, 2023

daipom commented Mar 6, 2023

Additional tests.

Environment

  • Fluentd version: fluentd 1.15.3
  • Operating system: Ubuntu 20.04.5 LTS (Focal Fossa)
  • Kernel version: 5.15.0-60-generic
  • Ruby: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux]
  • CPU: 11th Gen Intel® Core™ i7-11700 @ 2.50GHz × 16
  • RAM: 31.1 GiB

Modify out_file for the tests

--- a/lib/fluent/plugin/out_file.rb
+++ b/lib/fluent/plugin/out_file.rb
@@ -200,6 +200,7 @@ module Fluent::Plugin
     end
 
     def write(chunk)
+      raise IOError, "for tests"
       path = extract_placeholders(@path_template, chunk)
       FileUtils.mkdir_p File.dirname(path), mode: @dir_perm

Test: append false

  1. Clear the buffer files.
  2. Run Fluentd until out_secondary_file outputs about 20-30 files.
  3. Check the percentage of files that contain records of multiple tags mixed together, using the following Ruby script.

The Ruby script to check the percentage:

#!/usr/bin/env ruby

require "pathname"

target_dir = Pathname(ARGV[0])

mixed_file_num = 0
entire_file_num = 0

target_dir.each_child do |filepath|
  entire_file_num += 1
  # With the single_value format and constant sample messages, a file written from
  # a single tag contains only identical lines, so more than one unique line means
  # records of different tags were mixed into the same file.
  mixed_file_num += 1 if filepath.readlines.uniq.size > 1
end

percentage = 100 * mixed_file_num.to_f / entire_file_num

puts "#{percentage.round(1)}% (#{mixed_file_num}/#{entire_file_num})"

Config:

  • multiple threads
<source>
  @type sample
  tag test.hoge
  sample {"message": "hoge"}
</source>

<source>
  @type sample
  tag test.fuga
  sample {"message": "fuga"}
</source>

<source>
  @type sample
  tag test.foo
  sample {"message": "foo"}
</source>

<match test.**>
  @type file
  path /test/fluentd/log/${tag}/fluentd.log
  <format>
    @type single_value
  </format>
  <buffer tag>
    @type file
    path /test/fluentd/buffer
    flush_thread_count 16
    flush_mode interval
    flush_interval 10s
    retry_max_times 1
  </buffer>
  <secondary>
    @type secondary_file
    directory /test/fluentd/error/
    append false
  </secondary>
</match>
  • multiple workers
<system>
  workers 3
</system>

<worker 0>
  <source>
    @type sample
    tag test.hoge
    sample {"message": "hoge"}
  </source>
</worker>

<worker 1>
  <source>
    @type sample
    tag test.fuga
    sample {"message": "fuga"}
  </source>
</worker>

<worker 2>
  <source>
    @type sample
    tag test.foo
    sample {"message": "foo"}
  </source>
</worker>

<match test.**>
  @type file
  path /test/fluentd/log/${tag}/fluentd.log
  <format>
    @type single_value
  </format>
  <buffer tag>
    @type file
    path /test/fluentd/buffer
    flush_mode interval
    flush_interval 10s
    retry_max_times 1
  </buffer>
  <secondary>
    @type secondary_file
    directory /test/fluentd/error/
    append false
  </secondary>
</match>

Result

workers | flush_thread_count | percentage (1st) | percentage (2nd)
------- | ------------------ | ---------------- | ----------------
1       | 16                 | 54.2% (13/24)    | 32.3% (10/31)
3       | 1                  | 32.5% (13/40)    | 12.5% (3/24)

Test: append true

It seems that the same race condition can occur here as well.

However, the timing window is too narrow to reproduce it.

In #3805, we could use flush_at_shutdown to make the plugin output chunks at the same time, but that is difficult in this case.

I tried the following configuration, but I couldn't reproduce it.

<system>
  workers 8
</system>

<source>
  @type sample
  tag test.hoge
  sample {"message": "hogehogehogehogehogehogehogehogehogehoge"}
  rate 100
</source>

<source>
  @type sample
  tag test.fuga
  sample {"message": "fugafugafugafugafugafugafugafugafugafuga"}
  rate 100
</source>

<source>
  @type sample
  tag test.foo
  sample {"message": "foofoofoofoofoofoofoofoofoofoo"}
  rate 100
</source>

<match test.**>
  @type file
  path /test/fluentd/log/${tag}/fluentd.log
  <format>
    @type single_value
  </format>
  <buffer tag>
    @type file
    path /test/fluentd/buffer
    flush_thread_count 16
    flush_mode interval
    flush_interval 1s
    flush_thread_interval 0.1
    flush_thread_burst_interval 0.1
    retry_max_times 1
  </buffer>
  <secondary>
    @type secondary_file
    directory /test/fluentd/error/
    append true
  </secondary>
</match>

@daipom daipom changed the title SecondaryFileOutput has race condition issues in multiple workers SecondaryFileOutput has race condition issues in multiple threads and workers Mar 6, 2023
@daipom daipom changed the title SecondaryFileOutput has race condition issues in multiple threads and workers SecondaryFileOutput has race condition issues in multiple threads or workers Mar 6, 2023