Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buffer: Avoid to process discarded chunks in write_step_by_step #4342

Merged
merged 3 commits into from Dec 14, 2023

Conversation

ashie
Copy link
Member

@ashie ashie commented Nov 15, 2023

Which issue(s) this PR fixes:
Fix #3089

What this PR does / why we need it:
It fixes following error when many chunk bytes limit exceeds errors are occurred:

2020-07-28 14:59:26 +0000 [warn]: #0 emit transaction failed: error_class=IOError error="closed stream" location="/fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.1/lib/fluent/plugin/buffer/file_chunk.rb:82:in `pos'" tag="cafiscode-eks-cluster.default"
  2020-07-28 14:59:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.1/lib/fluent/plugin/buffer/file_chunk.rb:82:in `pos'
  2020-07-28 14:59:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.1/lib/fluent/plugin/buffer/file_chunk.rb:82:in `rollback'
  2020-07-28 14:59:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.1/lib/fluent/plugin/buffer.rb:339:in `rescue in block in write'
  2020-07-28 14:59:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.1/lib/fluent/plugin/buffer.rb:332:in `block in write'
  2020-07-28 14:59:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.1/lib/fluent/plugin/buffer.rb:331:in `each'
  ...

Please see #3089 (comment) for the detail.

Docs Changes:
N/A

Release Note:
Same with the title

@ashie ashie mentioned this pull request Nov 15, 2023
@daipom daipom added the bug label Nov 27, 2023
@ashie ashie modified the milestone: v1.16.3 Nov 30, 2023
@ashie ashie added this to the v1.16.4 milestone Dec 8, 2023
@ashie
Copy link
Member Author

ashie commented Dec 8, 2023

It would be better to add a test for it.

It fixes following error when many `chunk bytes limit exceeds` errors
are occurred:
```
2020-07-28 14:59:26 +0000 [warn]: #0 emit transaction failed: error_class=IOError error="closed stream" location="/fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.1/lib/fluent/plugin/buffer/file_chunk.rb:82:in `pos'" tag="cafiscode-eks-cluster.default"
  2020-07-28 14:59:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.1/lib/fluent/plugin/buffer/file_chunk.rb:82:in `pos'
  2020-07-28 14:59:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.1/lib/fluent/plugin/buffer/file_chunk.rb:82:in `rollback'
  2020-07-28 14:59:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.1/lib/fluent/plugin/buffer.rb:339:in `rescue in block in write'
  2020-07-28 14:59:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.1/lib/fluent/plugin/buffer.rb:332:in `block in write'
  2020-07-28 14:59:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.1/lib/fluent/plugin/buffer.rb:331:in `each'
  ...
```

Fix #3089

Signed-off-by: Takuro Ashie <ashie@clear-code.com>
@ashie ashie force-pushed the fix-issue-3089 branch 5 times, most recently from 415c963 to 5265fb5 Compare December 12, 2023 10:02
@ashie
Copy link
Member Author

ashie commented Dec 12, 2023

It would be better to add a test for it.

Done.

@ashie ashie requested review from kenhys and daipom December 12, 2023 10:10
@ashie ashie force-pushed the fix-issue-3089 branch 3 times, most recently from b2c65b9 to 7e981ce Compare December 12, 2023 12:29
#3089

Signed-off-by: Takuro Ashie <ashie@clear-code.com>
@daipom
Copy link
Contributor

daipom commented Dec 13, 2023

Thanks! I'm seeing this.

Copy link
Contributor

@daipom daipom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM.
I have commented on my concern about the code, but I think it can go either way.
(It would be more readable to make get_next_chunk() just focus on returning the next chunk, I think.)

Now I understand this bug.
The problem is that the chunks remains in operated_chunks of Buffer::write() after retrying write_step_by_step().
And this PR fixes it completely.

It seems that this also causes the following error.

lib/fluent/plugin/buffer.rb Outdated Show resolved Hide resolved
lib/fluent/plugin/buffer.rb Outdated Show resolved Hide resolved
Signed-off-by: Takuro Ashie <ashie@clear-code.com>
@ashie ashie merged commit 95438b2 into master Dec 14, 2023
13 of 16 checks passed
@ashie ashie deleted the fix-issue-3089 branch December 14, 2023 01:18
@ashie
Copy link
Member Author

ashie commented Dec 14, 2023

Need to backport to v1.16 branch.

daipom added a commit to daipom/fluentd that referenced this pull request Mar 26, 2024
After 95438b2 (fluent#4342), there is a
section where chunks do not have a lock in `write_step_by_step()`.

`write_step_by_step()` must ensure their locks until passing them
to the block.
Otherwise, race condition can occur and it can cause emit error
by IOError.
Example of warning messages of emit error:

    emit transaction failed: error_class=IOError error=
    "closed stream" location=...

    send an error event stream to @error: error_class=IOError
    error="closed stream" location=...

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
daipom added a commit to daipom/fluentd that referenced this pull request Mar 26, 2024
After 95438b2 (fluent#4342), there is a
section where chunks do not have a lock in `write_step_by_step()`.

`write_step_by_step()` must ensure their locks until passing them
to the block.
Otherwise, race condition can occur and it can cause emit error
by IOError.
Example of warning messages of emit error:

```
[warn]: #0 emit transaction failed: error_class=IOError error="closed stream" location=...
[warn]: #0 send an error event stream to @error: error_class=IOError error="closed stream" location=...
```

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
daipom added a commit to daipom/fluentd that referenced this pull request Mar 26, 2024
After 95438b2 (fluent#4342), there is a
section where chunks do not have a lock in `write_step_by_step()`.

`write_step_by_step()` must ensure their locks until passing them
to the block.
Otherwise, race condition can occur and it can cause emit error
by IOError.
Example of warning messages of emit error:

    [warn]: #0 emit transaction failed: error_class=IOError error="closed stream" location=...
    [warn]: #0 send an error event stream to \@error: error_class=IOError error="closed stream" location=...

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
daipom added a commit to daipom/fluentd that referenced this pull request Mar 26, 2024
After 95438b2 (fluent#4342), there is a
section where chunks do not have a lock in `write_step_by_step()`.

`write_step_by_step()` must ensure their locks until passing them
to the block.
Otherwise, race condition can occur and it can cause emit error
by IOError.
Example of warning messages of emit error:

    [warn]: #0 emit transaction failed: error_class=IOError error="closed stream" location=...
    [warn]: #0 send an error event stream to @error: error_class=IOError error="closed stream" location=...

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
daipom added a commit to daipom/fluentd that referenced this pull request Mar 26, 2024
After 95438b2 (fluent#4342), there is a
section where chunks do not have a lock in `write_step_by_step()`.

`write_step_by_step()` must ensure their locks until passing them
to the block.
Otherwise, race condition can occur and it can cause emit error
by IOError.
Example of warning messages of emit error:

    [warn]: #0 emit transaction failed: error_class=IOError error="closed stream" location=...
    [warn]: #0 send an error event stream to @error: error_class=IOError error="closed stream" location=...

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
daipom added a commit that referenced this pull request Mar 27, 2024
After 95438b2 (#4342), there is a
section where chunks do not have a lock in `write_step_by_step()`.

`write_step_by_step()` must ensure their locks until passing them
to the block.
Otherwise, race condition can occur and it can cause emit error
by IOError.
Example of warning messages of emit error:

    [warn]: #0 emit transaction failed: error_class=IOError error="closed stream" location=...
    [warn]: #0 send an error event stream to @error: error_class=IOError error="closed stream" location=...

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
daipom added a commit that referenced this pull request Mar 27, 2024
Backported from 13a5199.

After 95438b2 (#4342), there is a
section where chunks do not have a lock in `write_step_by_step()`.

`write_step_by_step()` must ensure their locks until passing them
to the block.
Otherwise, race condition can occur and it can cause emit error
by IOError.
Example of warning messages of emit error:

    [warn]: #0 emit transaction failed: error_class=IOError error="closed stream" location=...
    [warn]: #0 send an error event stream to @error: error_class=IOError error="closed stream" location=...

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
daipom added a commit that referenced this pull request Mar 27, 2024
Backported from 13a5199.

After 95438b2 (#4342), there is a
section where chunks do not have a lock in `write_step_by_step()`.

`write_step_by_step()` must ensure their locks until passing them
to the block.
Otherwise, race condition can occur and it can cause emit error
by IOError.
Example of warning messages of emit error:

    [warn]: #0 emit transaction failed: error_class=IOError error="closed stream" location=...
    [warn]: #0 send an error event stream to @error: error_class=IOError error="closed stream" location=...

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

IOError at chunk.rollback
2 participants