Endless loop in deleting #8103

Closed
Zelldon opened this issue Nov 1, 2021 · 7 comments
Labels
area/observability Marks an issue as observability related kind/bug Categorizes an issue or PR as a bug scope/broker Marks an issue or PR to appear in the broker section of the changelog severity/low Marks a bug as having little to no noticeable impact for the user

Comments

@Zelldon
Member

Zelldon commented Nov 1, 2021

We can end up in an endless loop when deleting:

zeebe_broker2 | 2021-10-27 14:14:18.668 [] [raft-server-2-raft-partition-partition-5] DEBUG
zeebe_broker2 |       io.camunda.zeebe.journal.file.SegmentedJournal - No segments can be deleted with index < 49499482 (first log index: 49463401)
zeebe_broker2 | 2021-10-27 14:14:18.669 [] [raft-server-2-raft-partition-partition-5] DEBUG
zeebe_broker2 |       io.camunda.zeebe.journal.file.SegmentedJournal - No segments can be deleted with index < 49499482 (first log index: 49463401)
zeebe_broker2 | 2021-10-27 14:14:18.670 [] [raft-server-2-raft-partition-partition-5] DEBUG
zeebe_broker2 |       io.camunda.zeebe.journal.file.SegmentedJournal - No segments can be deleted with index < 49499482 (first log index: 49463401)
zeebe_broker2 | 2021-10-27 14:14:18.671 [] [raft-server-2-raft-partition-partition-5] DEBUG
zeebe_broker2 |       io.camunda.zeebe.journal.file.SegmentedJournal - No segments can be deleted with index < 49499482 (first log index: 49463401)
zeebe_broker2 | 2021-10-27 14:14:18.672 [] [raft-server-2-raft-partition-partition-5] DEBUG
zeebe_broker2 |       io.camunda.zeebe.journal.file.SegmentedJournal - No segments can be deleted with index < 49499482 (first log index: 49463401)
zeebe_broker2 | 2021-10-27 14:14:18.672 [] [raft-server-2-raft-partition-partition-5] DEBUG
zeebe_broker2 |       io.camunda.zeebe.journal.file.SegmentedJournal - No segments can be deleted with index < 49499482 (first log index: 49463401)
zeebe_broker2 | 2021-10-27 14:14:18.673 [] [raft-server-2-raft-partition-partition-5] DEBUG
zeebe_broker2 |       io.camunda.zeebe.journal.file.SegmentedJournal - No segments can be deleted with index < 49499482 (first log index: 49463401)
zeebe_broker2 | 2021-10-27 14:14:18.675 [] [raft-server-2-raft-partition-partition-5] DEBUG
zeebe_broker2 |       io.camunda.zeebe.journal.file.SegmentedJournal - No segments can be deleted with index < 49499482 (first log index: 49463401)

Originally posted by @Zelldon in #8099 (comment)
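The repeated message says nothing can be deleted below index 49499482 even though the log starts at 49463401. One plausible reading, assuming a segment can only be removed once all of its entries lie below the compaction index, is that the first segment still holds entries at or above 49499482. The sketch below models that assumed rule. `SegmentedJournalModel`, `deleteUntil`, and the second segment boundary 49600000 used in the usage note are hypothetical; this is not Zeebe's actual `SegmentedJournal` code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a segmented journal; NOT Zeebe's SegmentedJournal.
// Each segment holds a contiguous range of indexes and is identified by its first index.
final class SegmentedJournalModel {
  // First index of every segment, ascending. The last entry is the active segment.
  private final List<Long> segmentStarts;

  SegmentedJournalModel(List<Long> segmentStarts) {
    if (segmentStarts.isEmpty()) {
      throw new IllegalArgumentException("journal needs at least one segment");
    }
    this.segmentStarts = new ArrayList<>(segmentStarts);
  }

  long firstIndex() {
    return segmentStarts.get(0);
  }

  int segmentCount() {
    return segmentStarts.size();
  }

  // Deletes every segment whose entries are all below compactIndex; returns how many were deleted.
  // Assumed rule: segment i can go only if segment i + 1 starts at or before compactIndex,
  // because segment i then ends at (start of i + 1) - 1, which is < compactIndex.
  // The active (last) segment is never deleted.
  int deleteUntil(long compactIndex) {
    int deleted = 0;
    while (segmentStarts.size() > 1 && segmentStarts.get(1) <= compactIndex) {
      segmentStarts.remove(0);
      deleted++;
    }
    return deleted;
  }
}
```

With segments starting at 49463401 and the hypothetical 49600000, `deleteUntil(49499482L)` returns 0 on every call, so repeating it changes nothing; only a compaction index at or beyond 49600000 would free the first segment.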

@Zelldon
Member Author

Zelldon commented Nov 1, 2021

@deepthidevaki found the cause #8099 (comment) 🙇

@npepinpe npepinpe added scope/broker Marks an issue or PR to appear in the broker section of the changelog kind/bug Categorizes an issue or PR as a bug labels Nov 1, 2021
@npepinpe
Member

npepinpe commented Nov 1, 2021

What's the impact on the cluster? Do we have any idea? Would it recover?

@Zelldon
Member Author

Zelldon commented Nov 1, 2021

Hard to say. It is stuck on OOD (out of disk), and I can imagine that the endless loop takes up resources that other partitions need to take snapshots. A restart might get it out of the loop, but in general I think the cluster would need manual help, such as increasing the disk size or deleting data. It should not run out of disk in the first place, and it certainly should not end up in an endless loop trying to clean up data when that is not possible.

@Zelldon Zelldon added Impact: Availability severity/low Marks a bug as having little to no noticeable impact for the user labels Nov 1, 2021
@deepthidevaki
Contributor

It is not an "endless loop" in which the thread is stuck forever in this operation. Whenever the follower receives an AppendRequest and tries to write to the journal, it gets an OOD exception and then tries to "compact", which does nothing. So it is mostly just wasted work and annoying logs. It will recover if the broker recovers from OOD.
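The cycle described above can be sketched as follows, assuming a follower that reacts to an out-of-disk write by attempting compaction. `FollowerModel`, `OutOfDiskException`, and the byte-based disk accounting are hypothetical simplifications, not Zeebe's Raft classes. The point is that every rejected append triggers another no-op compaction, which matches the repeated DEBUG lines without any thread being stuck.

```java
// Hypothetical stand-in for the journal's out-of-disk error; not a Zeebe class.
final class OutOfDiskException extends RuntimeException {
  OutOfDiskException(String message) {
    super(message);
  }
}

// Hypothetical follower modeling the behavior described above; not Zeebe's Raft code.
final class FollowerModel {
  private long freeBytes;
  private int compactAttempts;

  FollowerModel(long freeBytes) {
    this.freeBytes = freeBytes;
  }

  // Handles one AppendRequest; returns true if the entry was written.
  boolean onAppendRequest(long entrySize) {
    try {
      write(entrySize);
      return true;
    } catch (OutOfDiskException e) {
      // React to OOD by compacting. Without a newer snapshot no segment lies
      // entirely below the compaction index, so this frees nothing.
      compact();
      // Reject the request; the next AppendRequest fails the same way:
      // wasted work and log noise, but the thread is not stuck.
      return false;
    }
  }

  private void write(long entrySize) {
    if (entrySize > freeBytes) {
      throw new OutOfDiskException("not enough free space for " + entrySize + " bytes");
    }
    freeBytes -= entrySize;
  }

  private void compact() {
    compactAttempts++; // counts the attempt; deliberately reclaims no space
  }

  int compactAttempts() {
    return compactAttempts;
  }

  long freeBytes() {
    return freeBytes;
  }
}
```

Each failed `onAppendRequest` returns normally after one `compact()` call, so the "loop" is really one useless compaction per incoming request; writes succeed again as soon as enough space becomes available.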

@Zelldon
Member Author

Zelldon commented Nov 1, 2021

Thanks @deepthidevaki for clarifying this; I hadn't checked the code before.

@npepinpe
Member

npepinpe commented Nov 2, 2021

Let's remove the compact behavior here, as the follower can only compact on a new snapshot anyway.
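A minimal sketch of the proposed change, assuming compaction on the follower is triggered only by a newly persisted snapshot and an out-of-disk append is simply rejected. `SnapshotDrivenFollower`, `onNewSnapshot`, and the `reclaimableBytes` parameter are hypothetical names for illustration, not Zeebe APIs.

```java
// Hypothetical follower after the proposed change; not Zeebe's actual code.
// Compaction is driven only by new snapshots, never by a failed write.
final class SnapshotDrivenFollower {
  private long freeBytes;
  private long snapshotIndex;
  private int compactions;

  SnapshotDrivenFollower(long freeBytes) {
    this.freeBytes = freeBytes;
  }

  // On OOD the append is simply rejected: no compaction attempt, nothing to log.
  boolean onAppendRequest(long entrySize) {
    if (entrySize > freeBytes) {
      return false;
    }
    freeBytes -= entrySize;
    return true;
  }

  // The only compaction trigger. reclaimableBytes stands in for the size of the
  // segments lying entirely below the new snapshot index (assumed known here).
  void onNewSnapshot(long index, long reclaimableBytes) {
    if (index <= snapshotIndex) {
      return; // not newer than the current snapshot, so nothing new can be deleted
    }
    snapshotIndex = index;
    compactions++;
    freeBytes += reclaimableBytes;
  }

  int compactions() {
    return compactions;
  }
}
```

The design choice mirrors the comment: because a follower can only reclaim space when a newer snapshot lets it drop old segments, tying compaction to `onNewSnapshot` removes the wasted work on every failed append without changing when space is actually freed.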

@npepinpe npepinpe added this to Planned in Zeebe Nov 2, 2021
@KerstinHebel KerstinHebel removed this from Planned in Zeebe Mar 23, 2022
@npepinpe npepinpe added area/observability Marks an issue as observability related and removed Impact: Observability labels Apr 11, 2022
@Zelldon
Member Author

Zelldon commented Dec 29, 2022

This never occurred again, so I will close it for now.

@Zelldon Zelldon closed this as completed Dec 29, 2022