Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove protocol version check in the broker #14204

Closed
Tracked by #13235
koevskinikola opened this issue Sep 7, 2023 · 0 comments · Fixed by #14205
Closed
Tracked by #13235

Remove protocol version check in the broker #14204

koevskinikola opened this issue Sep 7, 2023 · 0 comments · Fixed by #14205
Assignees
Labels
area/reliability Marks an issue as related to improving the reliability of our software (i.e. it behaves as expected) kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. scope/broker Marks an issue or PR to appear in the broker section of the changelog target:8.3 Issue must be completed before this target release version:8.2.14 Marks an issue as being completely or in parts released in 8.2.14 version:8.3.0 Marks an issue as being completely or in parts released in 8.3.0

Comments

@koevskinikola
Copy link
Member

koevskinikola commented Sep 7, 2023

Description

Zeebe is unable to bump the SBE protocol version due to the following problem. During a rolling upgrade:

  1. Broker 1 is upgraded and comes back online.
  2. Broker 2 is being upgraded
    1. While broker 2 is upgraded, a leader election is triggered
    2. Broker 2's engine starts replaying events
    3. Broker 1 sends event records with a higher protocol version
    4. Broker 2 throws an error that it can't handle these events and crashes

The problem is that we're misusing the SBE protocol version:

  • SBE uses the protocol version to be able to handle (serialize/deserialize) new fields from the later protocol version.
    • By not bumping the protocol version when adding new fields, we're making it difficult for SBE to work properly
  • The Zeebe stream platform (mis)uses the protocol version to check if it will be able to (semantically) understand what was deserialized.
    • This was done to guard against a change in the content of the SBE fields, not against adding new fields (which SBE will easily handle, by ignoring them on older versions).
    • This was also done to avoid changes causing bugs on older brokers

Currently, Zeebe has a metadata property recordVersion which we currently use to differentiate events when they should be applied differently. I.e. an event is something that happened in the past. It describes a state change and should always produce that state change when replayed. Even when its a bug. When we want to fix a bug in an event applier, we need to change the command processor to append events of a new recordVersion and add a new event applier version that fixes the bug.

Solution

The addition of the recordVersion mechanism described above makes the stream-platform protocol version check redundant (and as described above as well, error-prone). As a result, it can be removed.

AC

The protocol version check is removed.

Additional context

@koevskinikola koevskinikola self-assigned this Sep 7, 2023
@koevskinikola koevskinikola changed the title Remove protocol version check in the broke Remove protocol version check in the broker Sep 7, 2023
@koevskinikola koevskinikola added kind/bug Categorizes an issue or PR as a bug kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. scope/broker Marks an issue or PR to appear in the broker section of the changelog area/reliability Marks an issue as related to improving the reliability of our software (i.e. it behaves as expected) and removed kind/bug Categorizes an issue or PR as a bug labels Sep 7, 2023
@koevskinikola koevskinikola added the target:8.3 Issue must be completed before this target release label Sep 7, 2023
zeebe-bors-camunda bot added a commit that referenced this issue Sep 12, 2023
14267: [Backport stable/8.2] Remove protocol version check from stream platform r=github-actions[bot] a=backport-action

# Description
Backport of #14205 to `stable/8.2`.

relates to #14204
original author: `@koevskinikola`

Co-authored-by: Nikola Koevski <nikola.koevski@camunda.com>
@abbasadel abbasadel added the version:8.2.14 Marks an issue as being completely or in parts released in 8.2.14 label Sep 27, 2023
@megglos megglos added the version:8.3.0 Marks an issue as being completely or in parts released in 8.3.0 label Oct 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/reliability Marks an issue as related to improving the reliability of our software (i.e. it behaves as expected) kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. scope/broker Marks an issue or PR to appear in the broker section of the changelog target:8.3 Issue must be completed before this target release version:8.2.14 Marks an issue as being completely or in parts released in 8.2.14 version:8.3.0 Marks an issue as being completely or in parts released in 8.3.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants