Multiple triggered interrupting boundary events can deadlock process instance #9233

Closed
korthout opened this issue Apr 26, 2022 · 4 comments · Fixed by #9281
Labels: area/reliability, kind/bug, scope/broker, version:1.3.9, version:8.1.0-alpha2, version:8.1.0

Comments

korthout commented Apr 26, 2022

Describe the bug

When multiple interrupting boundary events are triggered simultaneously for a process instance, the process instance may not be able to finish terminating. Such an instance can no longer be canceled from outside either.

This was discovered in a scenario where a parent process calls a child process, which in turn calls another child process (see the To Reproduce section). Each call activity has an interrupting message boundary event that subscribes to the same message (i.e. the same message name and correlation key). When the message is published, both call activities are interrupted and terminated simultaneously. However, this can lead to a deadlock in the termination logic:

The child cannot complete the message boundary event and take the sequence flow because its flow scope (the called instance) is set to terminating by the other boundary event. But the called instance cannot terminate because there is still an active flow.
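
The mutual blocking described above can be sketched with a toy model. This is only an illustration under simplifying assumptions, not Zeebe's engine code: the `Instance` class and the functions `activate_child`, `complete_element` and `finish_terminating` are hypothetical names, and the model assumes the inner boundary event gets activated after its flow scope was already set to terminating.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class State(Enum):
    ACTIVATED = auto()
    TERMINATING = auto()
    TERMINATED = auto()
    COMPLETED = auto()


@dataclass
class Instance:
    # Hypothetical, heavily simplified stand-in for an element instance.
    name: str
    state: State = State.ACTIVATED
    flow_scope: Optional["Instance"] = None
    children: List["Instance"] = field(default_factory=list)


def activate_child(scope, name):
    # Mirrors the bug: the child is activated without checking the scope's state.
    child = Instance(name, flow_scope=scope)
    scope.children.append(child)
    return child


def complete_element(element):
    # Completing is rejected unless the flow scope is still activated.
    if element.flow_scope.state is not State.ACTIVATED:
        return False
    element.state = State.COMPLETED
    return True


def finish_terminating(scope):
    # A scope cannot finish terminating while a child is still active.
    if any(child.state is State.ACTIVATED for child in scope.children):
        return False
    scope.state = State.TERMINATED
    return True


level_2 = Instance("Level_2 instance")
level_2.state = State.TERMINATING  # the outer boundary event interrupts the call activity
boundary = activate_child(level_2, "Event_1pi5foh")  # the inner boundary event fires anyway

print(complete_element(boundary))   # False: flow scope is terminating
print(finish_terminating(level_2))  # False: the boundary event is still active
```

Neither step can make progress in this model: the boundary event waits for an activated flow scope, and the flow scope waits for the boundary event to go away.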

Note that there may be other ways to hit this bug using events other than messages (likely), and it may also occur with nested embedded subprocesses (unlikely). I have not tested these cases.

To Reproduce

  • Deploy these 3 processes: Processes.zip
  • Create an instance: zbctl create instance Level_1
  • Wait for the user task to be active (we just need some wait state)
  • Publish a message that correlates to both interrupting message boundary events (one in Level_1 and one in Level_2): zbctl --insecure publish message msg --correlationKey='msg'
  • Check the log for a rejected COMPLETE_ELEMENT command: zdb log print -p=/tmp/data/raft-partition/partitions/1/ | jq '.records[].entries[]? | select(.recordType == "COMMAND_REJECTION") | select(.intent == "COMPLETE_ELEMENT")'
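
For anyone without jq at hand, the filter in the last step can be approximated in Python. This is a sketch that assumes the zdb output is a single JSON document with a top-level `records` array whose items may carry an `entries` array, which is what the jq expression implies. The function name `rejected_complete_element` is made up for this example, and the sample data is an abbreviated stand-in, not real log output.

```python
def rejected_complete_element(log):
    """Yield entries that are COMMAND_REJECTIONs of a COMPLETE_ELEMENT command."""
    for record in log.get("records", []):
        # Like `.entries[]?` in jq: records without entries are skipped silently.
        for entry in record.get("entries") or []:
            if (entry.get("recordType") == "COMMAND_REJECTION"
                    and entry.get("intent") == "COMPLETE_ELEMENT"):
                yield entry


# Abbreviated sample in the assumed shape (not real zdb output).
sample_log = {
    "records": [
        {"entries": [
            {"recordType": "EVENT", "intent": "ELEMENT_ACTIVATED"},
            {"recordType": "COMMAND_REJECTION", "intent": "COMPLETE_ELEMENT",
             "rejectionType": "INVALID_STATE"},
        ]},
        {},  # a record without entries
    ]
}

hits = list(rejected_complete_element(sample_log))
print(len(hits), hits[0]["rejectionType"])
```

With real data, the dictionary would come from `json.load` on the saved output of `zdb log print`, provided that output matches the assumed shape.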

Expected behavior

The process instance should terminate.

Log/Stacktrace

Full Stacktrace

{
  "partitionId": 1,
  "value": {
    "version": 1,
    "parentElementInstanceKey": 2251799813685394,
    "parentProcessInstanceKey": 2251799813685390,
    "processDefinitionKey": 2251799813685387,
    "elementId": "Event_1pi5foh",
    "bpmnProcessId": "Level_2",
    "processInstanceKey": 2251799813685396,
    "flowScopeKey": 2251799813685396,
    "bpmnElementType": "BOUNDARY_EVENT"
  },
  "key": 2251799813685411,
  "timestamp": 1650977539258,
  "intent": "COMPLETE_ELEMENT",
  "position": 534,
  "valueType": "PROCESS_INSTANCE",
  "recordType": "COMMAND_REJECTION",
  "rejectionType": "INVALID_STATE",
  "rejectionReason": "Expected flow scope instance to be in state 'ELEMENT_ACTIVATED' but was 'ELEMENT_TERMINATING'.",
  "brokerVersion": "1.3.6",
  "sourceRecordPosition": 532
}

Environment:

  • Zeebe Version: 1.3.6 (untested in newer versions)
korthout added the kind/bug, scope/broker, area/reliability, and team/process-automation labels Apr 26, 2022

Zelldon commented Apr 26, 2022

Love that you use zdb @korthout :)

I guess this also applies to interrupting event sub processes.

korthout commented

> I guess this also applies to interrupting event sub processes.

Perhaps

menski commented May 2, 2022

Let's look into a bug fix for this issue and create a follow-up issue for the bigger scope of reconsidering our termination logic to "quicken" the termination process, as discussed in the team meeting, i.e. by preventing new "actions" from happening in a process instance.

saig0 commented May 3, 2022

Analysis

It is a general issue with boundary events. We write the activating and activated events for the boundary event without checking if the flow scope is still active.

I can reproduce the issue if the top level process has an interrupting boundary event on the call activity or an interrupting event subprocess.

image
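
The kind of check the analysis calls for might look like the sketch below. It follows the title of the fix ("Trigger boundary events only if the flow scope is active") but is not the actual change in #9281: `trigger_boundary_event`, `write_event` and the `State` enum are hypothetical names chosen for illustration.

```python
from enum import Enum, auto


class State(Enum):
    ACTIVATED = auto()
    TERMINATING = auto()


def trigger_boundary_event(flow_scope_state, write_event):
    """Write the activation events only while the flow scope is still active."""
    if flow_scope_state is not State.ACTIVATED:
        # The scope is already being terminated, so starting the boundary
        # event would leave an active child that blocks the termination.
        return False
    write_event("ELEMENT_ACTIVATING")
    write_event("ELEMENT_ACTIVATED")
    return True


written = []
print(trigger_boundary_event(State.TERMINATING, written.append), written)
print(trigger_boundary_event(State.ACTIVATED, written.append), written)
```

In this sketch a boundary event that arrives while its flow scope is terminating is simply not started, so nothing is left behind to block the termination.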

zeebe-bors-camunda bot added a commit that referenced this issue May 5, 2022
9293: [Backport 1.3] Trigger boundary events only if the flow scope is active r=saig0 a=saig0

## Description

Backport of #9281

Only one additional change compared to the origin PR for downgrading the test code to Java 11.

## Related issues

relates to #9233


Co-authored-by: Philipp Ossler <philipp.ossler@gmail.com>
zeebe-bors-camunda bot added a commit that referenced this issue May 5, 2022
9292: [Backport 8.0] Trigger boundary events only if the flow scope is active r=saig0 a=saig0

## Description

Backport of #9281

No additional changes compared to the origin PR.

## Related issues

relates to #9233


Co-authored-by: Philipp Ossler <philipp.ossler@gmail.com>
Zelldon added the version:8.1.0 label Oct 4, 2022