Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IllegalStateException: Not expected to have an active sequence flow count lower then zero! #6778

Closed
saig0 opened this issue Apr 12, 2021 · 10 comments · Fixed by #9759
Closed
Assignees
Labels
kind/bug Categorizes an issue or PR as a bug scope/broker Marks an issue or PR to appear in the broker section of the changelog severity/high Marks a bug as having a noticeable impact on the user with no known workaround version:1.3.13 version:8.1.0-alpha4 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0

Comments

@saig0
Copy link
Member

saig0 commented Apr 12, 2021

Describe the bug
On reprocessing, the stream processor ran in an endless loop.

java.lang.IllegalStateException: Not expected to have an active sequence flow count lower then zero!

Followed by

Catched exception class io.zeebe.engine.processing.bpmn.BpmnProcessingException with message 
Expected to take transition to 'ELEMENT_ACTIVATED' but element instance is in state 'SEQUENCE_FLOW_TAKEN'. 
[context: {intent=SEQUENCE_FLOW_TAKEN, elementId=Activity_0v8ebms, bpmnElementType=SUB_PROCESS, 
elementInstanceKey=4503599646176568, flowScopeKey=4503599646085191, processInstanceKey=4503599646085191,
parentProcessInstanceKey=-1, parentElementInstanceKey=-1, bpmnProcessId=BENCH-PERF, processVersion=1, 
processDefinitionKey=2251799813685261}], will retry...

To Reproduce

Not clear.

image

It seems to be related to the termination of an (event-) subprocess.

"Don't process event because of an illegal state transition: Expected element instance to be in state 'ELEMENT_TERMINATED' or one of '[]' but was 'ELEMENT_TERMINATING'. [context: {intent=ELEMENT_TERMINATED, elementId=Activity_0sul25s, bpmnElementType=SUB_PROCESS, elementInstanceKey=4503599646086641, flowScopeKey=4503599646085191, processInstanceKey=4503599646085191, parentProcessInstanceKey=-1, parentElementInstanceKey=-1, bpmnProcessId=BENCH-PERF, processVersion=1, processDefinitionKey=2251799813685261}]"

Expected behavior
This issue is probably caused by the current work in the workflow engine that applies event sourcing to the stream processors.

Log/Stacktrace

Full Stacktrace

java.lang.IllegalStateException: Not expected to have an active sequence flow count lower then zero!
	at io.zeebe.engine.state.instance.ElementInstance.decrementActiveSequenceFlows(ElementInstance.java:168) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.engine.processing.bpmn.behavior.BpmnStateTransitionBehavior.activateElementInstanceInFlowScope(BpmnStateTransitionBehavior.java:245) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.engine.processing.bpmn.sequenceflow.SequenceFlowProcessor.onSequenceFlowTaken(SequenceFlowProcessor.java:96) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.engine.processing.bpmn.sequenceflow.SequenceFlowProcessor.onActivating(SequenceFlowProcessor.java:52) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.engine.processing.bpmn.sequenceflow.SequenceFlowProcessor.onActivating(SequenceFlowProcessor.java:30) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.engine.processing.bpmn.BpmnStreamProcessor.processEvent(BpmnStreamProcessor.java:165) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.engine.processing.bpmn.BpmnStreamProcessor.processRecord(BpmnStreamProcessor.java:106) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.engine.processing.streamprocessor.TypedRecordProcessor.processRecord(TypedRecordProcessor.java:54) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.engine.processing.streamprocessor.ReProcessingStateMachine.reprocessRecord(ReProcessingStateMachine.java:405) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.engine.processing.streamprocessor.ReProcessingStateMachine.lambda$chooseOperationForEvent$7(ReProcessingStateMachine.java:371) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.db.impl.rocksdb.transaction.ZeebeTransaction.run(ZeebeTransaction.java:84) ~[zeebe-db-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.engine.processing.streamprocessor.ReProcessingStateMachine.lambda$processUntilDone$4(ReProcessingStateMachine.java:343) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.util.retry.ActorRetryMechanism.run(ActorRetryMechanism.java:36) ~[zeebe-util-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.util.retry.EndlessRetryStrategy.run(EndlessRetryStrategy.java:50) ~[zeebe-util-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.util.sched.ActorJob.invoke(ActorJob.java:73) [zeebe-util-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.util.sched.ActorJob.execute(ActorJob.java:39) [zeebe-util-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.util.sched.ActorTask.execute(ActorTask.java:122) [zeebe-util-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:94) [zeebe-util-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.util.sched.ActorThread.doWork(ActorThread.java:78) [zeebe-util-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.util.sched.ActorThread.run(ActorThread.java:191) [zeebe-util-1.0.0-alpha4.jar:1.0.0-alpha4]

Follow-up Exception

io.zeebe.engine.processing.bpmn.BpmnProcessingException: Expected to take transition to 'ELEMENT_ACTIVATED' but element instance is in state 'SEQUENCE_FLOW_TAKEN'. [context: {intent=SEQUENCE_FLOW_TAKEN, elementId=Activity_0v8ebms, bpmnElementType=SUB_PROCESS, elementInstanceKey=4503599646176568, flowScopeKey=4503599646085191, processInstanceKey=4503599646085191, parentProcessInstanceKey=-1, parentElementInstanceKey=-1, bpmnProcessId=BENCH-PERF, processVersion=1, processDefinitionKey=2251799813685261}]
	at io.zeebe.engine.processing.bpmn.behavior.BpmnStateTransitionBehavior.verifyTransition(BpmnStateTransitionBehavior.java:175) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.engine.processing.bpmn.behavior.BpmnStateTransitionBehavior.transitionTo(BpmnStateTransitionBehavior.java:161) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.engine.processing.bpmn.behavior.BpmnStateTransitionBehavior.transitionToActivated(BpmnStateTransitionBehavior.java:92) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.engine.processing.bpmn.container.SubProcessProcessor.lambda$onActivating$1(SubProcessProcessor.java:53) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.util.Either$Right.ifRightOrLeft(Either.java:238) ~[zeebe-util-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.engine.processing.bpmn.container.SubProcessProcessor.onActivating(SubProcessProcessor.java:52) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.engine.processing.bpmn.container.SubProcessProcessor.onActivating(SubProcessProcessor.java:23) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.engine.processing.bpmn.BpmnStreamProcessor.processEvent(BpmnStreamProcessor.java:165) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.engine.processing.bpmn.BpmnStreamProcessor.processRecord(BpmnStreamProcessor.java:106) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.engine.processing.streamprocessor.TypedRecordProcessor.processRecord(TypedRecordProcessor.java:54) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.engine.processing.streamprocessor.ReProcessingStateMachine.reprocessRecord(ReProcessingStateMachine.java:405) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.engine.processing.streamprocessor.ReProcessingStateMachine.lambda$chooseOperationForEvent$7(ReProcessingStateMachine.java:371) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.db.impl.rocksdb.transaction.ZeebeTransaction.run(ZeebeTransaction.java:84) ~[zeebe-db-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.engine.processing.streamprocessor.ReProcessingStateMachine.lambda$processUntilDone$4(ReProcessingStateMachine.java:343) ~[zeebe-workflow-engine-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.util.retry.ActorRetryMechanism.run(ActorRetryMechanism.java:36) ~[zeebe-util-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.util.retry.EndlessRetryStrategy.run(EndlessRetryStrategy.java:50) ~[zeebe-util-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.util.sched.ActorJob.invoke(ActorJob.java:73) [zeebe-util-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.util.sched.ActorJob.execute(ActorJob.java:39) [zeebe-util-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.util.sched.ActorTask.execute(ActorTask.java:122) [zeebe-util-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:94) [zeebe-util-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.util.sched.ActorThread.doWork(ActorThread.java:78) [zeebe-util-1.0.0-alpha4.jar:1.0.0-alpha4]
	at io.zeebe.util.sched.ActorThread.run(ActorThread.java:191) [zeebe-util-1.0.0-alpha4.jar:1.0.0-alpha4]

Error Reporting:

Log: https://console.cloud.google.com/logs/query;query=%0AlogName:%22stdout%22%0Aresource.type%3D%22k8s_container%22%0Aresource.labels.cluster_name%3D%22ultratest%22%0Aresource.labels.namespace_name%3D%2299936d0e-e21b-493c-8b96-6d79ac11c40d-zeebe%22%0Aresource.labels.container_name%3D%22zeebe%22;timeRange=2021-04-12T10:06:14.712Z%2F2021-04-12T11:06:14.712Z;pinnedLogId=2021-04-12T10:35:15.759906Z%2F32bhcwg42mromx;cursorTimestamp=2021-04-12T10:35:15.789511Z?project=camunda-cloud-240911

Environment:

  • OS: Camunda Cloud (ultratest)
  • Zeebe Version: 1.0.0-alpha4
  • Configuration:
@saig0 saig0 added kind/bug Categorizes an issue or PR as a bug scope/broker Marks an issue or PR to appear in the broker section of the changelog Impact: Availability severity/mid Marks a bug as having a noticeable impact but with a known workaround labels Apr 12, 2021
@saig0 saig0 closed this as completed Apr 12, 2021
@saig0
Copy link
Member Author

saig0 commented Apr 12, 2021

It is expected that this issue will be solved when the migration is done because it replaces the whole reprocessing by event applying.

@Zelldon
Copy link
Member

Zelldon commented Jul 6, 2022

Seems to happen again with https://console.cloud.google.com/errors/detail/CJG2u-KVh-_LigE;service=zeebe;version=8.0.2;time=P7D?project=camunda-cloud-240911

"Expected to process record 'TypedEventImpl{metadata=RecordMetadata{recordType=COMMAND, intentValue=255, intent=ACTIVATE_ELEMENT, requestStreamId=-2147483648, requestId=-1, protocolVersion=3, valueType=PROCESS_INSTANCE, rejectionType=NULL_VAL, rejectionReason=, brokerVersion=8.0.2}, value={"bpmnProcessId":"epp-processing","version":46,"processDefinitionKey":2251799814498262,"processInstanceKey":4503599627984972,"elementId":"Gateway_04dcjj2","flowScopeKey":4503599627984972,"bpmnElementType":"PARALLEL_GATEWAY","parentProcessInstanceKey":-1,"parentElementInstanceKey":-1}}' without errors, but exception occurred with message 'Not expected to have an active sequence flow count lower then zero!'."

I will try to get more data

@Zelldon Zelldon reopened this Jul 6, 2022
@Zelldon Zelldon reopened this Jul 6, 2022
@camunda camunda deleted a comment from Zelldon Jul 6, 2022
@Zelldon
Copy link
Member

Zelldon commented Jul 6, 2022

It seems that the error was produced by an model where the gateway was taken multiple times (at least twice).

Since this is confidential data I put it into our gdrive https://drive.google.com/drive/u/0/folders/1NIgMxfWjZ5T16mDQ9DEdHMpr1SVg0w9F

@remcowesterhoud
Copy link
Contributor

I have added a version of the process to the drive that doesn't require any variables, for easier debugging.

\cc @saig0

@remcowesterhoud
Copy link
Contributor

I have also just added a none_event_repro file. Creating an instance of this will result in the exception.

@remcowesterhoud
Copy link
Contributor

Hypothesis
I have created a minimalistic process to reproduce the issue:
image

It all has to do with when we reset the number of taken sequence flows. This happens in the completed applier. This in combination with activating the parallel gateway through a command causes this behaviour.

When starting the process the following happens:

  1. We take sequence flow 1, 2 and 3
  2. 1 arrives at the joining gateway. The number of taken sequence flows is 1
  3. 2 and 3 arrive at the exclusive gateway. Both taking sequence flow 4.
  4. The first sequence flow 4 arrives at the joining gateway. The number of taken sequence flows is 2. This is equal to the amount of incoming sequence flow so we send an activate command.
  5. The second sequence flow 4 arrives at the joining gateway. The 1st command has not yet been processed. The joining gateway is not yet activated and completed, thus the number of taken sequence flows has not been reset! Another command to activate the joining gateway is written.
  6. The first activate command is handled. The number of active sequence flows is decremented by 2 (the amount of incoming sequence flows). This is fine as we have 3 active sequence flows at this stage.
  7. The second command is handled. The number of active sequence flows is decremented by 2 again. This results in an exception as there only is 1 more active sequence flow.

@saig0
Copy link
Member Author

saig0 commented Jul 7, 2022

@remcowesterhoud great analysis 👍 I can confirm the behavior ✔️

@Zelldon
Copy link
Member

Zelldon commented Jul 7, 2022

In my opinion, this bug is critical (at least high severity), there is no work-around and the process instances are stuck in this case. Plus: As a SaaS user it is hard to detect what went wrong.

@remcowesterhoud remcowesterhoud self-assigned this Jul 7, 2022
@remcowesterhoud
Copy link
Contributor

remcowesterhoud commented Jul 7, 2022

I would argue there is a workaround, which is fixing the model. Strictly looking at the labels that means mid would fit the best: "Marks a bug as having a noticeable impact but with a known workaround". I'm still happy to fix it right now as I don't think it'll be a difficult thing.

@Zelldon
Copy link
Member

Zelldon commented Jul 7, 2022

I would argue there is a workaround, which is fixing the model.

Well but this doesn't help you if your instances are stuck, like real payments or transactions right?

@remcowesterhoud remcowesterhoud added severity/high Marks a bug as having a noticeable impact on the user with no known workaround and removed severity/mid Marks a bug as having a noticeable impact but with a known workaround labels Jul 8, 2022
zeebe-bors-camunda bot added a commit that referenced this issue Jul 15, 2022
9759: Reject duplicate parallel gateway activate command r=remcowesterhoud a=remcowesterhoud

## Description

<!-- Please explain the changes you made here. -->
Parallel gateways get activated by taking a sequence flow and checking if the number of taken flows is greater or equal to the number of incoming sequence flows. If this is the case an activate command is sent. The number of taken sequence flows get rest upon activation of the parallel gateway.

This proves troublesome when a "bad" model causes one of the incoming sequence flows to be taken twice. This could result in the activation command being sent twice. Imagine there is a parallel gateway with 2 incoming flows. What would happen is:

1. First flow is taken
2. Second flow is taken. Incoming flows == taken flows so an activate command is sent.
3. Second flow is taken again. The first activate command has not been processed yet. The number of taken flows has not been reset. As a result incoming flows < taken flows. A second activate command is sent.

This is solved by always sending an activate command when a sequence flow is taken. Once the `BpmnStreamProcessor` tries to process the record it will check if the state is valid. Here a check has been added to verify that when we receive an activate command for a parallel gateway we will first check if all the incoming flows have been taken. If this is not the case we will reject the command.

## Related issues

<!-- Which issues are closed by this PR or are related -->

closes #6778 



Co-authored-by: Remco Westerhoud <remco@westerhoud.nl>
zeebe-bors-camunda bot added a commit that referenced this issue Jul 20, 2022
9822: [Backport stable/1.3] Reject duplicate parallel gateway activate command r=remcowesterhoud a=backport-action

# Description
Backport of #9759 to `stable/1.3`.

relates to #6778

Co-authored-by: Remco Westerhoud <remco@westerhoud.nl>
zeebe-bors-camunda bot added a commit that referenced this issue Jul 20, 2022
9823: [Backport stable/8.0] Reject duplicate parallel gateway activate command r=remcowesterhoud a=backport-action

# Description
Backport of #9759 to `stable/8.0`.

relates to #6778

Co-authored-by: Remco Westerhoud <remco@westerhoud.nl>
zeebe-bors-camunda bot added a commit that referenced this issue Jul 20, 2022
9822: [Backport stable/1.3] Reject duplicate parallel gateway activate command r=remcowesterhoud a=backport-action

# Description
Backport of #9759 to `stable/1.3`.

relates to #6778

Co-authored-by: Remco Westerhoud <remco@westerhoud.nl>
zeebe-bors-camunda bot added a commit that referenced this issue Jul 20, 2022
9823: [Backport stable/8.0] Reject duplicate parallel gateway activate command r=remcowesterhoud a=backport-action

# Description
Backport of #9759 to `stable/8.0`.

relates to #6778

Co-authored-by: Remco Westerhoud <remco@westerhoud.nl>
zeebe-bors-camunda bot added a commit that referenced this issue Jul 20, 2022
9823: [Backport stable/8.0] Reject duplicate parallel gateway activate command r=remcowesterhoud a=backport-action

# Description
Backport of #9759 to `stable/8.0`.

relates to #6778

Co-authored-by: Remco Westerhoud <remco@westerhoud.nl>
zeebe-bors-camunda bot added a commit that referenced this issue Jul 20, 2022
9823: [Backport stable/8.0] Reject duplicate parallel gateway activate command r=remcowesterhoud a=backport-action

# Description
Backport of #9759 to `stable/8.0`.

relates to #6778

Co-authored-by: Remco Westerhoud <remco@westerhoud.nl>
@Zelldon Zelldon added the version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0 label Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes an issue or PR as a bug scope/broker Marks an issue or PR to appear in the broker section of the changelog severity/high Marks a bug as having a noticeable impact on the user with no known workaround version:1.3.13 version:8.1.0-alpha4 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants