IllegalStateException: Expected to find a flow element with id 'migrated-variable-2251799826929599' in process with key '2251799825939186' but not found. #8626

pihme · 2022-01-21T07:29:56Z

Describe the bug
Observed in production log.
A) https://console.cloud.google.com/errors/detail/COfpgtnP4c-E0QE;service=zeebe;time=P7D?project=camunda-cloud-240911
Logs: https://drive.google.com/file/d/1DF-Kl0TXV5LKT6e2Nnk7Gk8YsdE926Cn/view?usp=sharing
and
B) https://console.cloud.google.com/errors/detail/CJmzuouvuYOk5wE;service=zeebe;time=P7D?project=camunda-cloud-240911
Logs: https://drive.google.com/file/d/1bgGiULD6bNzvuY0LapW5BQmbuDZYW2Lr/view?usp=sharing

Log/Stacktrace

Full Stacktrace

 java.lang.IllegalStateException: Expected to find a flow element with id 'migrated-variable-2251799826929599' in process with key '2251799825939186' but not found.
	at io.camunda.zeebe.engine.state.deployment.DbProcessState.getFlowElement(DbProcessState.java:294) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.behavior.BpmnEventSubscriptionBehavior.activateTriggeredEvent(BpmnEventSubscriptionBehavior.java:110) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.container.CallActivityProcessor.lambda$transitionToTerminated$7(CallActivityProcessor.java:144) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at java.util.Optional.ifPresentOrElse(Unknown Source) ~[?:?]
	at io.camunda.zeebe.engine.processing.bpmn.container.CallActivityProcessor.transitionToTerminated(CallActivityProcessor.java:141) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.container.CallActivityProcessor.onChildTerminated(CallActivityProcessor.java:133) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.container.CallActivityProcessor.onChildTerminated(CallActivityProcessor.java:29) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.behavior.BpmnStateTransitionBehavior.lambda$terminateChildProcessInstance$12(BpmnStateTransitionBehavior.java:494) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at java.util.Optional.ifPresentOrElse(Unknown Source) ~[?:?]
	at io.camunda.zeebe.engine.processing.bpmn.behavior.BpmnStateTransitionBehavior.terminateChildProcessInstance(BpmnStateTransitionBehavior.java:491) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.container.CallActivityProcessor.onTerminate(CallActivityProcessor.java:98) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.container.CallActivityProcessor.onTerminate(CallActivityProcessor.java:29) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.BpmnStreamProcessor.processEvent(BpmnStreamProcessor.java:137) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.BpmnStreamProcessor.lambda$processRecord$0(BpmnStreamProcessor.java:110) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.util.Either$Right.ifRightOrLeft(Either.java:344) ~[zeebe-util-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.BpmnStreamProcessor.processRecord(BpmnStreamProcessor.java:107) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.streamprocessor.TypedRecordProcessor.processRecord(TypedRecordProcessor.java:54) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.streamprocessor.ProcessingStateMachine.lambda$processInTransaction$2(ProcessingStateMachine.java:263) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.db.impl.rocksdb.transaction.ZeebeTransaction.run(ZeebeTransaction.java:84) ~[zeebe-db-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.streamprocessor.ProcessingStateMachine.processInTransaction(ProcessingStateMachine.java:253) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.streamprocessor.ProcessingStateMachine.processEvent(ProcessingStateMachine.java:221) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.streamprocessor.ProcessingStateMachine.tryToReadNextEvent(ProcessingStateMachine.java:192) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.streamprocessor.ProcessingStateMachine.readNextEvent(ProcessingStateMachine.java:184) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.util.sched.ActorJob.invoke(ActorJob.java:73) [zeebe-util-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.util.sched.ActorJob.execute(ActorJob.java:39) [zeebe-util-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.util.sched.ActorTask.execute(ActorTask.java:122) [zeebe-util-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:95) [zeebe-util-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.util.sched.ActorThread.doWork(ActorThread.java:78) [zeebe-util-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.util.sched.ActorThread.run(ActorThread.java:192) [zeebe-util-1.3.0.jar:1.3.0]

Environment:

OS:
Zeebe Version: 1.3.0
Configuration:

The text was updated successfully, but these errors were encountered:

pihme · 2022-01-21T08:22:33Z

Summary of affected entities so far:

Affected variables:

migrated-variable-2251799827268084
migrated-variable-2251799826931975
migrated-variable-2251799826931023
migrated-variable-2251799826929599
migrated-variable-2251799826930372
migrated-variable-2251799826928045

Affected processes:

2251799824489065
2251799826929998
2251799825939186

Affected process Instances:

2251799826928418
2251799826927723
2251799826929160
2251799826930041
2251799826930719
2251799826931632
2251799827268073

Stack traces:
A)

	at io.camunda.zeebe.engine.state.deployment.DbProcessState.getFlowElement(DbProcessState.java:294) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.behavior.BpmnEventSubscriptionBehavior.activateTriggeredEvent(BpmnEventSubscriptionBehavior.java:110) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.task.JobWorkerTaskProcessor.lambda$onTerminate$7(JobWorkerTaskProcessor.java:82) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.task.JobWorkerTaskProcessor.onTerminate(JobWorkerTaskProcessor.java:79) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.task.JobWorkerTaskProcessor.onTerminate(JobWorkerTaskProcessor.java:24) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.BpmnStreamProcessor.processEvent(BpmnStreamProcessor.java:137) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.BpmnStreamProcessor.lambda$processRecord$0(BpmnStreamProcessor.java:110) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]

B)

	at io.camunda.zeebe.engine.state.deployment.DbProcessState.getFlowElement(DbProcessState.java:294) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.behavior.BpmnEventSubscriptionBehavior.activateTriggeredEvent(BpmnEventSubscriptionBehavior.java:110) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.container.CallActivityProcessor.lambda$transitionToTerminated$7(CallActivityProcessor.java:144) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.container.CallActivityProcessor.transitionToTerminated(CallActivityProcessor.java:141) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.container.CallActivityProcessor.onChildTerminated(CallActivityProcessor.java:133) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.container.CallActivityProcessor.onChildTerminated(CallActivityProcessor.java:29) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.behavior.BpmnStateTransitionBehavior.lambda$terminateChildProcessInstance$12(BpmnStateTransitionBehavior.java:494) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.behavior.BpmnStateTransitionBehavior.terminateChildProcessInstance(BpmnStateTransitionBehavior.java:491) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.container.CallActivityProcessor.onTerminate(CallActivityProcessor.java:98) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.container.CallActivityProcessor.onTerminate(CallActivityProcessor.java:29) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.BpmnStreamProcessor.processEvent(BpmnStreamProcessor.java:137) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]
	at io.camunda.zeebe.engine.processing.bpmn.BpmnStreamProcessor.lambda$processRecord$0(BpmnStreamProcessor.java:110) ~[zeebe-workflow-engine-1.3.0.jar:1.3.0]

pihme · 2022-01-21T09:05:22Z

Grafana looks normal. It's a cluster with a single node.

pihme · 2022-01-21T09:18:20Z

Affected instances are blacklisted.

Other than that, Zeebe keeps running normally.

pihme · 2022-01-21T09:34:03Z

One thing that is peculiar about the affected process instances:

all of them have raised an incident in the past. The incident is not resolved
All of them were started mid December 2021

What I don't understand is why processes with an active incident are showing any activity at all.

korthout · 2022-01-21T09:36:36Z

What I don't understand is why processes with an active incident are showing any activity at all.

Perhaps the process instance is being cancelled?

pihme · 2022-01-21T09:48:52Z

Summary of findings:

Impact for currently known cases: minimal. It is a dev cluster, the processes have been started last year in December

Impact in general: unknown at this point

Timeline of events:

Mid December last year: process instances get started and raise incidents due to wrong variable binding
Recently: Update to 1.3.0
Yesterday: Bug occurred and process instances were subsequently blacklisted

pihme · 2022-01-21T09:50:33Z

Perhaps the process instance is being cancelled?

How can I see that in the logs? Or anywhere else?

korthout · 2022-01-21T09:55:26Z

How can I see that in the logs? Or anywhere else?

Not something we log, but should be available on the log stream. 🤔 I'm not sure if operate would also already show that the instance is terminating entirely, but perhaps you could see it there as well.

EDIT: You can also see it in using zdb as the root process should be in the state ELEMENT_TERMINATING

pihme · 2022-01-21T10:05:51Z

Confirmed that there were cancel requests (we were able to see the requests in Grafana)

pihme · 2022-01-21T10:07:02Z

Timeline of events:

Mid December last year: process instances get started and raise incidents due to wrong variable binding
Recently: Update to 1.3.0
Yesterday: Customer tried to cancel the process instances, this led to this error and subsequently the instances were blacklisted

pihme · 2022-01-21T10:13:58Z

After discussing with @Zelldon we decided to lower the severity.

So far this was only observed for one customer in one dev cluster
There is no indication that it is a general problem of data corruption
All observed instances have in common, that they occurred when trying to cancel a process instance that has an incident
It is more likely that there is a bug in this specific scenario
However, in this specific scenario - cancelling a process instance which has an incident - the outcome of a blacklisted process instance is tolerable from a customers point of view.

korthout · 2022-01-21T10:21:34Z

Curious what the incident type is. We have an existing issue with cancelling instances with incidents of UNHANDLED_ERROR #8588

pihme · 2022-01-21T10:24:53Z

The problem were variables that couldn't be resolved. Looked a bit like a typo in the FEEL expression. But tbh, I didn't pay too much attention to this detail.

npepinpe · 2022-01-25T08:42:01Z

Leaving in the backlog for now - from my understanding, the impact is mostly on visibility as the instance is blacklisted, which is not really visible for users. However, the net effect is what they wanted - users cannot interact with this instance anymore. It's still confusing, but I think this stems mostly from the blacklisting concept and its lack of visibility.

Two things which might cause us to reprioritize:

Blacklisting still leaves some data behind, and if this bug occurs often, then this can be an issue as the state size increases.
~~Can jobs of blacklisted processes be activated anyway?~~ They cannot.

rusterholz · 2022-01-28T21:36:07Z

the impact is mostly on visibility as the instance is blacklisted, which is not really visible for users

As someone who uses Zeebe on its own (without Camunda Cloud), this was confusing to me. I searched on docs.camunda.io but couldn't find any references to this. Could you explain more about blacklisting an instance? Is it a feature of Zeebe, or of Camunda Cloud / Operate?

Zelldon · 2022-01-28T22:23:23Z

There is a open issue to document this camunda/camunda-docs#145 some details can be found here #1988

rusterholz · 2022-01-28T22:38:45Z

Thanks @Zelldon -- the details in #1988 are sparse but I think I get the idea. Hopefully the more formal documentation will be written soon.

In the meantime, is there any risk that an instance running on one of our self-hosted zeebe clusters might get blacklisted? It seems like we would have no way to know about that if it happened.

Zelldon · 2022-01-28T22:51:47Z

Normally you should see an error in the log. Furthermore there are metrics regarding blacklisted instances.

I agree it would be good if we have soon some documentation, but also be aware that there are plans to replace the blacklisting with a different concept.

rusterholz · 2022-01-28T22:53:07Z

Thanks!

Zelldon · 2022-12-30T14:16:23Z

It doesn't to occur again, I will close it for now.

pihme added the kind/bug Categorizes an issue or PR as a bug label Jan 21, 2022

pihme added Impact: Data severity/high Marks a bug as having a noticeable impact on the user with no known workaround labels Jan 21, 2022

pihme added severity/mid Marks a bug as having a noticeable impact but with a known workaround and removed severity/high Marks a bug as having a noticeable impact on the user with no known workaround labels Jan 21, 2022

npepinpe added the Impact: Usability label Jan 25, 2022

npepinpe added area/ux Marks an issue as related to improving the user experience and removed area/observability Marks an issue as observability related labels Apr 11, 2022

pihme added the team/process-automation label May 30, 2022

menski removed the team/process-automation label Jul 11, 2022

Zelldon closed this as completed Dec 30, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IllegalStateException: Expected to find a flow element with id 'migrated-variable-2251799826929599' in process with key '2251799825939186' but not found. #8626

IllegalStateException: Expected to find a flow element with id 'migrated-variable-2251799826929599' in process with key '2251799825939186' but not found. #8626

pihme commented Jan 21, 2022 •

edited

pihme commented Jan 21, 2022

pihme commented Jan 21, 2022

pihme commented Jan 21, 2022

pihme commented Jan 21, 2022

korthout commented Jan 21, 2022 •

edited

pihme commented Jan 21, 2022

pihme commented Jan 21, 2022

korthout commented Jan 21, 2022 •

edited

pihme commented Jan 21, 2022 •

edited

pihme commented Jan 21, 2022

pihme commented Jan 21, 2022

korthout commented Jan 21, 2022

pihme commented Jan 21, 2022

npepinpe commented Jan 25, 2022 •

edited

rusterholz commented Jan 28, 2022 •

edited

Zelldon commented Jan 28, 2022

rusterholz commented Jan 28, 2022

Zelldon commented Jan 28, 2022

rusterholz commented Jan 28, 2022

Zelldon commented Dec 30, 2022

IllegalStateException: Expected to find a flow element with id 'migrated-variable-2251799826929599' in process with key '2251799825939186' but not found. #8626

IllegalStateException: Expected to find a flow element with id 'migrated-variable-2251799826929599' in process with key '2251799825939186' but not found. #8626

Comments

pihme commented Jan 21, 2022 • edited

pihme commented Jan 21, 2022

pihme commented Jan 21, 2022

pihme commented Jan 21, 2022

pihme commented Jan 21, 2022

korthout commented Jan 21, 2022 • edited

pihme commented Jan 21, 2022

pihme commented Jan 21, 2022

korthout commented Jan 21, 2022 • edited

pihme commented Jan 21, 2022 • edited

pihme commented Jan 21, 2022

pihme commented Jan 21, 2022

korthout commented Jan 21, 2022

pihme commented Jan 21, 2022

npepinpe commented Jan 25, 2022 • edited

rusterholz commented Jan 28, 2022 • edited

Zelldon commented Jan 28, 2022

rusterholz commented Jan 28, 2022

Zelldon commented Jan 28, 2022

rusterholz commented Jan 28, 2022

Zelldon commented Dec 30, 2022

pihme commented Jan 21, 2022 •

edited

korthout commented Jan 21, 2022 •

edited

korthout commented Jan 21, 2022 •

edited

pihme commented Jan 21, 2022 •

edited

npepinpe commented Jan 25, 2022 •

edited

rusterholz commented Jan 28, 2022 •

edited