Job of cancelled instance can be activated if an error was thrown on it #8588
Comments
For those who may be curious about the test script but don't want to download it, here is the original Ruby test script:

```ruby
require "zeebe/client"

ZEEBE_ADDRESS = "localhost:26500"
CREDENTIALS = :this_channel_is_insecure
TEST_BPMN_PATH = File.join(__dir__, "one_task.bpmn")
TEST_BPMN_PROCESS_ID = "one_task"
TEST_TASK_TYPE = "do_something"
JOB_TIMEOUT_MS = 1000
REQUEST_TIMEOUT_MS = 10000
LOOPS = 50
BUG_THRESHOLD = 5

include ::Zeebe::Client::GatewayProtocol

gateway = Gateway::Stub.new(ZEEBE_ADDRESS, CREDENTIALS)

# deploy our test process:
puts "deploying process definition from #{TEST_BPMN_PATH}"
deploy_response = gateway.deploy_process(DeployProcessRequest.new(
  processes: [
    ProcessRequestObject.new(
      name: File.basename(TEST_BPMN_PATH),
      definition: File.read(TEST_BPMN_PATH)
    )
  ]
))
process_definition_key =
  deploy_response.processes.find { |meta| meta["bpmnProcessId"] == TEST_BPMN_PROCESS_ID }["processDefinitionKey"]
puts "deployed process `#{TEST_BPMN_PROCESS_ID}` with key #{process_definition_key}"

# to reproduce the bug, we create some process instances, activate their jobs, throw errors on
# those jobs, then cancel the instances and start over:
instances_created = []
instances_cancelled = []
bug_count = 0

LOOPS.times do
  break if bug_count >= BUG_THRESHOLD # break the loop early if the bug has occurred 5 or more times

  # create a process instance:
  create_instance_response = gateway.create_process_instance(
    CreateProcessInstanceRequest.new(bpmnProcessId: TEST_BPMN_PROCESS_ID, version: -1)
  )
  instance_key = create_instance_response["processInstanceKey"]
  puts "created instance #{instance_key}"
  instances_created << instance_key

  # activate some jobs, from one or more process instances:
  activate_jobs_stream = gateway.activate_jobs(
    ActivateJobsRequest.new(
      type: TEST_TASK_TYPE,
      worker: "#{__FILE__}-#{Process.pid}",
      maxJobsToActivate: LOOPS,
      timeout: JOB_TIMEOUT_MS,
      requestTimeout: REQUEST_TIMEOUT_MS,
    )
  )
  activate_jobs_stream.each do |activated_job_response|
    activated_job_response.jobs.each do |activated_job|
      job_key = activated_job["key"]
      job_instance_key = activated_job["processInstanceKey"]
      # check the process instance key of each job to see whether it belongs to
      # a process instance that we know was previously cancelled:
      if instances_cancelled.include?(job_instance_key)
        puts "[!!! BUG !!!] activated job #{job_key} for cancelled process instance #{job_instance_key} [!!! BUG !!!]"
        bug_count += 1
      elsif instances_created.include?(job_instance_key)
        puts "activated job #{job_key} for process instance #{job_instance_key}"
        # throw an error on this job and make sure we get a successful response back from Zeebe:
        throw_error_response = gateway.throw_error(
          ThrowErrorRequest.new(
            jobKey: job_key,
            errorCode: "FOO",
            errorMessage: "foobar"
          )
        )
        if throw_error_response.is_a?(ThrowErrorResponse)
          puts "threw error on job #{job_key}"
        else
          raise "this should be impossible, the gateway should raise a GRPC error if it fails"
        end
      else
        puts "[???] activated job #{job_key} for unknown process instance #{job_instance_key} [???]"
      end
    end
  end

  # once the activate jobs call completes, cancel the current process instance, and
  # make sure we get a successful response back from Zeebe before recording that we
  # cancelled this instance:
  cancel_instance_response = gateway.cancel_process_instance(
    CancelProcessInstanceRequest.new(processInstanceKey: instance_key)
  )
  if cancel_instance_response.is_a?(CancelProcessInstanceResponse)
    puts "cancelled instance #{instance_key}"
    instances_cancelled << instance_key
  else
    raise "this should be impossible, the gateway should raise a GRPC error if it fails"
  end

  sleep 0.5
end
```
I was able to reproduce it locally, even just running the code once (without a loop). I was also able to reproduce it through zbctl:
This activated the job and returned:
I then checked the state using zdb:

Then I looked at the logs, again using zdb. This showed that, during the termination of the service task, the unhandled-error incident is resolved. That made me think about this logic, because I remember that the job key is important for the incident-resolution logic. What happens is that the job is made activatable again when the incident is resolved. However, we would expect a Cancel Job command while terminating that service task, but this is missing. We only delete the job from the log after a Job Cancelled (or Completed) event is written. Log produced using zdb:
Looking at this, I think this is a pretty old bug that became worse since 1.1.0. The Job Cancel command should always be written during the termination of a service task, but the special incident logic for jobs has already removed the reference to the job from the service task, so it doesn't have any job to cancel. In the past that just meant the job stayed in the database even though the instance was gone, but since #6000 this job is made activatable again.

Fixing the problem

Second, we'll also need to consider whether the incident should really be resolved when terminating the service task. Perhaps the incident is merely

Third, this makes me think about how much logic we have in event appliers. Why does the Incident Resolved event lead to the job's state being changed? Wouldn't it be better to write a Job Suspended event when the throw error is being processed? We shouldn't make assumptions about what to do when applying the events; instead, the events that are written should tell us what happened. Of course, this part is not necessary to resolve the issue, but I think it might improve things.
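The interaction described above can be sketched with a minimal Ruby state-machine model. This is purely illustrative (the class and method names are assumptions, not Zeebe's actual Java source): the incident-resolution applier makes the job activatable again, and because instance termination resolves the incident without a following Job Cancel command, the orphaned job can be handed to a worker.

```ruby
# Illustrative model of the buggy job lifecycle (names are hypothetical).
class JobState
  attr_reader :status

  def initialize
    @status = :activatable
  end

  def throw_error!
    # an uncaught error leads to an UNHANDLED_ERROR_EVENT incident
    @status = :failed_with_incident
  end

  def resolve_incident!
    # the event applier re-activates the job as a side effect
    @status = :activatable
  end

  def cancel!
    # terminal state: a cancelled job is a deleted job
    @status = :cancelled
  end

  def activatable?
    @status == :activatable
  end
end

job = JobState.new
job.throw_error!

# Buggy termination path: the incident is resolved, but no Job Cancel command
# follows, because the element instance no longer references the job.
job.resolve_incident!
puts job.activatable?  # => true, even though the process instance is gone
```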
@npepinpe Let's triage this. This happens for jobs on which an error was thrown that was not caught in the process and then led to an incident. If the process instance is cancelled instead of the incident being resolved, the job is made activatable again. This means a worker can pick it up and work on it while there is no process instance anymore. The situation is quite specific, and is more likely to occur during development than in production due to the unhandled error, but it can happen. I'm quite certain the regression was introduced with 1.1.0, because of #6000.
Well, considering you've already root-caused this... 😄 To be clear, what happens when it's activated again and then completed/failed? Not much, I guess? So it's mostly taking up resources for no reason, correct?
I'd need to check what happens internally, but there's not much the engine can do with it. However, the worker would have executed its work for that job, and from the perspective of orchestration this is pretty bad.
What's your estimate on the severity? I would class it as high and have us work on this soon. You're correct that the impact is quite bad if applications perform work that was cancelled. Of course, the impact depends on the application's use case, but it's critical that Zeebe remains trusted as an orchestration engine.
As you already looked into it, can you work on it? Do you think it's something we will be able to get into 1.3.1? I know you're busy next week...
I should be able to find some time for it in week 4, but the solution ideas I described above are still pretty vague. So I'm not sure it will be done in time for 1.3.1.
I might downgrade it to Mid, because it requires throwing an uncaught error in the process, which is less likely to happen in production systems than in development systems. Still, the impact can be big if it does happen. As you say: "it's critical that Zeebe remains trusted as an orchestration engine."
So we assume everyone will test their processes thoroughly before deploying to production? 😉 I get your point, but as the tooling for testing is very fresh, I don't know how likely it is, and when it does happen it's still pretty bad. I would keep it as high for now, especially since it's not clear how quickly users would realize this is happening in a production system to fix it (and Zeebe wouldn't automatically correct it). It's fine if we miss 1.3.1, it was more hopeful than anything 🙂
I updated the description with @korthout's vastly-simpler method for reproducing the issue. My original method is tucked away in a collapsible section there if anyone is looking for it.
At present this problem has a significant impact on us, and we have received many user complaints.
Hi @jerry123888, thanks for raising your concern. I've had to switch focus from this issue and it wasn't on my mind anymore. I'll assign myself to keep it at the top of my task list. This will not make the upcoming release, but I expect that we can find some time to fix this in an upcoming patch release.
This adds a regression test against activating jobs of cancelled process instances, by making sure they are cancelled (i.e. deleted). In this specific case, an uncaught error was thrown on a job which led to an UNHANDLED_ERROR_EVENT incident. When the process instance is cancelled, the job was not cancelled, but instead was made activatable again. This happened because the incident was resolved as part of the process instance termination logic. Instead, we should guarantee that the job is cancelled. A cancelled job can no longer be made re-activatable. This is a terminal state for jobs, because a cancelled job is a deleted job. See #8588 for more details.
9219: Cancel job with incident when canceling the process instance r=korthout a=korthout ## Description This fixes a bug where canceling a process instance could lead to a job becoming activatable even though the process instance is terminated. See #8588 for more details. The bug is fixed by making sure the job is immediately canceled (i.e. deleted from the state) when we terminate the related element instance. We achieve this by writing a Job Canceled event when the TERMINATE_ELEMENT command is being processed. This PR fixes the bug in a way that can be backported because it doesn't introduce too many changes. Alternatively, the bug could be fixed by tackling the main issue: applying the Incident Resolved event should not affect the state of a job. Instead, the job should be made activatable by writing a separate Job event in the Resolve Incident Processor. That would be a larger change, with more impact on the event appliers than what this PR proposes. Changes to event appliers should generally be avoided. ## Related issues <!-- Which issues are closed by this PR or are related --> closes #8588 Co-authored-by: Nico Korthout <nico.korthout@camunda.com>
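The fix described in this PR can be sketched with a small event-log model. All names here (`JobStore`, the event symbols) are illustrative assumptions rather than Zeebe's actual classes; the point is the ordering: a Job Canceled event is written when the element instance is terminated, and since a cancelled job is deleted from the state, the later incident-resolution applier has nothing left to re-activate.

```ruby
# Illustrative event-applier model (hypothetical names, not Zeebe source).
class JobStore
  def initialize
    @jobs = {}  # job_key => state symbol
  end

  def create(key)
    @jobs[key] = :activatable
  end

  def apply(event)
    key = event[:job_key]
    case event[:type]
    when :job_error_thrown
      @jobs[key] = :failed_with_incident
    when :job_canceled
      @jobs.delete(key)  # terminal: a cancelled job is a deleted job
    when :incident_resolved
      # re-activates the job only if it still exists in the state
      @jobs[key] = :activatable if @jobs.key?(key)
    end
  end

  def activatable?(key)
    @jobs[key] == :activatable
  end
end

store = JobStore.new
store.create(1)
store.apply(type: :job_error_thrown, job_key: 1)

# Fixed termination order: cancel (delete) the job first, then resolve the
# incident. The resolution applier finds no job to re-activate.
store.apply(type: :job_canceled, job_key: 1)
store.apply(type: :incident_resolved, job_key: 1)
puts store.activatable?(1)  # => false
```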
This adds a regression test against activating jobs of cancelled process instances, by making sure they are cancelled (i.e. deleted). In this specific case, an uncaught error was thrown on a job which led to an UNHANDLED_ERROR_EVENT incident. When the process instance is cancelled, the job was not cancelled, but instead was made activatable again. This happened because the incident was resolved as part of the process instance termination logic. Instead, we should guarantee that the job is cancelled. A cancelled job can no longer be made re-activatable. This is a terminal state for jobs, because a cancelled job is a deleted job. See #8588 for more details. (cherry picked from commit d43c344)
9250: [Backport 1.3] Cancel job with incident when canceling the process instance r=korthout a=korthout ## Description <!-- Link to the PR that is back ported --> Backport of #9219 ## Related issues <!-- Link to the related issues of the origin PR --> closes #8588 Co-authored-by: Nico Korthout <nico.korthout@camunda.com>
9251: [Backport 8.0] Cancel job with incident when canceling the process instance r=korthout a=korthout ## Description <!-- Link to the PR that is back ported --> Backport of #9219 ## Related issues <!-- Link to the related issues of the origin PR --> closes #8588 Co-authored-by: Nico Korthout <nico.korthout@camunda.com>
@rusterholz The issue is fixed. The fix will be available in the upcoming patch releases 8.0.2 and 1.3.8.
Describe the Bug
A job can still be activated (that is, Zeebe will return it as part of an ActivateJobsResponse) even after its associated process instance has been cancelled, if an error was previously thrown on that job.
To Reproduce
@korthout's method is easier than mine, especially for Camunda folks:
My original method
ZIP file: reproduce_bug.zip
1. Download the `reproduce_bug.rb` and `one_task.bpmn` files, and put them in a directory together.
2. If your Zeebe gateway is not listening at `localhost:26500`, edit the `reproduce_bug.rb` file.
3. Run `ruby reproduce_bug.rb` and observe the output.

The script will deploy the included BPMN, create an instance of that process, and activate jobs for that process instance. If the jobs that Zeebe returns are for a running process instance, it will throw an error on them. It will then cancel that process instance, loop around, and create another instance. If the jobs that it receives from Zeebe are from a process instance that was already successfully cancelled, it will write a line of output including "[!!! BUG !!!]". The script will exit after 5 occurrences of the bug or 50 loops, whichever happens first.
Expected Behavior
After Zeebe successfully cancels a process instance, no jobs from that instance should ever be returned to a future ActivateJobs request.
Actual Behavior
Jobs which previously had an error thrown on them before their process instance was cancelled are being returned to ActivateJobs requests that are placed after their process instance was cancelled.
Log/Stacktrace
My original method
Here is the output of a sample run of the `reproduce_bug.rb` script:

Environment: