Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Job of cancelled instance can be activated if an error was thrown on it #8588

Closed
rusterholz opened this issue Jan 13, 2022 · 14 comments · Fixed by #9219
Closed

Job of cancelled instance can be activated if an error was thrown on it #8588

rusterholz opened this issue Jan 13, 2022 · 14 comments · Fixed by #9219
Assignees
Labels
area/reliability Marks an issue as related to improving the reliability of our software (i.e. it behaves as expected) kind/bug Categorizes an issue or PR as a bug scope/broker Marks an issue or PR to appear in the broker section of the changelog severity/high Marks a bug as having a noticeable impact on the user with no known workaround version:1.3.8 version:8.1.0-alpha1 Marks an issue as being completely or in parts released in 8.1.0-alpha1 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0

Comments

@rusterholz
Copy link

rusterholz commented Jan 13, 2022

Describe the Bug
A job can still be activated (that is, Zeebe will return it as part of an ActivateJobsResponse) even after its associated process instance has been cancelled, if an error was previously thrown on that job.

To Reproduce
@korthout's method is easier than mine, especially for Camunda folks:

I was also able to reproduce it through zbctl:

zbctl deploy ./one_task.bpmn
sleep 1
zbctl create instance one_task
zbctl activate jobs do_something
zbctl throwError job 2251799813685256 --errorCode FOO --errorMessage foobar
sleep 1
zbctl cancel instance 2251799813685251
sleep 1
zbctl activate jobs do_something

This activated the job and returned:

{
  "jobs":  [
    {
      "key":  "2251799813685256",
      "type":  "do_something",
      "processInstanceKey":  "2251799813685251",
      "bpmnProcessId":  "one_task",
      "processDefinitionVersion":  1,
      "processDefinitionKey":  "2251799813685249",
      "elementId":  "do_something",
      "elementInstanceKey":  "2251799813685255",
      "customHeaders":  "{\"what_to_do\":\"nothing\"}",
      "worker":  "zbctl",
      "retries":  2,
      "deadline":  "1642166365427",
      "variables":  "{}"
    }
  ]
}
My original method

ZIP file: reproduce_bug.zip

  1. Ensure you have ruby installed, as well as the zeebe_client ruby gem.
  2. Download the attached ZIP file and expand it to reveal the reproduce_bug.rb and one_task.bpmn files, and put them in a directory together.
  3. Ensure you have Zeebe running. If you need to change Zeebe's address away from localhost:26500, edit the reproduce_bug.rb file.
  4. Run ruby reproduce_bug.rb and observe the output.

The script will deploy the included BPMN, create an instance of that process, and activate jobs for that process instance. If the jobs that Zeebe returns are for a running process instance, it will throw an error on them. It will then cancel that process instance and loop around and create another instance. If the jobs that it receives from Zeebe are from a process instance that was already successfully cancelled, it will write a line of output including "[!!! BUG !!!]". The script will exit after 5 occurrences of the bug or 50 loops, whichever happens first.

Expected Behavior
After Zeebe successfully cancels a process instance, no jobs from that instance should ever be returned to a future ActivateJobs request.

Actual Behavior
Jobs which previously had an error thrown on them before their process instance was cancelled are being returned to ActivateJobs requests that are placed after their process instance was cancelled.

Log/Stacktrace

My original method

Here is the output of a sample run of the reproduce_bug.rb script:

$ ruby reproduce_bug.rb
deploying process definition from /usr/src/app/one_task.bpmn
deployed process `one_task` with key 2251799813685249
created instance 2251799813685423
activated job 2251799813685429 for process instance 2251799813685423
threw error on job 2251799813685429
cancelled instance 2251799813685423
created instance 2251799813685432
[!!! BUG !!!] activated job 2251799813685429 for cancelled process instance 2251799813685423 [!!! BUG !!!]
cancelled instance 2251799813685432
created instance 2251799813685439
activated job 2251799813685445 for process instance 2251799813685439
threw error on job 2251799813685445
cancelled instance 2251799813685439
created instance 2251799813685448
[!!! BUG !!!] activated job 2251799813685445 for cancelled process instance 2251799813685439 [!!! BUG !!!]
cancelled instance 2251799813685448
created instance 2251799813685455
activated job 2251799813685461 for process instance 2251799813685455
threw error on job 2251799813685461
cancelled instance 2251799813685455
created instance 2251799813685464
[!!! BUG !!!] activated job 2251799813685461 for cancelled process instance 2251799813685455 [!!! BUG !!!]
cancelled instance 2251799813685464
created instance 2251799813685471
activated job 2251799813685477 for process instance 2251799813685471
threw error on job 2251799813685477
cancelled instance 2251799813685471
created instance 2251799813685480
[!!! BUG !!!] activated job 2251799813685429 for cancelled process instance 2251799813685423 [!!! BUG !!!]
[!!! BUG !!!] activated job 2251799813685445 for cancelled process instance 2251799813685439 [!!! BUG !!!]
[!!! BUG !!!] activated job 2251799813685477 for cancelled process instance 2251799813685471 [!!! BUG !!!]
cancelled instance 2251799813685480

Environment:

  • OS: Debian GNU/Linux 10 in a Docker container
  • Zeebe Version: 1.2.9 and 1.3.0 (verified against both)
  • Configuration: zeebe-kafka-exporter v3.0.0
@rusterholz rusterholz added the kind/bug Categorizes an issue or PR as a bug label Jan 13, 2022
@rusterholz
Copy link
Author

rusterholz commented Jan 13, 2022

For those who may be curious about the test script, but don't want to download it:

original ruby test script
require "zeebe/client"

ZEEBE_ADDRESS = "localhost:26500"
CREDENTIALS = :this_channel_is_insecure

TEST_BPMN_PATH = File.join(__dir__, "one_task.bpmn")
TEST_BPMN_PROCESS_ID = "one_task"
TEST_TASK_TYPE = "do_something"

JOB_TIMEOUT_MS = 1000
REQUEST_TIMEOUT_MS = 10000

LOOPS = 50
BUG_THRESHOLD = 5

include ::Zeebe::Client::GatewayProtocol

gateway = Gateway::Stub.new(ZEEBE_ADDRESS, CREDENTIALS)

# deploy our test process:

puts "deploying process definition from #{TEST_BPMN_PATH}"
deploy_response = gateway.deploy_process(DeployProcessRequest.new(
  processes: [
    ProcessRequestObject.new(
      name: File.basename(TEST_BPMN_PATH),
      definition: File.read(TEST_BPMN_PATH)
    )
  ]
))

process_definition_key =
  deploy_response.processes.find { |meta| meta["bpmnProcessId"] == TEST_BPMN_PROCESS_ID }["processDefinitionKey"]

puts "deployed process `#{TEST_BPMN_PROCESS_ID}` with key #{process_definition_key}"

# to reproduce the bug, we create some process instances, activate their jobs, throw errors on
# those jobs, then cancel the instances and start over:

instances_created = []
instances_cancelled = []
bug_count = 0

LOOPS.times do
  break if bug_count >= BUG_THRESHOLD # break the loop early if the bug has occurred 5 or more times

  # create a process instance:
  create_instance_response = gateway.create_process_instance(
    CreateProcessInstanceRequest.new(bpmnProcessId: TEST_BPMN_PROCESS_ID, version: -1)
  )
  instance_key = create_instance_response["processInstanceKey"]
  puts "created instance #{instance_key}"
  instances_created << instance_key

  # activate some jobs, from one or more process instances:
  activate_jobs_stream = gateway.activate_jobs(
    ActivateJobsRequest.new(
      type: TEST_TASK_TYPE,
      worker: "#{__FILE__}-#{Process.pid}",
      maxJobsToActivate: LOOPS,
      timeout: JOB_TIMEOUT_MS,
      requestTimeout: REQUEST_TIMEOUT_MS,
    )
  )
  activate_jobs_stream.each do |activated_job_response|
    activated_job_response.jobs.each do |activated_job|
      job_key = activated_job["key"]
      job_instance_key = activated_job["processInstanceKey"]

      # check the process instance key of each job to see whether it belongs to
      # a process instance that we know was previously cancelled:
      if instances_cancelled.include?(job_instance_key)
        puts "[!!! BUG !!!] activated job #{job_key} for cancelled process instance #{job_instance_key} [!!! BUG !!!]"
        bug_count += 1
      elsif instances_created.include?(job_instance_key)
        puts "activated job #{job_key} for process instance #{job_instance_key}"

        # throw an error on this job and make sure we get a successful response back from Zeebe:
        throw_error_response = gateway.throw_error(
          ThrowErrorRequest.new(
            jobKey: job_key,
            errorCode: "FOO",
            errorMessage: "foobar"
          )
        )
        if throw_error_response.is_a?(ThrowErrorResponse)
          puts "threw error on job #{job_key}"
        else
          "this should be impossible, the gateway should raise a GRPC error if it fails"
        end
      else
        puts "[???] activated job #{job_key} for unknown process instance #{job_instance_key} [???]"
      end
    end
  end

  # once the activate jobs call completes, cancel the current process instance, and
  # make sure we get a successful response back from Zeebe before recording that we
  # cancelled this instance:
  cancel_instance_response = gateway.cancel_process_instance(
    CancelProcessInstanceRequest.new(processInstanceKey: instance_key)
  )
  if cancel_instance_response.is_a?(CancelProcessInstanceResponse)
    puts "cancelled instance #{instance_key}"
    instances_cancelled << instance_key
  else
    raise "this should be impossible, the gateway should raise a GRPC error if it fails"
  end

  sleep 0.5
end

@korthout
Copy link
Member

I was able to reproduce it locally, even just running the code once (without a loop).

I was also able to reproduce it through zbctl:

zbctl deploy ./one_task.bpmn
sleep 1
zbctl create instance one_task
zbctl activate jobs do_something
zbctl throwError job 2251799813685256 --errorCode FOO --errorMessage foobar
sleep 1
zbctl cancel instance 2251799813685251
sleep 1
zbctl activate jobs do_something

This activated the job and returned:

{
  "jobs":  [
    {
      "key":  "2251799813685256",
      "type":  "do_something",
      "processInstanceKey":  "2251799813685251",
      "bpmnProcessId":  "one_task",
      "processDefinitionVersion":  1,
      "processDefinitionKey":  "2251799813685249",
      "elementId":  "do_something",
      "elementInstanceKey":  "2251799813685255",
      "customHeaders":  "{\"what_to_do\":\"nothing\"}",
      "worker":  "zbctl",
      "retries":  2,
      "deadline":  "1642166365427",
      "variables":  "{}"
    }
  ]
}

I then checked the state using zdb: zdb instance entity 2251799813685251 -p ./raft-partition/partitions/1/runtime, which returned null indicating that the process is gone.

Then I looked at the logs, again using zdb. This showed that during the termination of the service task, the unhandled error incident is resolved. Which made me think about that logic, because I remember that the job key is important for the incident resolution logic.

What happens is that the job is made activatable again when the incident is resolved. However, we should expect a Cancel Job command while terminating that service task, but this is missing. We only delete the job from the log after a Job Cancelled (or Completed) event is written.

Log

Produced using zdb: zdb log print -p ./raft-partition/partitions/1/ | jq

{
  "records": [
    {
      "index": 1,
      "term": 1
    },
    {
      "index": 2,
      "term": 1,
      "entries": [
        {
          "partitionId": 1,
          "value": {
            "processesMetadata": [],
            "resources": [
              {
                "resourceName": "/Users/korthout/Downloads/reproduce_bug/one_task.bpmn",
                "resource": "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPGJwbW46ZGVmaW5pdGlvbnMgeG1sbnM6YnBtbj0iaHR0cDovL3d3dy5vbWcub3JnL3NwZWMvQlBNTi8yMDEwMDUyNC9NT0RFTCIgeG1sbnM6YnBtbmRpPSJodHRwOi8vd3d3Lm9tZy5vcmcvc3BlYy9CUE1OLzIwMTAwNTI0L0RJIiB4bWxuczpkYz0iaHR0cDovL3d3dy5vbWcub3JnL3NwZWMvREQvMjAxMDA1MjQvREMiIHhtbG5zOnplZWJlPSJodHRwOi8vY2FtdW5kYS5vcmcvc2NoZW1hL3plZWJlLzEuMCIgeG1sbnM6ZGk9Imh0dHA6Ly93d3cub21nLm9yZy9zcGVjL0RELzIwMTAwNTI0L0RJIiBpZD0iRGVmaW5pdGlvbnNfMXlhdTlpdCIgdGFyZ2V0TmFtZXNwYWNlPSJodHRwOi8vYnBtbi5pby9zY2hlbWEvYnBtbiIgZXhwb3J0ZXI9IlplZWJlIE1vZGVsZXIiIGV4cG9ydGVyVmVyc2lvbj0iMC45LjEiPgogIDxicG1uOnByb2Nlc3MgaWQ9Im9uZV90YXNrIiBuYW1lPSJPbmUgVGFzayIgaXNFeGVjdXRhYmxlPSJ0cnVlIj4KICAgIDxicG1uOnN0YXJ0RXZlbnQgaWQ9IlN0YXJ0RXZlbnRfMSI+CiAgICAgIDxicG1uOm91dGdvaW5nPkZsb3dJbjwvYnBtbjpvdXRnb2luZz4KICAgIDwvYnBtbjpzdGFydEV2ZW50PgogICAgPGJwbW46c2VydmljZVRhc2sgaWQ9ImRvX3NvbWV0aGluZyIgbmFtZT0iRG8gU29tZXRoaW5nIj4KICAgICAgPGJwbW46ZXh0ZW5zaW9uRWxlbWVudHM+CiAgICAgICAgPHplZWJlOnRhc2tEZWZpbml0aW9uIHR5cGU9ImRvX3NvbWV0aGluZyIgcmV0cmllcz0iMiIgLz4KICAgICAgICA8emVlYmU6dGFza0hlYWRlcnM+CiAgICAgICAgICA8emVlYmU6aGVhZGVyIGtleT0id2hhdF90b19kbyIgdmFsdWU9Im5vdGhpbmciIC8+CiAgICAgICAgPC96ZWViZTp0YXNrSGVhZGVycz4KICAgICAgPC9icG1uOmV4dGVuc2lvbkVsZW1lbnRzPgogICAgICA8YnBtbjppbmNvbWluZz5GbG93SW48L2JwbW46aW5jb21pbmc+CiAgICAgIDxicG1uOm91dGdvaW5nPkZsb3dPdXQ8L2JwbW46b3V0Z29pbmc+CiAgICA8L2JwbW46c2VydmljZVRhc2s+CiAgICA8YnBtbjpzZXF1ZW5jZUZsb3cgaWQ9IkZsb3dJbiIgc291cmNlUmVmPSJTdGFydEV2ZW50XzEiIHRhcmdldFJlZj0iZG9fc29tZXRoaW5nIiAvPgogICAgPGJwbW46ZW5kRXZlbnQgaWQ9IkV2ZW50XzB4bjl5aHMiPgogICAgICA8YnBtbjppbmNvbWluZz5GbG93T3V0PC9icG1uOmluY29taW5nPgogICAgPC9icG1uOmVuZEV2ZW50PgogICAgPGJwbW46c2VxdWVuY2VGbG93IGlkPSJGbG93T3V0IiBzb3VyY2VSZWY9ImRvX3NvbWV0aGluZyIgdGFyZ2V0UmVmPSJFdmVudF8weG45eWhzIiAvPgogIDwvYnBtbjpwcm9jZXNzPgogIDxicG1uZGk6QlBNTkRpYWdyYW0gaWQ9IkJQTU5EaWFncmFtXzEiPgogICAgPGJwbW5kaTpCUE1OUGxhbmUgaWQ9IkJQTU5QbGFuZV8xIiBicG1uRWxlbWVudD0ib25lX3Rhc2siPgogICAgICA8YnBtbmRpOkJQTU5FZGdlIGlkPSJGbG93XzFxeW5tYXJfZGkiIGJwbW5FbGVtZW50PSJGbG93T3V0Ij4KICAgICAgICA8ZGk6d2F5cG9pbnQgeD0iMzkwIiB5PSIxMjAiIC8+CiAgICAgICAgPGRpOndheXBvaW50IHg9IjQ1MiIgeT0iMTIwIiAvPgogICAgICA8L2JwbW5kaTpCUE1ORWRnZT4KICAgICAgPGJwbW5kaTpCUE1ORWRnZSBpZD0iRmxvd18xN3g1dG1tX2RpIiBicG1uRWxlbWVudD0iRmxvd0luIj4KICAgICAgICA8ZGk6d2F5cG9pbnQgeD0iMjE1IiB5PSIxMjAiIC8+CiAgICAgICAgPGRpOndheXBvaW50IHg9IjI5MCIgeT0iMTIwIiAvPgogICAgICA8L2JwbW5kaTpCUE1ORWRnZT4KICAgICAgPGJwbW5kaTpCUE1OU2hhcGUgaWQ9Il9CUE1OU2hhcGVfU3RhcnRFdmVudF8yIiBicG1uRWxlbWVudD0iU3RhcnRFdmVudF8xIj4KICAgICAgICA8ZGM6Qm91bmRzIHg9IjE3OSIgeT0iMTAyIiB3aWR0aD0iMzYiIGhlaWdodD0iMzYiIC8+CiAgICAgIDwvYnBtbmRpOkJQTU5TaGFwZT4KICAgICAgPGJwbW5kaTpCUE1OU2hhcGUgaWQ9IkFjdGl2aXR5XzFicjNwMGZfZGkiIGJwbW5FbGVtZW50PSJkb19zb21ldGhpbmciPgogICAgICAgIDxkYzpCb3VuZHMgeD0iMjkwIiB5PSI4MCIgd2lkdGg9IjEwMCIgaGVpZ2h0PSI4MCIgLz4KICAgICAgPC9icG1uZGk6QlBNTlNoYXBlPgogICAgICA8YnBtbmRpOkJQTU5TaGFwZSBpZD0iRXZlbnRfMHc0aTJ2ZF9kaSIgYnBtbkVsZW1lbnQ9IkV2ZW50XzB4bjl5aHMiPgogICAgICAgIDxkYzpCb3VuZHMgeD0iNDUyIiB5PSIxMDIiIHdpZHRoPSIzNiIgaGVpZ2h0PSIzNiIgLz4KICAgICAgPC9icG1uZGk6QlBNTlNoYXBlPgogICAgPC9icG1uZGk6QlBNTlBsYW5lPgogIDwvYnBtbmRpOkJQTU5EaWFncmFtPgo8L2JwbW46ZGVmaW5pdGlvbnM+Cg=="
              }
            ]
          },
          "valueType": "DEPLOYMENT",
          "recordType": "COMMAND",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": -1,
          "position": 1,
          "intent": "CREATE",
          "timestamp": 1642164652033,
          "key": -1
        }
      ]
    },
    {
      "index": 3,
      "term": 1,
      "entries": [
        {
          "partitionId": 1,
          "value": {
            "bpmnProcessId": "one_task",
            "resourceName": "/Users/korthout/Downloads/reproduce_bug/one_task.bpmn",
            "checksum": "iMz6A2A60L+DOeOb8kmyyA==",
            "resource": "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPGJwbW46ZGVmaW5pdGlvbnMgeG1sbnM6YnBtbj0iaHR0cDovL3d3dy5vbWcub3JnL3NwZWMvQlBNTi8yMDEwMDUyNC9NT0RFTCIgeG1sbnM6YnBtbmRpPSJodHRwOi8vd3d3Lm9tZy5vcmcvc3BlYy9CUE1OLzIwMTAwNTI0L0RJIiB4bWxuczpkYz0iaHR0cDovL3d3dy5vbWcub3JnL3NwZWMvREQvMjAxMDA1MjQvREMiIHhtbG5zOnplZWJlPSJodHRwOi8vY2FtdW5kYS5vcmcvc2NoZW1hL3plZWJlLzEuMCIgeG1sbnM6ZGk9Imh0dHA6Ly93d3cub21nLm9yZy9zcGVjL0RELzIwMTAwNTI0L0RJIiBpZD0iRGVmaW5pdGlvbnNfMXlhdTlpdCIgdGFyZ2V0TmFtZXNwYWNlPSJodHRwOi8vYnBtbi5pby9zY2hlbWEvYnBtbiIgZXhwb3J0ZXI9IlplZWJlIE1vZGVsZXIiIGV4cG9ydGVyVmVyc2lvbj0iMC45LjEiPgogIDxicG1uOnByb2Nlc3MgaWQ9Im9uZV90YXNrIiBuYW1lPSJPbmUgVGFzayIgaXNFeGVjdXRhYmxlPSJ0cnVlIj4KICAgIDxicG1uOnN0YXJ0RXZlbnQgaWQ9IlN0YXJ0RXZlbnRfMSI+CiAgICAgIDxicG1uOm91dGdvaW5nPkZsb3dJbjwvYnBtbjpvdXRnb2luZz4KICAgIDwvYnBtbjpzdGFydEV2ZW50PgogICAgPGJwbW46c2VydmljZVRhc2sgaWQ9ImRvX3NvbWV0aGluZyIgbmFtZT0iRG8gU29tZXRoaW5nIj4KICAgICAgPGJwbW46ZXh0ZW5zaW9uRWxlbWVudHM+CiAgICAgICAgPHplZWJlOnRhc2tEZWZpbml0aW9uIHR5cGU9ImRvX3NvbWV0aGluZyIgcmV0cmllcz0iMiIgLz4KICAgICAgICA8emVlYmU6dGFza0hlYWRlcnM+CiAgICAgICAgICA8emVlYmU6aGVhZGVyIGtleT0id2hhdF90b19kbyIgdmFsdWU9Im5vdGhpbmciIC8+CiAgICAgICAgPC96ZWViZTp0YXNrSGVhZGVycz4KICAgICAgPC9icG1uOmV4dGVuc2lvbkVsZW1lbnRzPgogICAgICA8YnBtbjppbmNvbWluZz5GbG93SW48L2JwbW46aW5jb21pbmc+CiAgICAgIDxicG1uOm91dGdvaW5nPkZsb3dPdXQ8L2JwbW46b3V0Z29pbmc+CiAgICA8L2JwbW46c2VydmljZVRhc2s+CiAgICA8YnBtbjpzZXF1ZW5jZUZsb3cgaWQ9IkZsb3dJbiIgc291cmNlUmVmPSJTdGFydEV2ZW50XzEiIHRhcmdldFJlZj0iZG9fc29tZXRoaW5nIiAvPgogICAgPGJwbW46ZW5kRXZlbnQgaWQ9IkV2ZW50XzB4bjl5aHMiPgogICAgICA8YnBtbjppbmNvbWluZz5GbG93T3V0PC9icG1uOmluY29taW5nPgogICAgPC9icG1uOmVuZEV2ZW50PgogICAgPGJwbW46c2VxdWVuY2VGbG93IGlkPSJGbG93T3V0IiBzb3VyY2VSZWY9ImRvX3NvbWV0aGluZyIgdGFyZ2V0UmVmPSJFdmVudF8weG45eWhzIiAvPgogIDwvYnBtbjpwcm9jZXNzPgogIDxicG1uZGk6QlBNTkRpYWdyYW0gaWQ9IkJQTU5EaWFncmFtXzEiPgogICAgPGJwbW5kaTpCUE1OUGxhbmUgaWQ9IkJQTU5QbGFuZV8xIiBicG1uRWxlbWVudD0ib25lX3Rhc2siPgogICAgICA8YnBtbmRpOkJQTU5FZGdlIGlkPSJGbG93XzFxeW5tYXJfZGkiIGJwbW5FbGVtZW50PSJGbG93T3V0Ij4KICAgICAgICA8ZGk6d2F5cG9pbnQgeD0iMzkwIiB5PSIxMjAiIC8+CiAgICAgICAgPGRpOndheXBvaW50IHg9IjQ1MiIgeT0iMTIwIiAvPgogICAgICA8L2JwbW5kaTpCUE1ORWRnZT4KICAgICAgPGJwbW5kaTpCUE1ORWRnZSBpZD0iRmxvd18xN3g1dG1tX2RpIiBicG1uRWxlbWVudD0iRmxvd0luIj4KICAgICAgICA8ZGk6d2F5cG9pbnQgeD0iMjE1IiB5PSIxMjAiIC8+CiAgICAgICAgPGRpOndheXBvaW50IHg9IjI5MCIgeT0iMTIwIiAvPgogICAgICA8L2JwbW5kaTpCUE1ORWRnZT4KICAgICAgPGJwbW5kaTpCUE1OU2hhcGUgaWQ9Il9CUE1OU2hhcGVfU3RhcnRFdmVudF8yIiBicG1uRWxlbWVudD0iU3RhcnRFdmVudF8xIj4KICAgICAgICA8ZGM6Qm91bmRzIHg9IjE3OSIgeT0iMTAyIiB3aWR0aD0iMzYiIGhlaWdodD0iMzYiIC8+CiAgICAgIDwvYnBtbmRpOkJQTU5TaGFwZT4KICAgICAgPGJwbW5kaTpCUE1OU2hhcGUgaWQ9IkFjdGl2aXR5XzFicjNwMGZfZGkiIGJwbW5FbGVtZW50PSJkb19zb21ldGhpbmciPgogICAgICAgIDxkYzpCb3VuZHMgeD0iMjkwIiB5PSI4MCIgd2lkdGg9IjEwMCIgaGVpZ2h0PSI4MCIgLz4KICAgICAgPC9icG1uZGk6QlBNTlNoYXBlPgogICAgICA8YnBtbmRpOkJQTU5TaGFwZSBpZD0iRXZlbnRfMHc0aTJ2ZF9kaSIgYnBtbkVsZW1lbnQ9IkV2ZW50XzB4bjl5aHMiPgogICAgICAgIDxkYzpCb3VuZHMgeD0iNDUyIiB5PSIxMDIiIHdpZHRoPSIzNiIgaGVpZ2h0PSIzNiIgLz4KICAgICAgPC9icG1uZGk6QlBNTlNoYXBlPgogICAgPC9icG1uZGk6QlBNTlBsYW5lPgogIDwvYnBtbmRpOkJQTU5EaWFncmFtPgo8L2JwbW46ZGVmaW5pdGlvbnM+Cg==",
            "processDefinitionKey": 2251799813685249,
            "version": 1
          },
          "valueType": "PROCESS",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 1,
          "position": 2,
          "intent": "CREATED",
          "timestamp": 1642164652047,
          "key": 2251799813685249
        },
        {
          "partitionId": 1,
          "value": {
            "processesMetadata": [
              {
                "bpmnProcessId": "one_task",
                "resourceName": "/Users/korthout/Downloads/reproduce_bug/one_task.bpmn",
                "checksum": "iMz6A2A60L+DOeOb8kmyyA==",
                "processDefinitionKey": 2251799813685249,
                "version": 1
              }
            ],
            "resources": [
              {
                "resourceName": "/Users/korthout/Downloads/reproduce_bug/one_task.bpmn",
                "resource": "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPGJwbW46ZGVmaW5pdGlvbnMgeG1sbnM6YnBtbj0iaHR0cDovL3d3dy5vbWcub3JnL3NwZWMvQlBNTi8yMDEwMDUyNC9NT0RFTCIgeG1sbnM6YnBtbmRpPSJodHRwOi8vd3d3Lm9tZy5vcmcvc3BlYy9CUE1OLzIwMTAwNTI0L0RJIiB4bWxuczpkYz0iaHR0cDovL3d3dy5vbWcub3JnL3NwZWMvREQvMjAxMDA1MjQvREMiIHhtbG5zOnplZWJlPSJodHRwOi8vY2FtdW5kYS5vcmcvc2NoZW1hL3plZWJlLzEuMCIgeG1sbnM6ZGk9Imh0dHA6Ly93d3cub21nLm9yZy9zcGVjL0RELzIwMTAwNTI0L0RJIiBpZD0iRGVmaW5pdGlvbnNfMXlhdTlpdCIgdGFyZ2V0TmFtZXNwYWNlPSJodHRwOi8vYnBtbi5pby9zY2hlbWEvYnBtbiIgZXhwb3J0ZXI9IlplZWJlIE1vZGVsZXIiIGV4cG9ydGVyVmVyc2lvbj0iMC45LjEiPgogIDxicG1uOnByb2Nlc3MgaWQ9Im9uZV90YXNrIiBuYW1lPSJPbmUgVGFzayIgaXNFeGVjdXRhYmxlPSJ0cnVlIj4KICAgIDxicG1uOnN0YXJ0RXZlbnQgaWQ9IlN0YXJ0RXZlbnRfMSI+CiAgICAgIDxicG1uOm91dGdvaW5nPkZsb3dJbjwvYnBtbjpvdXRnb2luZz4KICAgIDwvYnBtbjpzdGFydEV2ZW50PgogICAgPGJwbW46c2VydmljZVRhc2sgaWQ9ImRvX3NvbWV0aGluZyIgbmFtZT0iRG8gU29tZXRoaW5nIj4KICAgICAgPGJwbW46ZXh0ZW5zaW9uRWxlbWVudHM+CiAgICAgICAgPHplZWJlOnRhc2tEZWZpbml0aW9uIHR5cGU9ImRvX3NvbWV0aGluZyIgcmV0cmllcz0iMiIgLz4KICAgICAgICA8emVlYmU6dGFza0hlYWRlcnM+CiAgICAgICAgICA8emVlYmU6aGVhZGVyIGtleT0id2hhdF90b19kbyIgdmFsdWU9Im5vdGhpbmciIC8+CiAgICAgICAgPC96ZWViZTp0YXNrSGVhZGVycz4KICAgICAgPC9icG1uOmV4dGVuc2lvbkVsZW1lbnRzPgogICAgICA8YnBtbjppbmNvbWluZz5GbG93SW48L2JwbW46aW5jb21pbmc+CiAgICAgIDxicG1uOm91dGdvaW5nPkZsb3dPdXQ8L2JwbW46b3V0Z29pbmc+CiAgICA8L2JwbW46c2VydmljZVRhc2s+CiAgICA8YnBtbjpzZXF1ZW5jZUZsb3cgaWQ9IkZsb3dJbiIgc291cmNlUmVmPSJTdGFydEV2ZW50XzEiIHRhcmdldFJlZj0iZG9fc29tZXRoaW5nIiAvPgogICAgPGJwbW46ZW5kRXZlbnQgaWQ9IkV2ZW50XzB4bjl5aHMiPgogICAgICA8YnBtbjppbmNvbWluZz5GbG93T3V0PC9icG1uOmluY29taW5nPgogICAgPC9icG1uOmVuZEV2ZW50PgogICAgPGJwbW46c2VxdWVuY2VGbG93IGlkPSJGbG93T3V0IiBzb3VyY2VSZWY9ImRvX3NvbWV0aGluZyIgdGFyZ2V0UmVmPSJFdmVudF8weG45eWhzIiAvPgogIDwvYnBtbjpwcm9jZXNzPgogIDxicG1uZGk6QlBNTkRpYWdyYW0gaWQ9IkJQTU5EaWFncmFtXzEiPgogICAgPGJwbW5kaTpCUE1OUGxhbmUgaWQ9IkJQTU5QbGFuZV8xIiBicG1uRWxlbWVudD0ib25lX3Rhc2siPgogICAgICA8YnBtbmRpOkJQTU5FZGdlIGlkPSJGbG93XzFxeW5tYXJfZGkiIGJwbW5FbGVtZW50PSJGbG93T3V0Ij4KICAgICAgICA8ZGk6d2F5cG9pbnQgeD0iMzkwIiB5PSIxMjAiIC8+CiAgICAgICAgPGRpOndheXBvaW50IHg9IjQ1MiIgeT0iMTIwIiAvPgogICAgICA8L2JwbW5kaTpCUE1ORWRnZT4KICAgICAgPGJwbW5kaTpCUE1ORWRnZSBpZD0iRmxvd18xN3g1dG1tX2RpIiBicG1uRWxlbWVudD0iRmxvd0luIj4KICAgICAgICA8ZGk6d2F5cG9pbnQgeD0iMjE1IiB5PSIxMjAiIC8+CiAgICAgICAgPGRpOndheXBvaW50IHg9IjI5MCIgeT0iMTIwIiAvPgogICAgICA8L2JwbW5kaTpCUE1ORWRnZT4KICAgICAgPGJwbW5kaTpCUE1OU2hhcGUgaWQ9Il9CUE1OU2hhcGVfU3RhcnRFdmVudF8yIiBicG1uRWxlbWVudD0iU3RhcnRFdmVudF8xIj4KICAgICAgICA8ZGM6Qm91bmRzIHg9IjE3OSIgeT0iMTAyIiB3aWR0aD0iMzYiIGhlaWdodD0iMzYiIC8+CiAgICAgIDwvYnBtbmRpOkJQTU5TaGFwZT4KICAgICAgPGJwbW5kaTpCUE1OU2hhcGUgaWQ9IkFjdGl2aXR5XzFicjNwMGZfZGkiIGJwbW5FbGVtZW50PSJkb19zb21ldGhpbmciPgogICAgICAgIDxkYzpCb3VuZHMgeD0iMjkwIiB5PSI4MCIgd2lkdGg9IjEwMCIgaGVpZ2h0PSI4MCIgLz4KICAgICAgPC9icG1uZGk6QlBNTlNoYXBlPgogICAgICA8YnBtbmRpOkJQTU5TaGFwZSBpZD0iRXZlbnRfMHc0aTJ2ZF9kaSIgYnBtbkVsZW1lbnQ9IkV2ZW50XzB4bjl5aHMiPgogICAgICAgIDxkYzpCb3VuZHMgeD0iNDUyIiB5PSIxMDIiIHdpZHRoPSIzNiIgaGVpZ2h0PSIzNiIgLz4KICAgICAgPC9icG1uZGk6QlBNTlNoYXBlPgogICAgPC9icG1uZGk6QlBNTlBsYW5lPgogIDwvYnBtbmRpOkJQTU5EaWFncmFtPgo8L2JwbW46ZGVmaW5pdGlvbnM+Cg=="
              }
            ]
          },
          "valueType": "DEPLOYMENT",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 1,
          "position": 3,
          "intent": "CREATED",
          "timestamp": 1642164652047,
          "key": 2251799813685250
        },
        {
          "partitionId": 1,
          "value": {
            "processesMetadata": [],
            "resources": []
          },
          "valueType": "DEPLOYMENT",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 1,
          "position": 4,
          "intent": "FULLY_DISTRIBUTED",
          "timestamp": 1642164652047,
          "key": 2251799813685250
        }
      ]
    },
    {
      "index": 4,
      "term": 1,
      "entries": [
        {
          "partitionId": 1,
          "value": {
            "processDefinitionKey": 0,
            "bpmnProcessId": "one_task",
            "variables": {},
            "processInstanceKey": -1,
            "version": -1
          },
          "valueType": "PROCESS_INSTANCE_CREATION",
          "recordType": "COMMAND",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": -1,
          "position": 5,
          "intent": "CREATE",
          "timestamp": 1642164681157,
          "key": -1
        }
      ]
    },
    {
      "index": 5,
      "term": 1,
      "entries": [
        {
          "partitionId": 1,
          "value": {
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "one_task",
            "processInstanceKey": 2251799813685251,
            "bpmnElementType": "PROCESS",
            "flowScopeKey": -1,
            "version": 1,
            "parentElementInstanceKey": -1,
            "parentProcessInstanceKey": -1
          },
          "valueType": "PROCESS_INSTANCE",
          "recordType": "COMMAND",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 5,
          "position": 6,
          "intent": "ACTIVATE_ELEMENT",
          "timestamp": 1642164681159,
          "key": 2251799813685251
        },
        {
          "partitionId": 1,
          "value": {
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "variables": {},
            "processInstanceKey": 2251799813685251,
            "version": 1
          },
          "valueType": "PROCESS_INSTANCE_CREATION",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 5,
          "position": 7,
          "intent": "CREATED",
          "timestamp": 1642164681159,
          "key": 2251799813685252
        }
      ]
    },
    {
      "index": 6,
      "term": 1,
      "entries": [
        {
          "partitionId": 1,
          "value": {
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "one_task",
            "processInstanceKey": 2251799813685251,
            "bpmnElementType": "PROCESS",
            "flowScopeKey": -1,
            "version": 1,
            "parentElementInstanceKey": -1,
            "parentProcessInstanceKey": -1
          },
          "valueType": "PROCESS_INSTANCE",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 6,
          "position": 8,
          "intent": "ELEMENT_ACTIVATING",
          "timestamp": 1642164681161,
          "key": 2251799813685251
        },
        {
          "partitionId": 1,
          "value": {
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "one_task",
            "processInstanceKey": 2251799813685251,
            "bpmnElementType": "PROCESS",
            "flowScopeKey": -1,
            "version": 1,
            "parentElementInstanceKey": -1,
            "parentProcessInstanceKey": -1
          },
          "valueType": "PROCESS_INSTANCE",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 6,
          "position": 9,
          "intent": "ELEMENT_ACTIVATED",
          "timestamp": 1642164681161,
          "key": 2251799813685251
        },
        {
          "partitionId": 1,
          "value": {
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "StartEvent_1",
            "processInstanceKey": 2251799813685251,
            "bpmnElementType": "START_EVENT",
            "flowScopeKey": 2251799813685251,
            "version": 1,
            "parentElementInstanceKey": -1,
            "parentProcessInstanceKey": -1
          },
          "valueType": "PROCESS_INSTANCE",
          "recordType": "COMMAND",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 6,
          "position": 10,
          "intent": "ACTIVATE_ELEMENT",
          "timestamp": 1642164681161,
          "key": -1
        }
      ]
    },
    {
      "index": 7,
      "term": 1,
      "entries": [
        {
          "partitionId": 1,
          "value": {
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "StartEvent_1",
            "processInstanceKey": 2251799813685251,
            "bpmnElementType": "START_EVENT",
            "flowScopeKey": 2251799813685251,
            "version": 1,
            "parentElementInstanceKey": -1,
            "parentProcessInstanceKey": -1
          },
          "valueType": "PROCESS_INSTANCE",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 10,
          "position": 11,
          "intent": "ELEMENT_ACTIVATING",
          "timestamp": 1642164681174,
          "key": 2251799813685253
        },
        {
          "partitionId": 1,
          "value": {
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "StartEvent_1",
            "processInstanceKey": 2251799813685251,
            "bpmnElementType": "START_EVENT",
            "flowScopeKey": 2251799813685251,
            "version": 1,
            "parentElementInstanceKey": -1,
            "parentProcessInstanceKey": -1
          },
          "valueType": "PROCESS_INSTANCE",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 10,
          "position": 12,
          "intent": "ELEMENT_ACTIVATED",
          "timestamp": 1642164681174,
          "key": 2251799813685253
        },
        {
          "partitionId": 1,
          "value": {
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "StartEvent_1",
            "processInstanceKey": 2251799813685251,
            "bpmnElementType": "START_EVENT",
            "flowScopeKey": 2251799813685251,
            "version": 1,
            "parentElementInstanceKey": -1,
            "parentProcessInstanceKey": -1
          },
          "valueType": "PROCESS_INSTANCE",
          "recordType": "COMMAND",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 10,
          "position": 13,
          "intent": "COMPLETE_ELEMENT",
          "timestamp": 1642164681174,
          "key": 2251799813685253
        }
      ]
    },
    {
      "index": 8,
      "term": 1,
      "entries": [
        {
          "partitionId": 1,
          "value": {
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "StartEvent_1",
            "processInstanceKey": 2251799813685251,
            "bpmnElementType": "START_EVENT",
            "flowScopeKey": 2251799813685251,
            "version": 1,
            "parentElementInstanceKey": -1,
            "parentProcessInstanceKey": -1
          },
          "valueType": "PROCESS_INSTANCE",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 13,
          "position": 14,
          "intent": "ELEMENT_COMPLETING",
          "timestamp": 1642164681179,
          "key": 2251799813685253
        },
        {
          "partitionId": 1,
          "value": {
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "StartEvent_1",
            "processInstanceKey": 2251799813685251,
            "bpmnElementType": "START_EVENT",
            "flowScopeKey": 2251799813685251,
            "version": 1,
            "parentElementInstanceKey": -1,
            "parentProcessInstanceKey": -1
          },
          "valueType": "PROCESS_INSTANCE",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 13,
          "position": 15,
          "intent": "ELEMENT_COMPLETED",
          "timestamp": 1642164681179,
          "key": 2251799813685253
        },
        {
          "partitionId": 1,
          "value": {
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "FlowIn",
            "processInstanceKey": 2251799813685251,
            "bpmnElementType": "SEQUENCE_FLOW",
            "flowScopeKey": 2251799813685251,
            "version": 1,
            "parentElementInstanceKey": -1,
            "parentProcessInstanceKey": -1
          },
          "valueType": "PROCESS_INSTANCE",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 13,
          "position": 16,
          "intent": "SEQUENCE_FLOW_TAKEN",
          "timestamp": 1642164681179,
          "key": 2251799813685254
        },
        {
          "partitionId": 1,
          "value": {
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "do_something",
            "processInstanceKey": 2251799813685251,
            "bpmnElementType": "SERVICE_TASK",
            "flowScopeKey": 2251799813685251,
            "version": 1,
            "parentElementInstanceKey": -1,
            "parentProcessInstanceKey": -1
          },
          "valueType": "PROCESS_INSTANCE",
          "recordType": "COMMAND",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 13,
          "position": 17,
          "intent": "ACTIVATE_ELEMENT",
          "timestamp": 1642164681179,
          "key": 2251799813685255
        }
      ]
    },
    {
      "index": 9,
      "term": 1,
      "entries": [
        {
          "partitionId": 1,
          "value": {
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "do_something",
            "processInstanceKey": 2251799813685251,
            "bpmnElementType": "SERVICE_TASK",
            "flowScopeKey": 2251799813685251,
            "version": 1,
            "parentElementInstanceKey": -1,
            "parentProcessInstanceKey": -1
          },
          "valueType": "PROCESS_INSTANCE",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 17,
          "position": 18,
          "intent": "ELEMENT_ACTIVATING",
          "timestamp": 1642164681187,
          "key": 2251799813685255
        },
        {
          "partitionId": 1,
          "value": {
            "customHeaders": {
              "what_to_do": "nothing"
            },
            "deadline": -1,
            "errorMessage": "",
            "elementInstanceKey": 2251799813685255,
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "do_something",
            "errorCode": "",
            "variables": {},
            "processInstanceKey": 2251799813685251,
            "worker": "",
            "retries": 2,
            "processDefinitionVersion": 1,
            "type": "do_something"
          },
          "valueType": "JOB",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 17,
          "position": 19,
          "intent": "CREATED",
          "timestamp": 1642164681187,
          "key": 2251799813685256
        },
        {
          "partitionId": 1,
          "value": {
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "do_something",
            "processInstanceKey": 2251799813685251,
            "bpmnElementType": "SERVICE_TASK",
            "flowScopeKey": 2251799813685251,
            "version": 1,
            "parentElementInstanceKey": -1,
            "parentProcessInstanceKey": -1
          },
          "valueType": "PROCESS_INSTANCE",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 17,
          "position": 20,
          "intent": "ELEMENT_ACTIVATED",
          "timestamp": 1642164681187,
          "key": 2251799813685255
        }
      ]
    },
    {
      "index": 10,
      "term": 1,
      "entries": [
        {
          "partitionId": 1,
          "value": {
            "truncated": false,
            "maxJobsToActivate": 1,
            "timeout": 300000,
            "jobKeys": [],
            "jobs": [],
            "worker": "zbctl",
            "type": "do_something"
          },
          "valueType": "JOB_BATCH",
          "recordType": "COMMAND",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": -1,
          "position": 21,
          "intent": "ACTIVATE",
          "timestamp": 1642164701282,
          "key": -1
        }
      ]
    },
    {
      "index": 11,
      "term": 1,
      "entries": [
        {
          "partitionId": 1,
          "value": {
            "truncated": false,
            "maxJobsToActivate": 1,
            "timeout": 300000,
            "jobKeys": [
              2251799813685256
            ],
            "jobs": [
              {
                "customHeaders": {
                  "what_to_do": "nothing"
                },
                "deadline": 1642165001282,
                "errorMessage": "",
                "elementInstanceKey": 2251799813685255,
                "processDefinitionKey": 2251799813685249,
                "bpmnProcessId": "one_task",
                "elementId": "do_something",
                "errorCode": "",
                "variables": {},
                "processInstanceKey": 2251799813685251,
                "worker": "zbctl",
                "retries": 2,
                "processDefinitionVersion": 1,
                "type": "do_something"
              }
            ],
            "worker": "zbctl",
            "type": "do_something"
          },
          "valueType": "JOB_BATCH",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 21,
          "position": 22,
          "intent": "ACTIVATED",
          "timestamp": 1642164701284,
          "key": 2251799813685257
        }
      ]
    },
    {
      "index": 12,
      "term": 1,
      "entries": [
        {
          "partitionId": 1,
          "value": {
            "customHeaders": {},
            "deadline": -1,
            "errorMessage": "foobar",
            "elementInstanceKey": -1,
            "processDefinitionKey": -1,
            "bpmnProcessId": "",
            "elementId": "",
            "errorCode": "FOO",
            "variables": {},
            "processInstanceKey": -1,
            "worker": "",
            "retries": -1,
            "processDefinitionVersion": -1,
            "type": ""
          },
          "valueType": "JOB",
          "recordType": "COMMAND",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": -1,
          "position": 23,
          "intent": "THROW_ERROR",
          "timestamp": 1642164800313,
          "key": 2251799813685256
        }
      ]
    },
    {
      "index": 13,
      "term": 1,
      "entries": [
        {
          "partitionId": 1,
          "value": {
            "customHeaders": {
              "what_to_do": "nothing"
            },
            "deadline": 1642165001282,
            "errorMessage": "foobar",
            "elementInstanceKey": 2251799813685255,
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "NO_CATCH_EVENT_FOUND",
            "errorCode": "FOO",
            "variables": {},
            "processInstanceKey": 2251799813685251,
            "worker": "zbctl",
            "retries": 2,
            "processDefinitionVersion": 1,
            "type": "do_something"
          },
          "valueType": "JOB",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 23,
          "position": 24,
          "intent": "ERROR_THROWN",
          "timestamp": 1642164800315,
          "key": 2251799813685256
        },
        {
          "partitionId": 1,
          "value": {
            "errorMessage": "Expected to throw an error event with the code 'FOO' with message 'foobar', but it was not caught. No error events are available in the scope.",
            "elementInstanceKey": 2251799813685255,
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "NO_CATCH_EVENT_FOUND",
            "processInstanceKey": 2251799813685251,
            "errorType": "UNHANDLED_ERROR_EVENT",
            "variableScopeKey": 2251799813685255,
            "jobKey": 2251799813685256
          },
          "valueType": "INCIDENT",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 23,
          "position": 25,
          "intent": "CREATED",
          "timestamp": 1642164800315,
          "key": 2251799813685258
        }
      ]
    },
    {
      "index": 14,
      "term": 1,
      "entries": [
        {
          "partitionId": 1,
          "value": {
            "processDefinitionKey": -1,
            "bpmnProcessId": "",
            "elementId": "",
            "processInstanceKey": -1,
            "bpmnElementType": "UNSPECIFIED",
            "flowScopeKey": -1,
            "version": -1,
            "parentElementInstanceKey": -1,
            "parentProcessInstanceKey": -1
          },
          "valueType": "PROCESS_INSTANCE",
          "recordType": "COMMAND",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": -1,
          "position": 26,
          "intent": "CANCEL",
          "timestamp": 1642164832099,
          "key": 2251799813685251
        }
      ]
    },
    {
      "index": 15,
      "term": 1,
      "entries": [
        {
          "partitionId": 1,
          "value": {
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "one_task",
            "processInstanceKey": 2251799813685251,
            "bpmnElementType": "PROCESS",
            "flowScopeKey": -1,
            "version": 1,
            "parentElementInstanceKey": -1,
            "parentProcessInstanceKey": -1
          },
          "valueType": "PROCESS_INSTANCE",
          "recordType": "COMMAND",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 26,
          "position": 27,
          "intent": "TERMINATE_ELEMENT",
          "timestamp": 1642164832101,
          "key": 2251799813685251
        }
      ]
    },
    {
      "index": 16,
      "term": 1,
      "entries": [
        {
          "partitionId": 1,
          "value": {
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "one_task",
            "processInstanceKey": 2251799813685251,
            "bpmnElementType": "PROCESS",
            "flowScopeKey": -1,
            "version": 1,
            "parentElementInstanceKey": -1,
            "parentProcessInstanceKey": -1
          },
          "valueType": "PROCESS_INSTANCE",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 27,
          "position": 28,
          "intent": "ELEMENT_TERMINATING",
          "timestamp": 1642164832104,
          "key": 2251799813685251
        },
        {
          "partitionId": 1,
          "value": {
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "do_something",
            "processInstanceKey": 2251799813685251,
            "bpmnElementType": "SERVICE_TASK",
            "flowScopeKey": 2251799813685251,
            "version": 1,
            "parentElementInstanceKey": -1,
            "parentProcessInstanceKey": -1
          },
          "valueType": "PROCESS_INSTANCE",
          "recordType": "COMMAND",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 27,
          "position": 29,
          "intent": "TERMINATE_ELEMENT",
          "timestamp": 1642164832104,
          "key": 2251799813685255
        }
      ]
    },
    {
      "index": 17,
      "term": 1,
      "entries": [
        {
          "partitionId": 1,
          "value": {
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "do_something",
            "processInstanceKey": 2251799813685251,
            "bpmnElementType": "SERVICE_TASK",
            "flowScopeKey": 2251799813685251,
            "version": 1,
            "parentElementInstanceKey": -1,
            "parentProcessInstanceKey": -1
          },
          "valueType": "PROCESS_INSTANCE",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 29,
          "position": 30,
          "intent": "ELEMENT_TERMINATING",
          "timestamp": 1642164832111,
          "key": 2251799813685255
        },
        {
          "partitionId": 1,
          "value": {
            "errorMessage": "Expected to throw an error event with the code 'FOO' with message 'foobar', but it was not caught. No error events are available in the scope.",
            "elementInstanceKey": 2251799813685255,
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "NO_CATCH_EVENT_FOUND",
            "processInstanceKey": 2251799813685251,
            "errorType": "UNHANDLED_ERROR_EVENT",
            "variableScopeKey": 2251799813685255,
            "jobKey": 2251799813685256
          },
          "valueType": "INCIDENT",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 29,
          "position": 31,
          "intent": "RESOLVED",
          "timestamp": 1642164832111,
          "key": 2251799813685258
        },
        {
          "partitionId": 1,
          "value": {
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "do_something",
            "processInstanceKey": 2251799813685251,
            "bpmnElementType": "SERVICE_TASK",
            "flowScopeKey": 2251799813685251,
            "version": 1,
            "parentElementInstanceKey": -1,
            "parentProcessInstanceKey": -1
          },
          "valueType": "PROCESS_INSTANCE",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 29,
          "position": 32,
          "intent": "ELEMENT_TERMINATED",
          "timestamp": 1642164832111,
          "key": 2251799813685255
        },
        {
          "partitionId": 1,
          "value": {
            "processDefinitionKey": 2251799813685249,
            "bpmnProcessId": "one_task",
            "elementId": "one_task",
            "processInstanceKey": 2251799813685251,
            "bpmnElementType": "PROCESS",
            "flowScopeKey": -1,
            "version": 1,
            "parentElementInstanceKey": -1,
            "parentProcessInstanceKey": -1
          },
          "valueType": "PROCESS_INSTANCE",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 29,
          "position": 33,
          "intent": "ELEMENT_TERMINATED",
          "timestamp": 1642164832111,
          "key": 2251799813685251
        }
      ]
    },
    {
      "index": 18,
      "term": 1,
      "entries": [
        {
          "partitionId": 1,
          "value": {
            "truncated": false,
            "maxJobsToActivate": 1,
            "timeout": 300000,
            "jobKeys": [],
            "jobs": [],
            "worker": "zbctl",
            "type": "do_something"
          },
          "valueType": "JOB_BATCH",
          "recordType": "COMMAND",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": -1,
          "position": 34,
          "intent": "ACTIVATE",
          "timestamp": 1642164844508,
          "key": -1
        }
      ]
    },
    {
      "index": 19,
      "term": 1,
      "entries": [
        {
          "partitionId": 1,
          "value": {
            "truncated": false,
            "maxJobsToActivate": 1,
            "timeout": 300000,
            "jobKeys": [
              2251799813685256
            ],
            "jobs": [
              {
                "customHeaders": {
                  "what_to_do": "nothing"
                },
                "deadline": 1642165144508,
                "errorMessage": "foobar",
                "elementInstanceKey": 2251799813685255,
                "processDefinitionKey": 2251799813685249,
                "bpmnProcessId": "one_task",
                "elementId": "do_something",
                "errorCode": "FOO",
                "variables": {},
                "processInstanceKey": 2251799813685251,
                "worker": "zbctl",
                "retries": 2,
                "processDefinitionVersion": 1,
                "type": "do_something"
              }
            ],
            "worker": "zbctl",
            "type": "do_something"
          },
          "valueType": "JOB_BATCH",
          "recordType": "EVENT",
          "rejectionType": "NULL_VAL",
          "rejectionReason": "",
          "brokerVersion": "1.2.9",
          "sourceRecordPosition": 34,
          "position": 35,
          "intent": "ACTIVATED",
          "timestamp": 1642164844510,
          "key": 2251799813685259
        }
      ]
    }
  ]
}

Looking at this, I think this is pretty old bug, that became worse since 1.1.0. The Job cancel command should always be written during the termination of a service task, but the special incident logic for jobs has already removed the reference to the job from the service task and so it doesn't have any job to cancel. In the past that just meant that the job stayed in the database even though the instance was gone, but since #6000 this job is made activatable again.

Fixing the problem
First, we'll need to make sure to always cancel the job when a service task terminates.

Second, we'll also need to consider whether the incident should really be resolved when terminating the service task. Perhaps the incident is merely IGNORED (or something similar). By introducing a new incident intent we can make this difference clear and apply the state differently.

Third, this makes me think about how much logic we have in event appliers. Why does the incident resolved event lead to the job's state being changed? Wouldn't it be better to write a Job Suspended when the throw error is being processed? We shouldn't make assumption about what to do in applying the events, instead the events that are written should tell us what happened. Of course, this part is not necessary to resolve the issue, but I think it might improve things.

@korthout korthout changed the title Jobs can still be activated after their process instances have been cancelled Job of cancelled instance can be activated if an error was thrown on it Jan 14, 2022
@korthout
Copy link
Member

korthout commented Jan 14, 2022

@npepinpe Let's triage this

This happens for jobs where an error has been thrown on it that was not caught in the process and then led to an incident. If the process is cancelled instead of resolving the incident, then the job is made activatable again. This means a worker can pick it up and work on it while there is no process instance anymore.

The situation is quite specific, and is more likely to occur during development than in production due to the unhandled error, but it might happen. I'm quite certain the regression is introduced with 1.1.0, because of #6000.

@korthout korthout added scope/broker Marks an issue or PR to appear in the broker section of the changelog Impact: Regression labels Jan 14, 2022
@npepinpe
Copy link
Member

Well, considering you've already root caused this... 😄

To be clear, what happens when it's activated again and then completed/failed? Not much, I guess? So it's mostly taking up resources for no reason, correct?

@korthout
Copy link
Member

what happens when it's activated again and then completed/failed? Not much, I guess?

I'd need to check what happens internally, but there's not much the engine can do with it. However, the worker would have executed its work for that job and from the perspective of orchestration this is pretty bad.

@npepinpe
Copy link
Member

What's your estimate on the severity? I would class it as high, and have us work on this soon. You're correct that the impact is quite bad if applications perform work that was cancelled. Of course the impact depends on the application's use case, but it's critical that Zeebe remains trusted as an orchestration engine.

@npepinpe npepinpe added this to Ready in Zeebe Jan 14, 2022
@npepinpe npepinpe added the severity/high Marks a bug as having a noticeable impact on the user with no known workaround label Jan 14, 2022
@npepinpe
Copy link
Member

As you already looked into it, can you work on it? Do you think it's something we will be able to get into 1.3.1? I know you're busy next week...

@korthout
Copy link
Member

I should be able to find some time for it in week 4, but the solution ideas I describe above are still pretty vague. So I'm not sure it will be done in time for 1.3.1.

@korthout
Copy link
Member

korthout commented Jan 14, 2022

What's your estimate on the severity?

I might downgrade it to Mid, because it requires throwing an uncaught error in the process, which is less likely to happen in production systems than in development systems. Still the impact can be big if it does happen as you say: it's critical that Zeebe remains trusted as an orchestration engine

@npepinpe
Copy link
Member

npepinpe commented Jan 14, 2022

So we assume everyone will test their processes thoroughly before deploying to production? 😉 I get your point, but as the tooling for testing is very fresh, I don't know how likely it is, and when it does happen it's still pretty bad. I would keep it as high for now, especially since it's not clear how quickly users would realize this is happening in a production system to fix it (and Zeebe wouldn't automatically correct it).

It's fine if we miss 1.3.1, it was more hopeful than anything 🙂

@rusterholz
Copy link
Author

I updated the description with @korthout's vastly-simpler method for reproducing the issue. My original method is tucked away in a collapsible section there if anyone is looking for it.

@jerry123888
Copy link

So we assume everyone will test their processes thoroughly before deploying to production? 😉 I get your point, but as the tooling for testing is very fresh, I don't know how likely it is, and when it does happen it's still pretty bad. I would keep it as high for now, especially since it's not clear how quickly users would realize this is happening in a production system to fix it (and Zeebe wouldn't automatically correct it).

It's fine if we miss 1.3.1, it was more hopeful than anything 🙂

At present this problem has a great impact on us and has received many user complaints.
Do you have any plans to start? Is it convenient to tell me the detail start time ?Looking forward to your problem solving

@korthout
Copy link
Member

Hi @jerry123888 thanks for raising your concern. I've had to switch focus from this issue and it wasn't on my mind anymore. I'll assign myself to keep it at the top of my tasks.

This will not make the upcoming release, but I expect that we can find some time to fix this in an upcoming patch release.

@korthout korthout self-assigned this Mar 23, 2022
@KerstinHebel KerstinHebel removed this from Ready in Zeebe Mar 23, 2022
@npepinpe npepinpe added area/reliability Marks an issue as related to improving the reliability of our software (i.e. it behaves as expected) and removed Impact: Regression labels Apr 11, 2022
korthout added a commit that referenced this issue Apr 22, 2022
This adds a regression test against activating jobs of cancelled process
instances, by making sure they are cancelled (i.e. deleted).

In this specific case, an uncaught error was thrown on a job which led
to an UNHANDLED_ERROR_EVENT incident. When the process instance is
cancelled, the job was not called, but instead was made activatable
again. This happened, because the incident was resolved as part of the
process instance termination logic. Instead, we should guarantee that
the job is cancelled. A cancelled job can no longer be made
re-activatable. This is a terminal state for jobs, because a cancelled
job is a deleted job.

See #8588 for more details.
korthout added a commit that referenced this issue Apr 28, 2022
This adds a regression test against activating jobs of cancelled process
instances, by making sure they are cancelled (i.e. deleted).

In this specific case, an uncaught error was thrown on a job which led
to an UNHANDLED_ERROR_EVENT incident. When the process instance is
cancelled, the job was not called, but instead was made activatable
again. This happened, because the incident was resolved as part of the
process instance termination logic. Instead, we should guarantee that
the job is cancelled. A cancelled job can no longer be made
re-activatable. This is a terminal state for jobs, because a cancelled
job is a deleted job.

See #8588 for more details.
zeebe-bors-camunda bot added a commit that referenced this issue Apr 28, 2022
9219: Cancel job with incident when canceling the process instance r=korthout a=korthout

## Description

This fixes a bug where canceling a process instance could lead to a job becoming activatable even though the process instance is terminated. See #8588 for more details.

The bug is fixed by making sure the job is immediately canceled (i.e. deleted from the state) when we terminate the related element instance. We achieve this by writing a Job Canceled event when the TERMINATE_ELEMENT command is being processed. 

This PR fixes the bug in a way that can be backported because it doesn't introduce too many changes. Alternatively, the bug could be fixed by tackling the main issue: applying the Incident Resolved event should not affect the state of a job. Instead, the job should be made activatable by writing a separate Job event in the Resolve Incident Processor. That would be a larger change, with more impact on the event appliers than what this PR proposes. Changes to event appliers should generally be avoided.

## Related issues

<!-- Which issues are closed by this PR or are related -->

closes #8588 



Co-authored-by: Nico Korthout <nico.korthout@camunda.com>
korthout added a commit that referenced this issue Apr 28, 2022
This adds a regression test against activating jobs of cancelled process
instances, by making sure they are cancelled (i.e. deleted).

In this specific case, an uncaught error was thrown on a job which led
to an UNHANDLED_ERROR_EVENT incident. When the process instance is
cancelled, the job was not called, but instead was made activatable
again. This happened, because the incident was resolved as part of the
process instance termination logic. Instead, we should guarantee that
the job is cancelled. A cancelled job can no longer be made
re-activatable. This is a terminal state for jobs, because a cancelled
job is a deleted job.

See #8588 for more details.

(cherry picked from commit d43c344)
korthout added a commit that referenced this issue Apr 28, 2022
This adds a regression test against activating jobs of cancelled process
instances, by making sure they are cancelled (i.e. deleted).

In this specific case, an uncaught error was thrown on a job which led
to an UNHANDLED_ERROR_EVENT incident. When the process instance is
cancelled, the job was not called, but instead was made activatable
again. This happened, because the incident was resolved as part of the
process instance termination logic. Instead, we should guarantee that
the job is cancelled. A cancelled job can no longer be made
re-activatable. This is a terminal state for jobs, because a cancelled
job is a deleted job.

See #8588 for more details.

(cherry picked from commit d43c344)
zeebe-bors-camunda bot added a commit that referenced this issue Apr 28, 2022
9250: [Backport 1.3] Cancel job with incident when canceling the process instance r=korthout a=korthout

## Description

<!-- Link to the PR that is back ported -->

Backport of #9219 

## Related issues

<!-- Link to the related issues of the origin PR -->

closes #8588


Co-authored-by: Nico Korthout <nico.korthout@camunda.com>
zeebe-bors-camunda bot added a commit that referenced this issue Apr 28, 2022
9251: [Backport 8.0] Cancel job with incident when canceling the process instance r=korthout a=korthout

## Description

<!-- Link to the PR that is back ported -->

Backport of #9219

## Related issues

<!-- Link to the related issues of the origin PR -->

closes #8588 


Co-authored-by: Nico Korthout <nico.korthout@camunda.com>
zeebe-bors-camunda bot added a commit that referenced this issue Apr 28, 2022
9251: [Backport 8.0] Cancel job with incident when canceling the process instance r=korthout a=korthout

## Description

<!-- Link to the PR that is back ported -->

Backport of #9219

## Related issues

<!-- Link to the related issues of the origin PR -->

closes #8588 


Co-authored-by: Nico Korthout <nico.korthout@camunda.com>
@korthout
Copy link
Member

@rusterholz The issue is fixed. The fix will be available in the upcoming patch releases 8.0.2 and 1.3.8.

@deepthidevaki deepthidevaki added the version:8.1.0-alpha1 Marks an issue as being completely or in parts released in 8.1.0-alpha1 label May 3, 2022
@Zelldon Zelldon added the version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0 label Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/reliability Marks an issue as related to improving the reliability of our software (i.e. it behaves as expected) kind/bug Categorizes an issue or PR as a bug scope/broker Marks an issue or PR to appear in the broker section of the changelog severity/high Marks a bug as having a noticeable impact on the user with no known workaround version:1.3.8 version:8.1.0-alpha1 Marks an issue as being completely or in parts released in 8.1.0-alpha1 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants