Error Guidelines

Nicolas Pepin-Perreault edited this page Jan 14, 2019 · 11 revisions

Messages

For an error to be useful, it must be clear to the user (who may or may not be a Zeebe developer) what went wrong and, whenever possible, how it can be corrected. To achieve this, error messages should follow the pattern Expected [EXPECTED], but got [ACTUAL] [in CONTEXT]. The wording may differ, but it should always be clear what we expected to happen and what happened instead. Context information may be added if it helps resolve the issue.

NOTE: context information does not mean debug information, but strictly relevant execution context information. For example, when trying to deploy a workflow to a non-existing partition, we might pass the actual deploy partition ID as context.

Some examples:

  • Expected to update retries, but got -2; should be a positive number
  • Unexpected partition broker role FOO, should be one of [LEADER, FOLLOWER]
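One way to keep messages consistent with this pattern is a small formatting helper. The sketch below is only an illustration: `ErrorMessages` and its `expected` methods are hypothetical names, not part of Zeebe.

```java
import java.util.Objects;

/** Hypothetical helper for illustration; not part of the Zeebe code base. */
final class ErrorMessages {
  private ErrorMessages() {}

  /** Builds "Expected <expected>, but got <actual>" without context. */
  static String expected(String expected, Object actual) {
    return expected(expected, actual, null);
  }

  /** Same as above, appending the context only when one is supplied. */
  static String expected(String expected, Object actual, String context) {
    Objects.requireNonNull(expected, "expected must describe what should have happened");
    StringBuilder message =
        new StringBuilder("Expected ").append(expected).append(", but got ").append(actual);
    if (context != null && !context.isEmpty()) {
      // Context is strictly relevant execution context, not debug information.
      message.append(' ').append(context);
    }
    return message.toString();
  }
}
```

For instance, `ErrorMessages.expected("retries to be a positive number", -2, "while updating job 12")` yields `Expected retries to be a positive number, but got -2 while updating job 12`.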

Rethrowing checked exceptions

When rethrowing, as a runtime error, a checked error for which we did not write the message, we should still expect that it may end up being returned to the client. We should therefore write a clear error message which conforms to our general guidelines, and add the original exception as the cause. In these cases, the error is most likely not resolvable by the user who reads it, but it should be clear to them that the error is not a client one.

For example, if rethrowing a RocksDBException, we could write "Broker exception occurred: expected to put <VALUE> to <KEY> in DB, but operation failed unexpectedly.", with the original exception as the cause.

In this instance, it is not especially relevant to a client user why the database operation failed, but they should still know what went wrong. The cause will be logged on the broker side for developers to properly address the issue.
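A minimal sketch of this rethrow pattern follows. To stay self-contained it does not depend on RocksDB: `StorageException` stands in for a checked exception such as RocksDBException, and `BrokerException`, `KeyValueStore`, and `StateWriter` are hypothetical names chosen for illustration.

```java
/** Stand-in for a checked storage exception such as RocksDBException (hypothetical). */
class StorageException extends Exception {
  StorageException(String message) {
    super(message);
  }
}

/** Hypothetical unchecked wrapper that may end up being reported to the client. */
class BrokerException extends RuntimeException {
  BrokerException(String message, Throwable cause) {
    super(message, cause);
  }
}

/** Hypothetical storage abstraction whose operations throw a checked exception. */
interface KeyValueStore {
  void put(String key, String value) throws StorageException;
}

final class StateWriter {
  private final KeyValueStore store;

  StateWriter(KeyValueStore store) {
    this.store = store;
  }

  void put(String key, String value) {
    try {
      store.put(key, value);
    } catch (StorageException e) {
      // Write our own message following the guidelines; keep the original as the cause
      // so that it can be logged on the broker side.
      throw new BrokerException(
          String.format(
              "Broker exception occurred: expected to put %s to %s in DB, but operation failed unexpectedly.",
              value, key),
          e);
    }
  }
}
```

The client-facing message says what was attempted and that the failure is on the broker side, while the storage-level details stay reachable through `getCause()`.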

Client errors

The following guidelines apply when generating errors that can be reported to the client, either directly (e.g. command rejections) or indirectly (e.g. an uncaught exception during control message processing which is propagated to the client).

Error types

Two error types may be returned to the client: broker errors, and command rejections.

Rejections

Command rejections should be returned only when the client tried to execute a command, and the command was syntactically valid, but could not be processed. For example:

  • one or more invalid arguments (e.g. integer should be positive but was negative)
  • a required entity does not exist (e.g. complete a non-existing job)
  • attempting to create an entity which already exists (e.g. publish a message with an ID that was already published)
  • expected a required entity to be in a certain state but it is not (e.g. create workflow instance on a workflow which has a message start event)

Command rejections should not be returned for exceptional errors (e.g. NullPointerException) or errors not directly related to the command entity (e.g. command is DeployWorkflow but the deploy partition is not found).

Command rejections are returned by writing the record back with its recordType set to RecordType.COMMAND_REJECTION and filling out the rejectionReason property.

For example, when trying to create a workflow instance for a non-existing workflow:

Expected to create instance of workflow 4 but no such workflow exists.

If applicable (and not immediately obvious), you should also include steps on how to fix the error after the reason.

For example, if a client tries to complete a failed job:

Expected job 1 to be activated but it is marked as failed. Incident 1 must be resolved before attempting again.
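The rejection flow can be sketched with a simplified model. The types below are stand-ins written for this example and are not Zeebe's actual classes; only the `COMMAND_REJECTION` record type and the `rejectionReason` property are taken from the text above, and `CommandRecord`, `JobState`, and `CompleteJobProcessor` are hypothetical.

```java
/** Simplified stand-in for the record type; only COMMAND_REJECTION is named in the text. */
enum RecordType { COMMAND, EVENT, COMMAND_REJECTION }

/** Hypothetical job states used by this example. */
enum JobState { ACTIVATED, FAILED }

/** Hypothetical, heavily reduced record carrying just what the example needs. */
final class CommandRecord {
  final RecordType recordType;
  final String rejectionReason; // empty unless the record is a rejection

  private CommandRecord(RecordType recordType, String rejectionReason) {
    this.recordType = recordType;
    this.rejectionReason = rejectionReason;
  }

  static CommandRecord command() {
    return new CommandRecord(RecordType.COMMAND, "");
  }

  CommandRecord asEvent() {
    return new CommandRecord(RecordType.EVENT, "");
  }

  /** Writes the record back as a rejection with the reason filled out. */
  CommandRecord asRejection(String reason) {
    return new CommandRecord(RecordType.COMMAND_REJECTION, reason);
  }
}

final class CompleteJobProcessor {
  private CompleteJobProcessor() {}

  static CommandRecord process(
      CommandRecord command, long jobKey, JobState state, long incidentKey) {
    if (state == JobState.FAILED) {
      // Valid command, but the job is in the wrong state: reject and explain the fix.
      return command.asRejection(
          String.format(
              "Expected job %d to be activated but it is marked as failed. "
                  + "Incident %d must be resolved before attempting again.",
              jobKey, incidentKey));
    }
    return command.asEvent();
  }
}
```

In this sketch the reason states what was expected, what was found, and how to recover, in that order.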

Exceptions

Exceptions (ErrorResponse SBE message) should be returned for any error that is not a command rejection. For example, Alice tries to publish a message to partition 4, which does not exist (for whatever reason). The broker will return an error that the partition does not exist, but this was not directly related to the command processing, as the command (PublishMessage) was never actually processed.

Exceptions are returned by writing an ErrorResponse back to the client.

Expected to publish message with correlation key 'order-123' to partition 4, but no such partition exists in list of known partitions 1, 2, 3.
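As a rough illustration of how such a message might be produced before the command is ever processed, consider the sketch below. `PartitionNotFoundException` and `PartitionCheck` are hypothetical names, and how the exception is mapped to an ErrorResponse is left out because the text does not describe it.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical exception that a transport layer would translate into an ErrorResponse. */
class PartitionNotFoundException extends RuntimeException {
  PartitionNotFoundException(String message) {
    super(message);
  }
}

/** Hypothetical check that runs before the command reaches any processor. */
final class PartitionCheck {
  private final List<Integer> knownPartitions;

  PartitionCheck(List<Integer> knownPartitions) {
    this.knownPartitions = knownPartitions;
  }

  void ensureExists(int partitionId, String correlationKey) {
    if (knownPartitions.contains(partitionId)) {
      return;
    }
    // The list of known partitions is relevant context: it shows the user valid targets.
    String known =
        knownPartitions.stream()
            .map(id -> Integer.toString(id))
            .collect(Collectors.joining(", "));
    throw new PartitionNotFoundException(
        String.format(
            "Expected to publish message with correlation key '%s' to partition %d, "
                + "but no such partition exists in list of known partitions %s.",
            correlationKey, partitionId, known));
  }
}
```

Because the check fails before the PublishMessage command is processed, the error is reported as an exception rather than as a command rejection.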