Exception loop in Engine during handling of unexpected error #10199
Comments
Possible solution
If we need a more detailed message, we can get one by handling the exception in the processor, as we did in #10193.
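A minimal sketch of what handling the exception in the processor could look like, assuming a `tryHandleError` hook that reports whether the processor dealt with the error itself. `RecordTooLargeException`, `ErrorOutcome`, and `DeploymentProcessorSketch` are hypothetical stand-ins, not Zeebe's real classes or signatures; only the idea of rejecting with a short, fixed reason instead of falling through to `UNEXPECTED_ERROR` comes from the comment above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the engine's size-limit exception (not Zeebe's real class).
class RecordTooLargeException extends RuntimeException {
    RecordTooLargeException(String message) {
        super(message);
    }
}

// Hypothetical outcome of a processor's error handling.
enum ErrorOutcome {
    EXPECTED_ERROR,   // the processor wrote its own rejection
    UNEXPECTED_ERROR  // the engine's generic error handling takes over
}

class DeploymentProcessorSketch {
    final List<String> rejections = new ArrayList<>();

    // Handles the size violation in the processor itself, so the engine never
    // treats it as an unexpected error.
    ErrorOutcome tryHandleError(Throwable error) {
        if (error instanceof RecordTooLargeException) {
            // A short, fixed reason keeps the rejection itself small.
            rejections.add("Deployment rejected: it exceeds the maximum batch record size");
            return ErrorOutcome.EXPECTED_ERROR;
        }
        return ErrorOutcome.UNEXPECTED_ERROR;
    }
}

class DeploymentProcessorDemo {
    public static void main(String[] args) {
        DeploymentProcessorSketch processor = new DeploymentProcessorSketch();
        System.out.println(processor.tryHandleError(new RecordTooLargeException("too large")));
        System.out.println(processor.tryHandleError(new IllegalStateException("boom")));
    }
}
```

Any error the processor does not recognize still falls through to `UNEXPECTED_ERROR`, which is why this alone does not remove the loop described in the issue; it only keeps the known case away from it.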
@Zelldon I'd be interested in your thoughts on how we could handle this.
Hey @remcowesterhoud, yes, my recommendation would also be to verify whether the appending worked via the … Maybe you could also check this already when appending the DeploymentRecord during processing, and handle the case where it is too large 🤔 But I guess using the exception is a bit cleaner right now, since it resets the transaction, result builder, etc.
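A minimal sketch of the "verify the append instead of relying on the exception" idea, assuming a batch that reports whether a record fit. `SizeLimitedBatch`, `canAppend`, `tryAppend`, and `DeploymentAppendSketch` are hypothetical names for illustration, not Zeebe's API; the UTF-8 byte length of a string stands in for the serialized record size.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical batch that reports whether a record fits instead of throwing.
class SizeLimitedBatch {
    private final int maxBytes;
    private int usedBytes;
    private final List<String> records = new ArrayList<>();

    SizeLimitedBatch(int maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Capacity check with no side effects; a processor could call this
    // before building a large record such as a deployment.
    boolean canAppend(int recordBytes) {
        return usedBytes + recordBytes <= maxBytes;
    }

    // Appends only if the record fits and tells the caller whether it did.
    boolean tryAppend(String record) {
        int recordBytes = record.getBytes(StandardCharsets.UTF_8).length;
        if (!canAppend(recordBytes)) {
            return false;
        }
        records.add(record);
        usedBytes += recordBytes;
        return true;
    }

    int size() {
        return records.size();
    }
}

class DeploymentAppendSketch {
    // The oversized case becomes a normal, short rejection instead of an exception.
    static String process(SizeLimitedBatch batch, String deploymentRecord) {
        if (batch.tryAppend(deploymentRecord)) {
            return "CREATED";
        }
        boolean rejected = batch.tryAppend("REJECTED: deployment too large");
        return rejected ? "REJECTED" : "FAILED";
    }

    public static void main(String[] args) {
        SizeLimitedBatch batch = new SizeLimitedBatch(64);
        System.out.println(process(batch, "small deployment"));
        System.out.println(process(batch, "x".repeat(100)));
    }
}
```

The trade-off mentioned in the comment shows up here: with a return value, the caller must undo any partial state itself, whereas throwing lets the engine reset the transaction and result builder in one place.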
For completeness, this is the
10402: Fix error handling loop r=remcowesterhoud a=remcowesterhoud
## Description
When we exceed the maximum record batch size, an exception is thrown. When this happens during the handling of an error, writing the rejection fails as well, which in turn results in an exception loop. By removing the rejection reason from the rejection in the event of an `ExceededBatchRecordSizeException`, the rejection should always be writable. Unfortunately, removing the reason makes it unclear what went wrong. To still be able to identify this, a new rejection type has been added.
## Related issues
closes #10199
Co-authored-by: Remco Westerhoud <remco@westerhoud.nl>
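A minimal sketch of the fix described in the PR, under simplifying assumptions: only the reason counts towards the size limit, and the size exception, rejection types, and writer are hypothetical stand-ins (`BatchSizeExceeded` plays the role of `ExceededBatchRecordSizeException`, and `EXCEEDED_BATCH_RECORD_SIZE` plays the role of the new rejection type, whose real name the PR text does not give).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ExceededBatchRecordSizeException.
class BatchSizeExceeded extends RuntimeException {
    BatchSizeExceeded(String message) {
        super(message);
    }
}

// Hypothetical rejection types; the second one plays the role of the new type.
enum RejectionKind {
    PROCESSING_ERROR,
    EXCEEDED_BATCH_RECORD_SIZE
}

final class Rejection {
    final RejectionKind kind;
    final String reason;

    Rejection(RejectionKind kind, String reason) {
        this.kind = kind;
        this.reason = reason;
    }
}

class RejectionWriter {
    private final int maxReasonBytes;
    final List<Rejection> written = new ArrayList<>();

    RejectionWriter(int maxReasonBytes) {
        this.maxReasonBytes = maxReasonBytes;
    }

    // Simplification: only the reason counts towards the size limit.
    private void append(Rejection rejection) {
        int bytes = rejection.reason.getBytes(StandardCharsets.UTF_8).length;
        if (bytes > maxReasonBytes) {
            throw new BatchSizeExceeded("rejection reason of " + bytes + " bytes is too large");
        }
        written.add(rejection);
    }

    // Writes the rejection for an unexpected error. For a size violation the
    // reason is dropped and the rejection type alone says what went wrong.
    void writeRejection(Throwable error, String reason) {
        if (error instanceof BatchSizeExceeded) {
            append(new Rejection(RejectionKind.EXCEEDED_BATCH_RECORD_SIZE, ""));
        } else {
            append(new Rejection(RejectionKind.PROCESSING_ERROR, reason));
        }
    }
}

class RejectionWriterDemo {
    public static void main(String[] args) {
        RejectionWriter writer = new RejectionWriter(32);
        writer.writeRejection(new BatchSizeExceeded("deployment too large"), "x".repeat(100));
        System.out.println(writer.written.get(0).kind); // EXCEEDED_BATCH_RECORD_SIZE
    }
}
```

If a non-size error comes with a reason that is too large, `append` throws `BatchSizeExceeded`; handling that exception takes the first branch, writes an empty-reason rejection, and stops instead of looping.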
Describe the bug
In #9946 we encountered an exception loop when we deployed a process that was too large. The cause was that the processor did not handle this exception explicitly using the `tryHandleError` method. As a result, the engine considered it an `UNEXPECTED_ERROR`. When this happens, the engine tries to handle this unexpected error:
At the beginning of this method we create an error message and write the rejection. While appending this rejection, we check the size of the record (https://github.com/camunda/zeebe/blob/main/engine/src/main/java/io/camunda/zeebe/engine/api/records/RecordBatch.java#L58). If the rejection is also too large, we throw an exception.
Since this exception is also considered an `UNEXPECTED_ERROR`, the engine tries to handle it again. This tries to append the record again, which in turn throws another exception and starts the loop over. This loop causes the partition to reach an UNHEALTHY state.
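A minimal sketch of why the loop never ends, assuming the rejection embeds the original command and is therefore at least as large as the command that already failed the size check. `SizeLimitExceeded` and `ErrorLoopSketch` are hypothetical; `maxAttempts` exists only so the demo terminates, and the engine described in the issue had no such bound.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the size-limit exception.
class SizeLimitExceeded extends RuntimeException {
    SizeLimitExceeded(String message) {
        super(message);
    }
}

class ErrorLoopSketch {
    private final int maxRecordBytes;
    final List<String> written = new ArrayList<>();

    ErrorLoopSketch(int maxRecordBytes) {
        this.maxRecordBytes = maxRecordBytes;
    }

    // Mirrors the size check performed when a record is appended to the batch.
    private void append(String record) {
        int bytes = record.getBytes(StandardCharsets.UTF_8).length;
        if (bytes > maxRecordBytes) {
            throw new SizeLimitExceeded(bytes + " bytes exceeds the limit of " + maxRecordBytes);
        }
        written.add(record);
    }

    // Mirrors the buggy flow: a failure while writing the rejection is treated as
    // another unexpected error and handled the same way.
    // Returns the attempt that succeeded, or -1 if every attempt failed.
    int handleUnexpectedError(String command, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // Assumption: the rejection embeds the original command.
                append("REJECTION[" + command + "]");
                return attempt;
            } catch (SizeLimitExceeded ignored) {
                // UNEXPECTED_ERROR again: go around the loop once more.
            }
        }
        return -1;
    }
}

class ErrorLoopDemo {
    public static void main(String[] args) {
        ErrorLoopSketch engine = new ErrorLoopSketch(32);
        System.out.println(engine.handleUnexpectedError("small", 5));         // 1
        System.out.println(engine.handleUnexpectedError("x".repeat(100), 5)); // -1
    }
}
```

Nothing changes between iterations (same command, same limit), so every retry fails identically; the loop can only end if the rejection written on retry is smaller than the one that failed.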
To Reproduce
As long as #10193 is not merged:
If #10193 is merged:
Change the `tryHandleError` method of `DeploymentCreateProcessor` to always return an `UNEXPECTED_ERROR`.
Expected behavior
The exception loop should not occur when the Engine fails to append a rejection.
Log/Stacktrace
Environment: