Reject too large deployment #10193
Conversation
@korthout Don't be alarmed by the large number of lines added; it's actually a relatively small fix. That's just a large process used in the test 😄
.../io/camunda/zeebe/engine/processing/deployment/model/validation/DeploymentRejectionTest.java (4 CodeQL alerts fixed)
Force-pushed 1998f3f to e6a00ec (compare)
@Test
public void shouldRejectDeploymentIfNotValidDesignTimeAspect() throws Exception {
  // given
  final Path path = Paths.get(getClass().getResource("/processes/invalid_process.bpmn").toURI());

CodeQL warning (code scanning): Unsafe use of getResource
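The repeated CodeQL "Unsafe use of getResource" warnings stem from `Class.getResource` returning `null` when the resource is missing, so the chained `toURI()` call fails with an uninformative NullPointerException. A minimal sketch of one way to address the warning; the helper name and resource path here are made up for illustration, not part of the PR:

```java
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Objects;

public class ResourceLoading {

  // getResource returns null for a missing resource; failing fast with a
  // descriptive message is clearer than an NPE from the later toURI() call.
  static Path resourcePath(final Class<?> clazz, final String name) throws Exception {
    final URL url =
        Objects.requireNonNull(
            clazz.getResource(name), "resource not found on classpath: " + name);
    return Paths.get(url.toURI());
  }

  public static void main(final String[] args) throws Exception {
    try {
      // hypothetical missing resource, to demonstrate the fail-fast behaviour
      resourcePath(ResourceLoading.class, "/definitely-missing.bpmn");
    } catch (final NullPointerException e) {
      System.out.println("missing resource detected: " + e.getMessage());
    }
  }
}
```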
public void shouldRejectDeploymentIfNotValidRuntimeAspect() throws Exception {
  // given
  final Path path =
      Paths.get(getClass().getResource("/processes/invalid_process_condition.bpmn").toURI());

CodeQL warning (code scanning): Unsafe use of getResource
@Test
public void shouldRejectDeploymentIfOneResourceIsNotValid() throws Exception {
  // given
  final Path path1 = Paths.get(getClass().getResource("/processes/invalid_process.bpmn").toURI());

CodeQL warning (code scanning): Unsafe use of getResource
public void shouldRejectDeploymentIfOneResourceIsNotValid() throws Exception {
  // given
  final Path path1 = Paths.get(getClass().getResource("/processes/invalid_process.bpmn").toURI());
  final Path path2 = Paths.get(getClass().getResource("/processes/collaboration.bpmn").toURI());

CodeQL warning (code scanning): Unsafe use of getResource
Well done @remcowesterhoud 👍
👍 Thanks for the integration test! That's a good idea!
👍 I also really like the split of the commits, it is easy to review
I'm not immediately approving this but I also don't necessarily request changes. Instead, I'd like to discuss the main solution approach first. (see my 🤔 comment below).
Perhaps this is a problem very specific to deployments, but I'd like to understand how this works for other records that exceed the max message size. If it's solved there, why does it not work for deployments? If it isn't solved there, then we should probably look for a more general solution. What do you think, @remcowesterhoud? Do you agree?
@Override
public ProcessingError tryHandleError(
    final TypedRecord<DeploymentRecord> command, final Throwable error) {
  if (error instanceof IllegalStateException
      && error.getMessage().contains("this would exceed the maximum batch size")) {
🤔 Should we handle this as an expected error specifically for deployments? It can happen to any record that becomes too large. How do we deal with this for job batch activate with variables? or for set variables command? We may need to consider a more general solution.
Just want to mention that there is a new feature right out of the oven: #10191, where you can determine whether the append to the RecordBatch was successful or not. We can discuss in more detail when I'm back, but in general you could consider whether to handle this either as Remco did, via an exception, or directly during processing by using the Either.
These new methods are important for you: https://github.com/camunda/zeebe/blob/main/engine/src/main/java/io/camunda/zeebe/engine/api/ProcessingResultBuilder.java#L22-L62
We should be aware of records that could be too large. I think for now this would only be a risk for deployments and batch activations as you said.
The job batch activate checks this explicitly here: https://github.com/camunda/zeebe/blob/main/engine/src/main/java/io/camunda/zeebe/engine/processing/job/JobBatchCollector.java#L95.
I think the important thing is that when we write a command rejection, we make sure we don't get into an error loop. This loop happens because we receive the "too large" error when processing the rejection as well. Since this is an unexpected error, we handle it and write a rejection because of it, causing the loop here:
https://github.com/camunda/zeebe/blob/main/engine/src/main/java/io/camunda/zeebe/engine/Engine.java#L167-L174
I still have to create a follow-up issue for this, I'll do that later today.
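As a rough illustration of the check-before-append style used by the JobBatchCollector mentioned above, here is a self-contained sketch; the method names and sizes are hypothetical stand-ins, not the actual Zeebe API:

```java
// Hypothetical sketch: estimate a record's length before appending and only
// append when it still fits, instead of catching an exception afterwards.
public class CheckBeforeAppend {
  static final int MAX_BATCH_SIZE = 1_000_000; // made-up limit for illustration
  static int usedBytes = 0;

  // Returns false instead of throwing when the record would not fit, so the
  // caller can truncate the batch and report it (as job batch activate does).
  static boolean tryAppendJob(final int recordLength) {
    if (usedBytes + recordLength > MAX_BATCH_SIZE) {
      return false;
    }
    usedBytes += recordLength;
    return true;
  }

  public static void main(final String[] args) {
    System.out.println(tryAppendJob(600_000)); // fits
    System.out.println(tryAppendJob(600_000)); // would exceed the max
  }
}
```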
Follow-up issue: #10199
So, if I understand correctly @remcowesterhoud you suggest picking this up later and first patch the bug. I think that makes sense.
Personally, I'm not so happy that the DeploymentCreateProcessor handles an error as expected that it didn't throw itself. But that's all fine if this is a temporary solution and we aim to resolve #10199 at a later point.
Did I understand your intention correctly, @remcowesterhoud?
@korthout Almost right. The only difference is that I don't see this as a temporary solution. I intend to keep this code here, next to the changes that will be made in #10199, which I will probably make next week.
I think the misunderstanding here is the idea that the processor will be handling an error that it hasn't thrown. The very first error that triggers the loop is thrown in this processor. The first error happens when the processor tries to append the deployment CREATED event here: https://github.com/camunda/zeebe/blob/main/engine/src/main/java/io/camunda/zeebe/engine/processing/deployment/DeploymentCreateProcessor.java#L116.
After this the loop begins and all the consecutive errors are not thrown in this processor, but in the Engine.
In my opinion this is definitely an expected error for a deployment. We know users can upload large processes, and we should expect it to happen. Handling it here allows us to give more meaningful feedback to the user than the generic message of #10199 will.
The problem I have with this is that the IllegalStateException is thrown by the RecordBatch, which we don't own. The processor now depends on this specific exception being thrown, but it may be changed by the other team. To prevent that, we need to make this exception part of the contract when we append records.
After discussing F2F, @saig0 @remcowesterhoud and I decided we want to:
- change the exception thrown by RecordBatch into a more meaningful one;
- document that throwing this exception is part of the contract with the engine (so DPT doesn't change this);
- document that this exception can be thrown as part of the contract of the appendX methods of the record writers (for visibility of the exception in the engine processors);
- handle this new exception in the tryHandleError method of the processor (we can later expand this to a separate abstract handleExceededBatchRecordSize (or similar) method for further improved visibility in the processors).
All this means that the processor still doesn't throw the exception directly, but it's now part of the contract of appending records. Therefore, the processor is aware that this can happen and it should deal with it accordingly.
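A minimal self-contained sketch of this direction. The exception name matches the PR, but the append logic, sizes, and message are simplified placeholders, not the real RecordBatch:

```java
public class ExceededBatchSketch {

  // Dedicated exception that is part of the append contract, replacing the
  // generic IllegalStateException (names per the PR; body simplified).
  static class ExceededBatchRecordSizeException extends RuntimeException {
    ExceededBatchRecordSizeException(final String message) {
      super(message);
    }
  }

  // Stand-in for a record writer whose appends may exceed the batch size.
  static void appendRecord(final int recordSize, final int maxBatchSize) {
    if (recordSize > maxBatchSize) {
      throw new ExceededBatchRecordSizeException(
          "Can't append record: size " + recordSize
              + " would exceed the maximum batch size " + maxBatchSize);
    }
  }

  public static void main(final String[] args) {
    try {
      appendRecord(4_000_000, 1_000_000); // made-up sizes
    } catch (final ExceededBatchRecordSizeException e) {
      // A processor can now catch this specific type in tryHandleError and
      // turn it into a concise rejection instead of looping on the error.
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```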
Note that we should also update our internal developer handbook on how to write record processors.
Please take a look at #10191
You can use either the ResultBuilder method that returns an Either or the one that throws an exception. If I remember correctly, I added this to the Javadoc; I can't check right now since I'm on the road.
We can also look at it together tomorrow if you need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Zelldon The problem with using the Either is that we would have to explicitly check the append result for every append we do, since all of the appends could result in this problem. It's easier to have it throw the exception and check it once.
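To illustrate the trade-off being discussed, here is a hypothetical sketch of the Either-returning style, where every call site must inspect the result. The Either type is a simplified toy stand-in; Zeebe's actual Either (in io.camunda.zeebe.util) differs from this:

```java
public class EitherVsException {

  // Toy Either: exactly one of left (failure) or right (success) is set.
  record Either<L, R>(L left, R right) {
    static <L, R> Either<L, R> leftOf(final L l) { return new Either<>(l, null); }
    static <L, R> Either<L, R> rightOf(final R r) { return new Either<>(null, r); }
    boolean isRight() { return right != null; }
  }

  static final int MAX_BATCH_SIZE = 1_000_000; // made-up limit

  // Either style: the caller is forced to check each append result explicitly,
  // which is the per-append overhead the comment above refers to.
  static Either<String, Integer> tryAppend(final int size) {
    return size > MAX_BATCH_SIZE
        ? Either.leftOf("record of size " + size + " exceeds the max batch size")
        : Either.rightOf(size);
  }

  public static void main(final String[] args) {
    System.out.println(tryAppend(100).isRight());       // append accepted
    System.out.println(tryAppend(4_000_000).isRight()); // append rejected
  }
}
```

With many appends per processor, the throwing variant lets all of them be guarded by a single catch, which is the simplification Remco argues for above.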
Yes, I get that; I just wanted to show the option. Btw, I think in some cases it is more likely to fail (multi-instance, deployments, etc.), so maybe it makes sense to handle the Either there. Just an idea 🤷♂️
...e/src/main/java/io/camunda/zeebe/engine/processing/deployment/DeploymentCreateProcessor.java
engine/src/test/java/io/camunda/zeebe/engine/processing/deployment/CreateDeploymentTest.java
When a deployment was too large an unexpected exception would be thrown. This would cause a rejection to be written with an error message containing the very large resource. The writing of this rejection failed because it was too large as well, causing another rejection to be written. This kept happening in a loop. By manually writing the rejection in the processor we can give a custom message that isn't too large, resolving the loop.
Verify that a deployment that is too large gets rejected properly. A rejection must be written and the client must receive a rejection response.
The CreateDeploymentTest became a bit large. I have moved the rejections to their own test class.
Instead of a generic IllegalStateException, the RecordBatch will now throw a more explicit ExceededBatchRecordSizeException. This way it's easier to catch the exception in the event something goes wrong.
Force-pushed e6a00ec to 6fe4a77 (compare)
@korthout your review is still requested, so I'll ping you like this. Please have another look at the changes I made (last 2 commits) 🙂
Thanks @remcowesterhoud 👍 I like it!
Please have a look at my suggestions before merging
engine/src/main/java/io/camunda/zeebe/engine/api/records/RecordBatch.java
engine/src/main/java/io/camunda/zeebe/engine/api/ProcessingResultBuilder.java
public static class ExceededBatchRecordSizeException extends RuntimeException {
  public ExceededBatchRecordSizeException(final String message) {
    super(message);
  }
🔧 The message is unused but it is filled with useful data. Let's move this data into the exception and use it to enhance the rejection message.
I figured we could add this when we needed it. I don't see the point in adding it in case we might need it in the future.
Explicitly mention that this exception can be thrown when appending a record. Also document that this exception is part of the contract of the engine.
Force-pushed 6fe4a77 to bed92c8 (compare)
bors merge
Build succeeded.
Description
When a deployment was too large an unexpected exception would be thrown. This would cause a rejection to be written with an error message containing the very large resource. The writing of this rejection failed because it was too large as well, causing another rejection to be written. This kept happening in a loop.
By manually writing the rejection in the processor we can give a custom message that isn't too large, resolving the loop.
The bug first occurred in 8.1.0-alpha4, so a backport is not necessary.
Related issues
closes #9946
Definition of Done
Not all items need to be done depending on the issue and the pull request.
Code changes:
add the label (backport stable/1.3) to the PR; in case that fails you need to create backports manually.
Testing:
Documentation:
Please refer to our review guidelines.