
Too big Deployment is no longer rejected #9946

Closed
korthout opened this issue Aug 1, 2022 · 7 comments · Fixed by #10193
Labels: area/observability, area/reliability, gameday, kind/bug, scope/broker, severity/high, version:8.1.0-alpha5, version:8.1.0

Comments

korthout commented Aug 1, 2022

Describe the bug

When I deploy a very large resource, then the request times out and the partition ends up in an error loop.

We discovered this bug while experimenting with a known bug for a game day:

In the past, the user would see a rejection message as described by that issue (like this) when deploying a resource that is too large. However, that is no longer the case in version 8.1.0-alpha4.

Note that this issue differs from #5776 because that specifically discusses problems related to the distribution of a large Deployment, while this is about the deployment itself.

To Reproduce

Deploy a very large resource (like a process with lots of data in XML comments or in the documentation tag):

zbctl deploy resource very-large-process.bpmn
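
If you don't have such a resource at hand, one possible way to produce one is to pad a small, valid BPMN file with a huge XML comment before deploying it. A minimal sketch (file names are just examples, not from this issue):

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration only: pads a small, valid BPMN file with a large XML comment so the
// resulting deployment exceeds the dispatcher's 4 MiB maximum fragment size.
// File names ("small-process.bpmn", "very-large-process.bpmn") are just examples.
public final class MakeVeryLargeProcess {
  public static void main(String[] args) throws Exception {
    final String bpmn = Files.readString(Path.of("small-process.bpmn"));
    // ~300 MB of filler, the same order of magnitude as the size in the stack trace below
    final String comment = "<!-- " + "x".repeat(300 * 1024 * 1024) + " -->";
    // Insert the comment right before the closing tag (assumes the "bpmn" namespace prefix)
    final String large = bpmn.replace("</bpmn:definitions>", comment + "</bpmn:definitions>");
    Files.writeString(Path.of("very-large-process.bpmn"), large);
  }
}
```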

Expected behavior

The client receives a rejection, and the partition stays healthy.

Log/Stacktrace

Full Stacktrace

17:22:57.107 [Broker-0-StreamProcessor-1] [Broker-0-zb-actors-1] ERROR io.camunda.zeebe.processor - Expected to write one or more follow-up records for record 'LoggedEvent [type=0, version=0, streamId=1, position=1, key=-1, timestamp=1659367373119, sourceEventPosition=-1] RecordMetadata{recordType=COMMAND, intentValue=255, intent=CREATE, requestStreamId=1, requestId=0, protocolVersion=3, valueType=DEPLOYMENT, rejectionType=NULL_VAL, rejectionReason=, brokerVersion=8.1.0}' without errors, but exception was thrown.
java.lang.IllegalArgumentException: Expected to claim segment of size 283324832, but can't claim more than 4194304 bytes.
	at io.camunda.zeebe.dispatcher.Dispatcher.offer(Dispatcher.java:207) ~[classes/:?]
	at io.camunda.zeebe.dispatcher.Dispatcher.claimFragmentBatch(Dispatcher.java:164) ~[classes/:?]
	at io.camunda.zeebe.logstreams.impl.log.LogStreamBatchWriterImpl.claimBatchForEvents(LogStreamBatchWriterImpl.java:230) ~[classes/:?]
	at io.camunda.zeebe.logstreams.impl.log.LogStreamBatchWriterImpl.tryWrite(LogStreamBatchWriterImpl.java:207) ~[classes/:?]
	at io.camunda.zeebe.engine.processing.streamprocessor.writers.LegacyTypedStreamWriterImpl.flush(LegacyTypedStreamWriterImpl.java:112) ~[classes/:?]
	at io.camunda.zeebe.engine.processing.bpmn.behavior.LegacyTypedStreamWriterProxy.flush(LegacyTypedStreamWriterProxy.java:81) ~[classes/:?]
	at io.camunda.zeebe.streamprocessor.DirectProcessingResult.writeRecordsToStream(DirectProcessingResult.java:46) ~[classes/:?]
	at io.camunda.zeebe.streamprocessor.ProcessingStateMachine.lambda$writeRecords$7(ProcessingStateMachine.java:342) ~[classes/:?]
	at io.camunda.zeebe.scheduler.retry.ActorRetryMechanism.run(ActorRetryMechanism.java:36) ~[classes/:?]
	at io.camunda.zeebe.scheduler.retry.AbortableRetryStrategy.run(AbortableRetryStrategy.java:45) ~[classes/:?]
	at io.camunda.zeebe.scheduler.ActorJob.invoke(ActorJob.java:92) ~[classes/:?]
	at io.camunda.zeebe.scheduler.ActorJob.execute(ActorJob.java:45) ~[classes/:?]
	at io.camunda.zeebe.scheduler.ActorTask.execute(ActorTask.java:119) ~[classes/:?]
	at io.camunda.zeebe.scheduler.ActorThread.executeCurrentTask(ActorThread.java:106) ~[classes/:?]
	at io.camunda.zeebe.scheduler.ActorThread.doWork(ActorThread.java:87) ~[classes/:?]
	at io.camunda.zeebe.scheduler.ActorThread.run(ActorThread.java:198) ~[classes/:?]
17:22:57.113 [Broker-0-StreamProcessor-1] [Broker-0-zb-actors-1] ERROR io.camunda.zeebe.processor - Expected to process record 'TypedRecordImpl{metadata=RecordMetadata{recordType=COMMAND, intentValue=255, intent=CREATE, requestStreamId=1, requestId=0, protocolVersion=3, valueType=DEPLOYMENT, rejectionType=NULL_VAL, rejectionReason=, brokerVersion=8.1.0}, value={"resources":[{"resourceName":"rick.bpmn","resource":"PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPGJwbW46ZGVmaW5pdGlvbnMgeG1sbnM6YnBtbj0iaHR0cDovL3d3dy5vbWcub3JnL3NwZWMvQlBNTi8yMDEwMDUyNC9NT0RFTCIgeG1sbnM6YnBtbmRpPSJodHRwOi8vd3d3Lm9tZy5vcmcvc3BlYy9CUE1OLzIwMTAwNTI0L0RJIiB4bWxuczpkYz0iaHR0cDovL3d3dy5vbWcub3JnL3NwZWMvREQvMjAxMDA1MjQvREMiIHhtbG5zOnplZWJlPSJodHRwOi8vY2FtdW5kYS5vcmcvc2NoZW1hL3plZWJlLzEuMCIgeG1sbnM6eHNpPSJodHRwOi8vd3d3LnczLm9yZy8yMDAxL1hNTFNjaGVtYS1pbnN0YW5jZSIgeG1sbnM6ZGk9Imh0dHA6Ly93d3cub21nLm9yZy9zcGVjL0RELzIwMTAwNTI0L0RJIiB4bWxuczptb2RlbGVyPSJodHRwOi8vY2FtdW5kYS5vcmcvc2NoZW1hL21vZGVsZXIvMS4wIiBpZD0iRGVmaW5pdGlvbnNfMXB3YXBxbSIgdGFyZ2V0TmFtZXNwYWNlPSJodHRwOi8vYnBtbi5pby9zY2hlbWEvYnBtbiIgZXhwb3J0ZXI9IkNhbXVuZGEgTW9kZWxlciIgZXhwb3J0ZXJWZXJzaW9uPSI1LjEuMCIgbW9kZWxlcjpleGVjdXRpb25QbGF0Zm9ybT0iQ2FtdW5kYSBDbG91ZCIgbW9kZWxlcjpleGVjdXRpb25QbGF0Zm9ybVZlcnNpb249IjEuMS4wIj4KICA8YnBtbjpwcm9jZXNzIGlkPSJ6ZWViZS1yZWxlYXNlLXFhIiBuYW1lPSJaZWViZSBSZWxlYXNlIFFBIiBpc0V4ZWN1dGFibGU9InRydWUiPgogICAgPGJwbW46ZG...}' without errors, but exception occurred with message 'Expected to claim segment of size 283324832, but can't claim more than 4194304 bytes.'.
java.lang.IllegalArgumentException: Expected to claim segment of size 283324832, but can't claim more than 4194304 bytes.
	at io.camunda.zeebe.dispatcher.Dispatcher.offer(Dispatcher.java:207) ~[classes/:?]
	at io.camunda.zeebe.dispatcher.Dispatcher.claimFragmentBatch(Dispatcher.java:164) ~[classes/:?]
	at io.camunda.zeebe.logstreams.impl.log.LogStreamBatchWriterImpl.claimBatchForEvents(LogStreamBatchWriterImpl.java:230) ~[classes/:?]
	at io.camunda.zeebe.logstreams.impl.log.LogStreamBatchWriterImpl.tryWrite(LogStreamBatchWriterImpl.java:207) ~[classes/:?]
	at io.camunda.zeebe.engine.processing.streamprocessor.writers.LegacyTypedStreamWriterImpl.flush(LegacyTypedStreamWriterImpl.java:112) ~[classes/:?]
	at io.camunda.zeebe.engine.processing.bpmn.behavior.LegacyTypedStreamWriterProxy.flush(LegacyTypedStreamWriterProxy.java:81) ~[classes/:?]
	at io.camunda.zeebe.streamprocessor.DirectProcessingResult.writeRecordsToStream(DirectProcessingResult.java:46) ~[classes/:?]
	at io.camunda.zeebe.streamprocessor.ProcessingStateMachine.lambda$writeRecords$7(ProcessingStateMachine.java:342) ~[classes/:?]
	at io.camunda.zeebe.scheduler.retry.ActorRetryMechanism.run(ActorRetryMechanism.java:36) ~[classes/:?]
	at io.camunda.zeebe.scheduler.retry.AbortableRetryStrategy.run(AbortableRetryStrategy.java:45) ~[classes/:?]
	at io.camunda.zeebe.scheduler.ActorJob.invoke(ActorJob.java:92) ~[classes/:?]
	at io.camunda.zeebe.scheduler.ActorJob.execute(ActorJob.java:45) ~[classes/:?]
	at io.camunda.zeebe.scheduler.ActorTask.execute(ActorTask.java:119) ~[classes/:?]
	at io.camunda.zeebe.scheduler.ActorThread.executeCurrentTask(ActorThread.java:106) ~[classes/:?]
	at io.camunda.zeebe.scheduler.ActorThread.doWork(ActorThread.java:87) ~[classes/:?]
	at io.camunda.zeebe.scheduler.ActorThread.run(ActorThread.java:198) ~[classes/:?]

Environment:

  • Zeebe Version: 8.1.0-alpha4
korthout added the kind/bug, scope/broker, severity/high, area/reliability, area/observability, and gameday labels on Aug 1, 2022
korthout commented Aug 1, 2022

I've marked this severity/high, but it could be considered severity/critical, since the deployment partition is broken after this happens. I kept it at high because no stable release containing this bug has been published yet.

menski commented Aug 5, 2022

The goal is to fix this before the 8.1.0 release.

remcowesterhoud commented Aug 16, 2022

@korthout How did you reproduce this? When I deploy a large resource I still get a rejection on the client side:

Error: rpc error: code = Internal desc = Command 'CREATE' rejected with code 'PROCESSING_ERROR': Expected to process record 'TypedRecordImpl{metadata=RecordMetadata{recordType=COMMAND, intentValue=255, intent=CREATE, requestStreamId=1, requestId=0, protocolVersion=3, valueType=DEPLOYMENT, rejectionType=NULL_VAL, rejectionReason=, brokerVersion=8.1.0}, value={"resources":[{"resourceName":"diagram_2.bpmn","resource":"PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPGJwbW46ZGVmaW5pdGlvbnMgeG1sbnM6YnBtbj0iaHR0cDovL3d3dy5vbWcub3JnL3NwZWMvQlBNTi8yMDEwMDUyNC9NT0RFTCIgeG1sbnM6YnBtbmRpPSJodHRwOi8vd3d3Lm9tZy5vcmcvc3BlYy9CUE1OLzIwMTAwNTI0L0RJIiB4bWxuczpkYz0iaHR0cDovL3d3dy5vbWcub3JnL3NwZWMvREQvMjAxMDA1MjQvREMiIHhtbG5zOmRpPSJodHRwOi8vd3d3Lm9tZy5vcmcvc3BlYy9ERC8yMDEwMDUyNC9ESSIgeG1sbnM6bW9kZWxlcj0iaHR0cDovL2NhbXVuZGEub3JnL3NjaGVtYS9tb2RlbGVyLzEuMCIgaWQ9IkRlZmluaXRpb25zXzB4dXlreTAiIHRhcmdldE5hbWVzcGFjZT0iaHR0cDovL2JwbW4uaW8vc2NoZW1hL2JwbW4iIGV4cG9ydGVyPSJDYW11bmRhIE1vZGVsZXIiIGV4cG9ydGVyVmVyc2lvbj0iNS4yLjAiIG1vZGVsZXI6ZXhlY3V0aW9uUGxhdGZvcm09IkNhbXVuZGEgQ2xvdWQiIG1vZGVsZXI6ZXhlY3V0aW9uUGxhdGZvcm1WZXJzaW9uPSI4LjAuMCI+CiAgPGJwbW46cHJvY2VzcyBpZD0iUHJvY2Vzc18wdzNnMzFlIiBpc0V4ZWN1dGFibGU9InRydWUiPgogICAgPGJwbW46c3RhcnRFdmVudCBpZD0iU3RhcnRFdmVudF8xIj4KICAgICAgPGJwbW46b3V0Z29pbmc+Rmxvd18weTd1NXAwPC9icG1uOm91dGdvaW5nPgogICAgPC9icG1uOnN0YXJ0RXZlbnQ+CiAgICA8YnBtbjplbmRFdmVudCBpZD0iR...}' without errors, but exception occurred with message 'Expected to claim segment of size 4850056, but can't claim more than 4194304 bytes.'.

Internally, on the broker, I can see the IllegalArgumentException is thrown as you mentioned, but the user doesn't see any of this. To me this is the expected behavior.

korthout commented
@remcowesterhoud Are you on 8.1.0-alpha4? Perhaps something changed in the meantime.

korthout commented
We shouldn't throw an IllegalArgumentException, because that would be an unexpected error. Instead, we should handle the error by rejecting the command.
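
For illustration, roughly the pattern I mean (none of these classes or names are Zeebe's actual writer/dispatcher API; they are made up for the sketch): check the batch size up front and turn an over-sized batch into an explicit rejection result instead of letting the exception escape from the dispatcher.

```java
// Sketch only -- not Zeebe's real classes. It just shows the idea of returning a
// rejection for an over-sized batch instead of throwing IllegalArgumentException.
public final class RejectOversizedBatchExample {

  static final int MAX_FRAGMENT_LENGTH = 4 * 1024 * 1024; // 4 MiB, the limit from the log above

  sealed interface WriteResult permits Written, Rejected {}
  record Written() implements WriteResult {}
  record Rejected(String reason) implements WriteResult {}

  static WriteResult tryWrite(byte[] recordBatch) {
    if (recordBatch.length > MAX_FRAGMENT_LENGTH) {
      // Surface a rejection that the processor can send back to the client,
      // so the partition does not end up retrying the same command forever.
      return new Rejected(
          "Expected to write a record batch of length " + recordBatch.length
              + ", but the maximum batch length is " + MAX_FRAGMENT_LENGTH);
    }
    // ... hand the batch over to the dispatcher here ...
    return new Written();
  }

  public static void main(String[] args) {
    System.out.println(tryWrite(new byte[283_324_832])); // the size from the stack trace
  }
}
```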

remcowesterhoud commented
8.1.0-alpha4 gives me the error loop. Seems something was fixed in the meantime so that's good 😄

I'm not sure where to catch this error, to be honest, as it comes from some random actor. Do you want to have a look at it together, @korthout?

remcowesterhoud commented
I'm not sure why I didn't see the problem last time; maybe my branch was outdated. However, I can now fully reproduce the issue with the partition dying. I shall create a fix for this today.

Zelldon added the version:8.1.0 label on Oct 4, 2022
korthout added a commit that referenced this issue Jul 4, 2023
korthout added a commit that referenced this issue Jul 4, 2023
oleschoenburg pushed a commit that referenced this issue Jul 14, 2023