Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression on fetchAndLock long poll (asyncResponseTimeout waits until timeout) #438

Closed
datakurre opened this issue Oct 25, 2022 · 11 comments

Comments

@datakurre
Copy link

datakurre commented Oct 25, 2022

The last version where asyncResponseTimeout for REST API external task long poll works properly is 2.7.2

Minimal working demo: demo-2.7.2.zip

Here I start a long poll and then start a new instance of a process and get response immediately after starting the instance, which is much less than 10 seconds of asyncResponseTimeout:

$ date --iso-8601=ns && curl -X POST http://localhost:8080/engine-rest/external-task/fetchAndLock -H "Content-Type: application/json" -d'{"maxTasks": 1, "workerId": "aWorkerId", "asyncResponseTimeout": 10000, "topics": [{"topicName": "Example", "lockDuration": 10000}]}' && printf "\n" && date --iso-8601=ns
2022-10-25T22:21:11,938346745+03:00
[{"activityId":"Activity_1d1ps3k","activityInstanceId":"Activity_1d1ps3k:31db367d-549a-11ed-a352-0242a7641d86","errorMessage":null,"errorDetails":null,"executionId":"31db367c-549a-11ed-a352-0242a7641d86","id":"31dbd2be-549a-11ed-a352-0242a7641d86","lockExpirationTime":"2022-10-25T22:21:25.683+0300","processDefinitionId":"Example:1:31bd0019-549a-11ed-a352-0242a7641d86","processDefinitionKey":"Example","processDefinitionVersionTag":null,"processInstanceId":"31d961ba-549a-11ed-a352-0242a7641d86","retries":null,"suspended":false,"workerId":"aWorkerId","topicName":"Example","tenantId":null,"variables":{},"priority":0,"businessKey":"default","extensionProperties":{}}]
2022-10-25T22:21:15,711268318+03:00

Minimal failing demo: demo-2.10.0.zip

Here I do the same, but I only get the response after asyncResponseTimeout of 10000 has exceeded:

$ date --iso-8601=ns && curl -X POST http://localhost:8080/engine-rest/external-task/fetchAndLock -H "Content-Type: application/json" -d'{"maxTasks": 1, "workerId": "aWorkerId", "asyncResponseTimeout": 10000, "topics": [{"topicName": "Example", "lockDuration": 10000}]}' && printf "\n" && date --iso-8601=ns
2022-10-25T22:22:52,782968742+03:00
[{"activityId":"Activity_1d1ps3k","activityInstanceId":"Activity_1d1ps3k:6df40a98-549a-11ed-86ec-0242a7641d86","errorMessage":null,"errorDetails":null,"executionId":"6df40a97-549a-11ed-86ec-0242a7641d86","id":"6df458b9-549a-11ed-86ec-0242a7641d86","lockExpirationTime":"2022-10-25T22:23:13.015+0300","processDefinitionId":"Example:1:6dd92f94-549a-11ed-86ec-0242a7641d86","processDefinitionKey":"Example","processDefinitionVersionTag":null,"processInstanceId":"6df235d5-549a-11ed-86ec-0242a7641d86","retries":null,"suspended":false,"workerId":"aWorkerId","topicName":"Example","tenantId":null,"variables":{},"priority":0,"businessKey":"default","extensionProperties":{}}]
2022-10-25T22:23:03,038642485+03:00
@datakurre
Copy link
Author

This time I remembered to build and run demo projects from JAR files (because asyncResponseTimeout does not work on Maven project otherwise, as discussed in the original asyncResponseTimeout issue long time ago).

@tobiasschaefer
Copy link
Collaborator

Hi @datakurre ,
Just to be sure: release 2.8.0 fails for you too, right?

@datakurre
Copy link
Author

@tobiasschaefer Yes. I started from 2.10.0 and tried all versions down to 2.7.2, which worked.

@tobiasschaefer
Copy link
Collaborator

The broken version 2.8.0 basically only contains one commit which zu updates Micronaut from 3.4.4 to 3.5.2: eb51252

I'll take a closer look.

@datakurre
Copy link
Author

I double checked that the issue existed already on 2.8.0. It does.

@datakurre
Copy link
Author

@tobiasschaefer I spent some time with debugger and found the following difference between 2.7.2 and 2.8.0

When a process enters external service task, the following transaction listener is fired in 2.7.2 but no longer in 2.8.0 (the method is called to register the listener, but it never fires), and therefore long poll request does not get the signal about new task and waits until the timeout:

https://github.com/camunda/camunda-bpm-platform/blob/13756c32033cf820ae48553c5116fa16d20c8fd9/engine/src/main/java/org/camunda/bpm/engine/impl/persistence/entity/ExternalTaskManager.java#L179

@tobiasschaefer
Copy link
Collaborator

Thanks for that detailed analysis. It must be related to Micronaut's transaction handling which changed in 2.8.0.

I'll have a look.

tobiasschaefer added a commit to tobiasschaefer/micronaut-camunda-bpm that referenced this issue Nov 11, 2022
@datakurre
Copy link
Author

🤩

@datakurre
Copy link
Author

datakurre commented Nov 11, 2022

Thanks to this, I know now much better, how Camunda long polling works. I was not aware that only new tasks are immediately delivered, because of that event being triggered. Tasks with silently expiring lock, are only redelivered after asyncResponseTimeout (or with tasks that fired that event). Therefore, optimal asyncResponseTimeout might be surprisingly short. A compromise between not polling all the time, but poll often enough to also deliver those exceptions in timely manner.

@tobiasschaefer
Copy link
Collaborator

tobiasschaefer commented Nov 11, 2022

Hey @datakurre ,

can you please test release v2.11.1. It fixes the issue for me.

Note to myself:

  • Create log poll after new start of application: time curl -X POST http://localhost:8080/engine-rest/external-task/fetchAndLock -H "Content-Type: application/json" -d'{"maxTasks": 1, "workerId": "aWorkerId", "asyncResponseTimeout": 10000, "topics": [{"topicName": "Example", "lockDuration": 10000}]}'
  • In a second terminal trigger the new process instance: curl -X POST http://localhost:8080/engine-rest/process-definition/key/Example/start -d '{}' -H 'Content-Type: application/json'
  • The call in the first terminal should be unlocked, i.e. return, before the 10 seconds timeout.

Process Model:
diagram_1.bpmn.txt

@datakurre
Copy link
Author

@tobiasschaefer I can confirm the fix. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants