Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clients can set an optional backoff time to the failed tasks. #5629

Closed
aivinog1 opened this issue Oct 19, 2020 · 10 comments · Fixed by #9417
Closed

Clients can set an optional backoff time to the failed tasks. #5629

aivinog1 opened this issue Oct 19, 2020 · 10 comments · Fixed by #9417
Labels
kind/feature Categorizes an issue or PR as a feature, i.e. new behavior scope/clients-go Marks an issue or PR to appear in the Go client section of the changelog scope/clients-java Marks an issue or PR to appear in the Java client section of the changelog version:8.1.0-alpha2 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0

Comments

@aivinog1
Copy link
Contributor

Is your feature request related to a problem? Please describe.
This is the client's part of #164. The idea is that a client (Java/Go/Zbctl) can specify an optional backoff time when sending a JobFailed command. If a backoff time is set then the job will be available after the backoff time. Otherwise, the job will be available immediately (current behavior).

Describe the solution you'd like
Clients can set an optional backoff time.

Describe alternatives you've considered
We could split this task into three for different clients.

@aivinog1 aivinog1 added the kind/feature Categorizes an issue or PR as a feature, i.e. new behavior label Oct 19, 2020
@saig0 saig0 added scope/clients-java Marks an issue or PR to appear in the Java client section of the changelog scope/clients-go Marks an issue or PR to appear in the Go client section of the changelog Status: Ready and removed Status: Needs Triage labels Oct 23, 2020
@tfnick
Copy link

tfnick commented Apr 30, 2021

this feature is useful

@aivinog1
Copy link
Contributor Author

aivinog1 commented Dec 19, 2021

Hey @saig0, @npepinpe! I have a question about implementation. So, when I try to use:

CLIENT_RULE
        .getClient()
        .newFailCommand(jobKey)
        .retries(1)
        .retryBackoff(backoffTimeout)
        .requestTimeout(Duration.ofDays(1))
        .send()
        .join();

in this test, the synchronous call became locked for the whole backoffTimeout duration. I think that this happens because we are expecting some response (JobRecord with a fail) and we will wait for it. I don't think that this is the right behavior but maybe I'm wrong 🤔

@npepinpe
Copy link
Member

👀 I wasn't really involved so far, so I'd have to look into what this means on Monday.

@B-hamza
Copy link

B-hamza commented Apr 25, 2022

Oh, that will be cool to set a backoff from the client on failing jobs, is there any update on this issue ? Many thanks.

@saig0
Copy link
Member

saig0 commented May 6, 2022

@aivinog1 do you want to continue working on the topic?
I would like to see the feature in version 8.1 (Q3 2022). If you don't want or can't work on it then I would assign it to someone else.

@aivinog1
Copy link
Contributor Author

aivinog1 commented May 6, 2022

Hey @saig0!
Sure, I'm happy to work on it soon :) But can you help me with this question? :)

@saig0
Copy link
Member

saig0 commented May 9, 2022

@aivinog1 I'm sorry for overseeing your question 🙈

Do you have your changes on a branch? Please share a link. I can't reproduce the behavior otherwise.

In general, the request waits until the response is received (when using .send().join()). But the processor should also send the response if a backoff is set.

@aivinog1
Copy link
Contributor Author

@aivinog1
Copy link
Contributor Author

@saig0 I think that I found the problem. So I'll need to do flush on the responseWriter somehow. I've tried to add responseWriter.flush(); like this, and it works but breaks other cases 🤔 :

17:05:17.324 [Broker-0-StreamProcessor-1] [Broker-0-zb-actors-1] ERROR io.camunda.zeebe.processor - Expected to execute side effects for record 'LoggedEvent [type=0, version=0, streamId=1, position=5, key=-1, timestamp=1652522717321, sourceEventPosition=-1] RecordMetadata{recordType=COMMAND, intentValue=255, intent=CREATE, requestStreamId=1, requestId=1, protocolVersion=3, valueType=PROCESS_INSTANCE_CREATION, rejectionType=NULL_VAL, rejectionReason=, brokerVersion=8.1.0}' successfully, but exception was thrown.
java.lang.NullPointerException: null
	at java.util.Objects.requireNonNull(Objects.java:208) ~[?:?]
	at io.camunda.zeebe.broker.transport.commandapi.CommandResponseWriterImpl.tryWriteResponse(CommandResponseWriterImpl.java:99) ~[classes/:?]
	at io.camunda.zeebe.engine.processing.streamprocessor.writers.TypedResponseWriterImpl.flush(TypedResponseWriterImpl.java:114) ~[classes/:?]
	at io.camunda.zeebe.util.retry.ActorRetryMechanism.run(ActorRetryMechanism.java:36) ~[classes/:?]
	at io.camunda.zeebe.util.retry.AbortableRetryStrategy.run(AbortableRetryStrategy.java:44) ~[classes/:?]
	at io.camunda.zeebe.util.sched.ActorJob.invoke(ActorJob.java:79) ~[classes/:?]
	at io.camunda.zeebe.util.sched.ActorJob.execute(ActorJob.java:44) ~[classes/:?]
	at io.camunda.zeebe.util.sched.ActorTask.execute(ActorTask.java:122) ~[classes/:?]
	at io.camunda.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:103) ~[classes/:?]
	at io.camunda.zeebe.util.sched.ActorThread.doWork(ActorThread.java:83) ~[classes/:?]
	at io.camunda.zeebe.util.sched.ActorThread.run(ActorThread.java:195) ~[classes/:?]

Also, I think that it works without retryBackoff because retries do so fast so in the end we quickly activate jobs and flush them.

@saig0
Copy link
Member

saig0 commented May 17, 2022

@aivinog1 I see. Thank you for sharing 👍

I was able to fix the behavior. See my commit here: a7195f7

zeebe-bors-camunda bot added a commit that referenced this issue May 20, 2022
9389: Implement a job backoff timeout on Zeebe Java client side r=saig0 a=aivinog1

## Description

<!-- Please explain the changes you made here. -->
I implemented the usage of the backoff parameter in the Zeebe Java Client and `@saig0` fixed the problem with flushing responses.

## Related issues

<!-- Which issues are closed by this PR or are related -->

#5629 



Co-authored-by: Alexey Vinogradov <vinogradov.a.i.93@gmail.com>
Co-authored-by: Philipp Ossler <philipp.ossler@gmail.com>
zeebe-bors-camunda bot added a commit that referenced this issue Jun 1, 2022
9417: feat(client-go): add support for backoff timeout for failed jobs in the Go client and zbctl r=pihme a=aivinog1

## Description

<!-- Please explain the changes you made here. -->
I've added mapping in the Go client and an additional flag that matches job backoff.

## Related issues

<!-- Which issues are closed by this PR or are related -->

closes #5629 



Co-authored-by: Alexey Vinogradov <vinogradov.a.i.93@gmail.com>
@Zelldon Zelldon added the version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0 label Oct 4, 2022
github-merge-queue bot pushed a commit that referenced this issue Mar 14, 2024
* fix(test): fix large decision input/output test

Since 8.3.0 Zeebe reacts differently on JSON passed to DMN instead of String. Fixing payload to be string in our test.

* chore(tests): fix payload file
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes an issue or PR as a feature, i.e. new behavior scope/clients-go Marks an issue or PR to appear in the Go client section of the changelog scope/clients-java Marks an issue or PR to appear in the Java client section of the changelog version:8.1.0-alpha2 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0
Projects
None yet
7 participants