[Feature] Implement Time to live (TTL) for jobs #479

Open
manast opened this issue Mar 30, 2017 · 8 comments

Comments

@manast
Member

manast commented Mar 30, 2017

When adding jobs to the queue, it should be possible to define a maximum TTL. If the job takes more time to be processed than the TTL, it would be automatically failed.
A few things to consider:

  • It may be useful for the worker to signal that it is still working on the job properly, so that it gets a TTL extension.
  • When a job is forcibly failed, the queue should notify the worker so that it has a chance to shut down gracefully; otherwise we may end up with a worker that stays busy forever.
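
For illustration only, the two points above could be sketched roughly as follows in TypeScript. This is not Bull's API: `TtlLease`, `check`, `extend` and the `onExpired` callback are hypothetical names, and the clock is injected so the behaviour can be exercised without waiting.

```typescript
// Hypothetical helper, NOT part of Bull: tracks the TTL deadline of one job.
class TtlLease {
  private deadline: number;
  private expired = false;

  constructor(
    private readonly ttlMs: number,
    // Assumed hook: called once so the worker can shut down gracefully.
    private readonly onExpired: () => void,
    // Injectable clock (assumption, mainly for testing).
    private readonly now: () => number = Date.now,
  ) {
    this.deadline = this.now() + this.ttlMs;
  }

  // Queue side: call periodically. Returns false once the TTL has elapsed
  // and notifies the worker exactly once.
  check(): boolean {
    if (!this.expired && this.now() >= this.deadline) {
      this.expired = true;
      this.onExpired();
    }
    return !this.expired;
  }

  // Worker side: report progress to get a fresh TTL window.
  // Returns false if the lease has already expired.
  extend(): boolean {
    if (!this.check()) return false;
    this.deadline = this.now() + this.ttlMs;
    return true;
  }
}
```

A worker would call `extend()` whenever it makes progress and treat the `onExpired` callback as its cue to stop; how the queue would actually deliver that notification to a worker is exactly the open design question here.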
@manast
Member Author

manast commented Mar 31, 2017

Note that we currently have the timeout option, but it does not address the two issues mentioned above.

@manast manast changed the title New feature. Implement Time to live (TTL) for jobs Feature] Implement Time to live (TTL) for jobs Mar 31, 2017
@manast manast changed the title Feature] Implement Time to live (TTL) for jobs [Feature] Implement Time to live (TTL) for jobs Mar 31, 2017
@DevBrent

DevBrent commented Apr 4, 2017

If there is any progress on notifying the worker from the queue for a graceful shutdown, I'd like to hear about that in #484 for graceful shutdown within stalled jobs.

@pvraj

pvraj commented Apr 6, 2017

Hello,
Until this ticket is completed, is there a way you would suggest to prevent jobs that exceed the timeout from continuing to be processed? I tried Bull because the TTL functionality was not working in Kue. Thank you!

@manast
Member Author

manast commented Apr 6, 2017

Not really. In fact, this TTL functionality could work really well once #488 is ready, since it will allow us to actually kill the process that has exceeded the TTL, really forcing it to stop working. These are fairly high-priority items, so expect them to be released soon.

@adamreisnz

adamreisnz commented Aug 3, 2021

Hello, it has been 4+ years since this ticket was opened. Is there any progress on its implementation?

#488 appears to be implemented, which was supposedly a prerequisite for this feature?

@sinasalek

It's not the best solution, but for now you can check the job status on every loop iteration in the processor and abort if its status is set to failed. So to stop a job, you can just set its status to failed.
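
A minimal sketch of that loop, under assumptions: `processUntilFailed`, `getStatus` and `handle` are hypothetical names rather than Bull calls, and the status lookup is synchronous here only to keep the example short (a real lookup against Redis would be asynchronous).

```typescript
type JobStatus = "active" | "failed";

// Hypothetical processor loop: re-checks the job status before each unit of
// work and stops as soon as the job has been marked as failed.
function processUntilFailed<T>(
  items: T[],
  getStatus: () => JobStatus, // assumed status lookup, not a Bull API
  handle: (item: T) => void,
): number {
  let handled = 0;
  for (const item of items) {
    if (getStatus() === "failed") {
      break; // someone failed the job: abort cooperatively
    }
    handle(item);
    handled++;
  }
  return handled; // number of items processed before stopping
}
```

Note that this only helps when the work is split into iterations; a single long blocking call never gets back to the status check.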

@elucidsoft

So I just added logic to handle this. All I did was add a timestamp to jobs; then the workers check the timestamp before they do anything else. If it's older than 120 seconds (in our case), they cancel the job. Sounds like a stupid solution, right? But the workers make short work of getting rid of all the stale jobs in a queue this way, so they can begin working on valid jobs. It takes seconds to clear out thousands of stale jobs.
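
Roughly, the check could look like the sketch below. The names (`TimestampedJob`, `enqueuedAt`, `isStale`, `handleJob`) are made up for illustration and the 120-second limit is just the value from our case; none of this is Bull's API.

```typescript
// Hypothetical job shape: the producer stores the enqueue time with the job.
interface TimestampedJob<T> {
  data: T;
  enqueuedAt: number; // milliseconds since epoch, set when the job is added
}

const MAX_AGE_MS = 120_000; // 120 seconds

// True if the job has been waiting longer than the allowed age.
function isStale<T>(
  job: TimestampedJob<T>,
  now: number = Date.now(),
  maxAgeMs: number = MAX_AGE_MS,
): boolean {
  return now - job.enqueuedAt > maxAgeMs;
}

// Worker entry point: drop stale jobs before doing any real work.
function handleJob<T>(
  job: TimestampedJob<T>,
  work: (data: T) => void,
  now: number = Date.now(),
): "cancelled" | "processed" {
  if (isStale(job, now)) {
    return "cancelled";
  }
  work(job.data);
  return "processed";
}
```

Because the check happens before any work starts, it only filters jobs that waited too long in the queue; it does not interrupt a job that is already running.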

@adamreisnz

adamreisnz commented Dec 20, 2021

@sinasalek

It's not the best solution, but for now you can check the job status on every loop iteration in the processor and abort if its status is set to failed. So to stop a job, you can just set its status to failed.

If this is done externally (e.g. outside of the scope of the worker doing the job), say in a separate script, do you know if the worker will actually be stopped/abort the job, if the job has been set to failed by the external script?

If the worker keeps waiting for the job to finish (and you have a stalled job because the worker is stuck), this solution would not work for that case.

@elucidsoft

So I just added logic to handle this, all I did was add a timestamp to jobs, then in the workers before they do anything else, they check the timestamp

Is this to prevent jobs that have failed previously from being picked up again if they are older than 120 seconds?
I assume this won't work for the scenario where a job is running for the first time and takes more than 120 seconds?

@manast is there any progress on an official TTL solution for the new version of BullMQ? Happy to bounce around implementation ideas if there is a design proposal that needs refinement.

6 participants