
Graceful shutdown for stalled jobs ( lock renewal ) #484

Open
DevBrent opened this issue Apr 2, 2017 · 4 comments

DevBrent commented Apr 2, 2017

First of all, sorry for muddying up the issue tracker; I'll be sure to help close two tickets to make up for this.

Active & Unlocked -> Queued
I see the code for stalled jobs and time-renewed locks. What I don't see is the process that checks the queue for active-but-unlocked jobs and migrates them back to inactive/queued so another job consumer can take them up.

Do you know where this code is located so I can review the process exactly?

Disabling auto-lock renewal
If we wanted to trigger lock renewal manually, would that just involve setting the built-in lock renewal timer to a higher value, or would that also negatively affect the stalled-job monitor? Ideally, I'd like each job's own event loop to control renewal, so that if one job slowed enormously it could lose its lock and another consumer could start over.

Lock renewal failure detection/notification
There is a TODO in the lock renewal code about notifying the consumer. This one is pretty important to us, because running two copies of the same job at once could trigger API rate limiting. Is there currently a best practice for detecting lock renewal failure? Promise cancellation comes into play here, but that's outside the scope of this question, and I can already imagine the mess of code needed to handle it.

manast commented Apr 2, 2017

Here come the answers:
Active & Unlocked -> Queued
https://github.com/OptimalBits/bull/blob/master/lib/queue.js#L559
And the timer is started here: _this.startMoveUnlockedJobsToWait();
Disabling auto-lock renewal
There is no "public" way to do this, but you can hack around it by setting `LOCK_RENEW_TIME` to something effectively infinite and calling `moveUnlockedJobsToWait` manually.
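A minimal sketch of that hack might look like the following. This is not a public API, so the `LOCK_RENEW_TIME` and `moveUnlockedJobsToWait` names should be verified against your Bull version; and since no live Redis is assumed here, `queue` is a stub standing in for a real Bull `Queue` instance (swap in `new Queue('jobs')` in real code):

```javascript
// Stub standing in for a Bull Queue instance (no Redis assumed here).
const queue = {
  LOCK_RENEW_TIME: 30000, // Bull's built-in renewal interval (ms)
  moveUnlockedJobsToWait() {
    // In Bull, this moves active-but-unlocked jobs back to the wait list.
    return Promise.resolve();
  },
};

// 1. Effectively disable the built-in renewal timer (a hack, not public API):
queue.LOCK_RENEW_TIME = Number.MAX_SAFE_INTEGER;

// 2. Sweep active-but-unlocked jobs back to wait on our own schedule instead:
function startManualSweep(q, intervalMs) {
  return setInterval(() => {
    q.moveUnlockedJobsToWait().catch(err => {
      console.error('manual sweep failed:', err);
    });
  }, intervalMs);
}
```

The trade-off is that stalled-job detection now only happens as often as your manual sweep runs.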

Lock renewal failure detection/notification
I'm not sure I understand which notification you mean. There is a `'stalled'` event that is emitted when a job has been detected as stalled, but I guess that is not the one you mean.

DevBrent commented Apr 4, 2017

@manast my issue specifically regards a stalled job that is still running (just slowly). #308 has a similar issue, but theirs was resolved by simply not stalling. Given our legacy code quality, I expect some of our jobs to stall, and I'd like to notice that and cancel execution from within the job-processing code.

I guess I could set up a listener on every worker to monitor for stalled jobs, but even then, how would I know which of the two jobs currently running is the stalled one?

Specifically, either of these two locations is where I would expect my job processor to be able to know immediately that the job was unable to renew its lock, so it could shut itself down:

```js
// TODO: if we failed to re-acquire the lock while trying to renew, should we let the job

console.error('Error renewing lock ' + err);
```

I don't see any "stalled" or "lock" properties on the Job object, but ideally I need to know as soon as possible whether I'm stalled and another worker has started over. It looks like, within the job processor, I could call `Job.takeLock` and check whether the result is false, null, or a caught error, and then exit processing of the job.
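That `takeLock` self-check could be sketched as follows. Treating a falsy resolution or a rejection from `job.takeLock()` as "lock lost" is an assumption about Bull's API here, to be verified against your version:

```javascript
// Returns true only if this worker still owns the job's lock.
async function stillOwnsLock(job) {
  try {
    return Boolean(await job.takeLock());
  } catch (err) {
    return false; // a rejected renewal also means we no longer own the lock
  }
}

// Run work in discrete steps, re-checking the lock between steps and bailing
// out as soon as it is lost so another worker can take over cleanly.
async function processWithLockChecks(job, steps) {
  for (const step of steps) {
    if (!(await stillOwnsLock(job))) {
      throw new Error('lock lost, aborting job ' + job.jobId);
    }
    await step();
  }
}
```

The per-step granularity is the cost: a job only notices the lost lock at its next checkpoint, so long-running steps delay the shutdown.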

@DevBrent DevBrent changed the title [Question] Lock renewals Graceful shutdown for stalled jobs ( lock renewal ) Apr 4, 2017
DevBrent commented Apr 6, 2017

@manast #488 is enticing, but I could effectively stop my own processing given either a simple notification or the ability to call `job.hasLock()` intermittently to confirm I'm still OK to keep processing.

The reason is that I might want to shut down gracefully, e.g. delete temporary files and close database connections.
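To make the request concrete: `job.hasLock()` does not exist in Bull, so the sketch below invents it to show the graceful-shutdown shape being asked for, with a watchdog that runs cleanup (delete temp files, close DB connections) as soon as the lock is gone:

```javascript
// `job.hasLock()` is hypothetical; this sketches the requested API shape.
async function runWithGracefulShutdown(job, work, cleanup) {
  const watchdog = setInterval(async () => {
    if (!(await job.hasLock())) { // hypothetical lock check
      clearInterval(watchdog);
      await cleanup();            // e.g. delete temp files, close DB connections
    }
  }, 5000);
  try {
    return await work();
  } finally {
    clearInterval(watchdog); // normal completion: stop watching
  }
}
```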

I don't necessarily need the memory segmentation and overhead of IPC either. I do like the option of running jobs as child processes, though; in cases where I might be running unpredictable, low-quality code, it would give that code a better chance and isolate its impact.

@shaunwarman

Bump, would also love to see a graceful shutdown in the PATTERNS section.
