Feature Request: Adjust in_progress_key timeout and implement additional heartbeat functionality #407

noncuro · 2023-06-09T18:29:45Z

Related to issue #402: we have some long-running tasks that may last for hours. Currently, if a worker fails, the task is only retried after the in_progress_key expires, and that expiry is derived from max_timeout, which can be many hours.

arq/arq/worker.py, line 264 in 9109c2e:

    max_timeout = max(f.timeout_s or self.job_timeout_s for f in self.functions.values())
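To illustrate why a single long-running function dominates this value, here is a minimal standalone sketch of the same expression. The `Func` dataclass, the function names, and the 300-second default are hypothetical stand-ins chosen for illustration, not arq's actual classes or defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Func:
    """Hypothetical stand-in for a registered worker function."""
    name: str
    timeout_s: Optional[float]  # None means "use the worker-wide default"


# Assumed worker-wide default, for illustration only.
job_timeout_s = 300.0

functions = {
    "send_email": Func("send_email", 30.0),
    "resize_image": Func("resize_image", None),          # falls back to 300 s
    "train_model": Func("train_model", 4 * 60 * 60.0),   # 4 hours
}

# Same shape as the expression quoted from worker.py: the longest timeout wins.
max_timeout = max(f.timeout_s or job_timeout_s for f in functions.values())
print(max_timeout)
```

With these made-up values, every job's in-progress marker would inherit an expiry based on the four-hour task, even for the 30-second one.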

A huge enhancement would be to reduce the default self.in_progress_timeout_s to a small value, such as 10 seconds. The worker could then refresh the in_progress_key expiration on every heartbeat, extending it by a few seconds each time. This would ensure that jobs are retried promptly if a worker fails, rather than waiting out a long timeout.
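A rough sketch of the idea, not a patch against arq: the in-progress marker gets a short TTL, and a background task keeps re-arming it while the job runs. `InMemoryStore`, `heartbeat`, and `run_with_heartbeat` are hypothetical names, and the in-memory store only stands in for Redis key expiry so that the example runs on its own.

```python
import asyncio
import contextlib
import time


class InMemoryStore:
    """Hypothetical stand-in for Redis: keys mapped to expiry deadlines."""

    def __init__(self):
        self._deadlines = {}

    def set_with_ttl(self, key, ttl_s):
        # (Re)arm the key so it expires ttl_s seconds from now.
        self._deadlines[key] = time.monotonic() + ttl_s

    def exists(self, key):
        deadline = self._deadlines.get(key)
        return deadline is not None and deadline > time.monotonic()

    def delete(self, key):
        self._deadlines.pop(key, None)


async def heartbeat(store, key, ttl_s, interval_s):
    """Refresh the key's short TTL every interval_s until cancelled."""
    while True:
        store.set_with_ttl(key, ttl_s)
        await asyncio.sleep(interval_s)


async def run_with_heartbeat(store, key, job, ttl_s=10.0, interval_s=3.0):
    """Run job() while a background task keeps the in-progress key alive."""
    store.set_with_ttl(key, ttl_s)
    beat = asyncio.create_task(heartbeat(store, key, ttl_s, interval_s))
    try:
        return await job()
    finally:
        # Stop refreshing; on a clean finish, also clear the marker.
        beat.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await beat
        store.delete(key)
```

If the worker process dies, nothing refreshes the key, so it lapses after roughly `ttl_s` seconds regardless of how long the job's own timeout is. The interval should stay well below the TTL so that one delayed heartbeat does not expire a healthy job.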

This would be incredibly helpful for handling worker failures on long-running tasks.


JonasKs · 2023-06-09T19:07:26Z

I agree. PR welcome, but we’d have to solve #405 first.
