Feature Request: Adjust in_progress_key timeout and implement additional heartbeat functionality #407

Open
noncuro opened this issue Jun 9, 2023 · 1 comment


noncuro commented Jun 9, 2023

Related to issue #402: we have some long-running tasks that may last for hours. Currently, if a worker fails mid-job, the task is only retried after the in_progress_key expires. That expiry is based on max_timeout, so the retry can be delayed by many hours.

max_timeout = max(f.timeout_s or self.job_timeout_s for f in self.functions.values())
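To illustrate why this hurts, here is a minimal sketch of how that expression behaves. The `Func` dataclass, the function names, and the timeout values are hypothetical stand-ins, not the library's own types. A single long-running function pushes max_timeout up to its own timeout, even when the other functions fall back to a short default.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Func:
    # Hypothetical stand-in for a registered worker function.
    name: str
    timeout_s: Optional[float] = None  # None means "use the worker default"


def max_timeout(functions: Dict[str, Func], job_timeout_s: float) -> float:
    # Same expression as in the issue, lifted into a free function.
    return max(f.timeout_s or job_timeout_s for f in functions.values())


functions = {
    "send_email": Func("send_email"),                   # falls back to the 300 s default
    "reindex": Func("reindex", timeout_s=6 * 60 * 60),  # 6 hours
}

print(max_timeout(functions, job_timeout_s=300))  # prints 21600
```

With these made-up numbers, the six-hour function alone determines the result, which is the delay described above.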

A big improvement would be to reduce the default self.in_progress_timeout_s to something short, like 10 seconds. The worker could then refresh the in_progress_key expiration on every heartbeat, pushing it out by a few seconds each time. That way, jobs would be retried promptly after a worker failure instead of waiting out a long timeout.
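A rough sketch of the idea, not an implementation against the library's internals: `FakeKeyStore`, `heartbeat`, and `run_job` are hypothetical names, and the in-memory store only stands in for Redis keys with a TTL. With real Redis, the refresh step would presumably be an `EXPIRE`-style call on the in_progress_key.

```python
import asyncio
import time


class FakeKeyStore:
    """In-memory stand-in for Redis keys with a TTL (hypothetical, for illustration)."""

    def __init__(self):
        self._expires_at = {}

    def set_with_ttl(self, key, ttl_s):
        # Create the key or push its expiry out to now + ttl_s.
        self._expires_at[key] = time.monotonic() + ttl_s

    def exists(self, key):
        deadline = self._expires_at.get(key)
        return deadline is not None and time.monotonic() < deadline

    def delete(self, key):
        self._expires_at.pop(key, None)


async def heartbeat(store, key, ttl_s, interval_s):
    # Re-arm the short TTL until cancelled. If the worker dies,
    # this loop stops running and the key lapses after ttl_s.
    while True:
        store.set_with_ttl(key, ttl_s)
        await asyncio.sleep(interval_s)


async def run_job(store, key, job, ttl_s=10.0, interval_s=3.0):
    # Mark the job as in progress with a short TTL, keep it alive
    # while the job runs, and clear it on a clean finish.
    store.set_with_ttl(key, ttl_s)
    beat = asyncio.create_task(heartbeat(store, key, ttl_s, interval_s))
    try:
        return await job()
    finally:
        beat.cancel()
        store.delete(key)


if __name__ == "__main__":
    async def demo():
        store = FakeKeyStore()

        async def long_job():
            await asyncio.sleep(0.5)  # much longer than the TTL below
            return store.exists("in-progress:job-1")

        # The key is still present near the end of the job thanks to the heartbeat.
        print(await run_job(store, "in-progress:job-1", long_job,
                            ttl_s=0.2, interval_s=0.05))

    asyncio.run(demo())
```

The interval has to stay comfortably below the TTL so that one slow heartbeat does not let the key expire while the worker is still healthy. How this interacts with the worker's existing health-check loop is left open here.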

This would be incredibly helpful for handling worker failures on long-running tasks.

Sponsor Collaborator

JonasKs commented Jun 9, 2023

I agree. PR welcome, but we’d have to solve #405 first.
