restart_policy max_attempts seems backwards. #45039

chrisbecke · 2023-02-20T09:08:17Z

The guidance in Docker Bench is that restart_policy.max_attempts should be set to a small number, such as 5.
This just causes problems in production swarms, where long-running tasks might be restarted because they fail health checks or, even worse, are migrated because their node is restarted as part of routine maintenance.
This means each task now has a hidden counter that, after weeks or even months, will prevent the task from being restarted.
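
To make the problem concrete, here is a minimal Python sketch of a counter that never resets, which is the behaviour described above. It is a hypothetical model for illustration only, not swarm's actual implementation; the class name `LifetimeRestartCounter` and its method are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class LifetimeRestartCounter:
    """Hypothetical model: restarts are counted forever and never expire."""

    max_attempts: int
    restarts: int = 0

    def should_restart(self) -> bool:
        # Once the lifetime budget is used up, the task is never restarted again.
        if self.restarts >= self.max_attempts:
            return False
        self.restarts += 1
        return True


# Assumed scenario: one restart per month caused by routine node maintenance.
counter = LifetimeRestartCounter(max_attempts=5)
outcomes = [counter.should_restart() for _ in range(6)]
print(outcomes)
```

In this model the first five monthly restarts succeed and the sixth is refused, even though nothing was ever wrong with the service.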

It is difficult to imagine where this behaviour is desired. The only way to avoid it is to set max_attempts to 0, which allows infinite restarts. That is not desirable either, because it does not detect and stop a service that is being restarted in a fail loop due to configuration drift or some other infrastructure error.

There is a window parameter, but rather than counting errors within a window to determine whether a service is stuck in a fail loop, it does the opposite: restarts that happen within the window are not counted.
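
For comparison, this is roughly the behaviour I would expect from a window: count only the restarts that happened recently, and give up when too many pile up in a short time. The Python sketch below is a hypothetical illustration of that idea, not how swarm works; `FailLoopDetector` and its parameters are names made up for this example.

```python
from collections import deque


class FailLoopDetector:
    """Hypothetical sliding-window restart limiter (not swarm's implementation)."""

    def __init__(self, max_attempts: int, window_seconds: float) -> None:
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._restarts: deque = deque()  # timestamps of recent restarts

    def should_restart(self, now: float) -> bool:
        # Forget restarts that are older than the window.
        while self._restarts and now - self._restarts[0] > self.window_seconds:
            self._restarts.popleft()
        # Too many restarts inside the window means a fail loop: stop.
        if len(self._restarts) >= self.max_attempts:
            return False
        self._restarts.append(now)
        return True


MONTH = 30 * 24 * 3600

# Monthly maintenance restarts never accumulate inside a 10-minute window.
maintenance = FailLoopDetector(max_attempts=5, window_seconds=600)
monthly = [maintenance.should_restart(i * MONTH) for i in range(12)]

# A crash every 10 seconds exhausts the budget on the sixth attempt.
crashing = FailLoopDetector(max_attempts=5, window_seconds=600)
rapid = [crashing.should_restart(t) for t in range(0, 60, 10)]
```

With these assumed numbers, a year of monthly restarts is always allowed, while a task that crashes every 10 seconds is stopped after five restarts.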

Clearly someone thought this makes sense, but as someone who runs swarm in production I don't get it. restart_policy.max_attempts currently does NOT so much catch and stop services that have entered an error state as stop running tasks after months of operation.

What gives?
