Better cooperation with Kubernetes zero-downtime restarts #2532

Open
FooBarWidget opened this issue Mar 16, 2024 · 0 comments

Problem statement

Upon shutdown, we stop listening for new requests while allowing existing requests to finish. The first part, "stop listening for new requests", does not play well with the way Kubernetes handles rolling upgrades.

During a Kubernetes Deployment rolling upgrade, Kubernetes performs two actions concurrently:

  1. Initiating removal of the Pod from the load balancer.
  2. Sending SIGTERM to the container.

Problem: removal of the Pod from the load balancer can take an arbitrary amount of time. Until it's done, we should continue to listen for new requests. Otherwise, requests may still be routed to the (terminating) Pod, resulting in HTTP errors.

See also: Delaying Shutdown to Wait for Pod Deletion Propagation

Existing solutions

A common existing solution is to introduce a pre-stop hook that sleeps for a short amount of time. The termination flow then looks like this:

Concurrently:

  • Initiating removal of the Pod from the load balancer.
  • Sequentially:
    1. Running the pre-stop hook and waiting until it finishes.
    2. Sending SIGTERM to the container.

The hope is that the pre-stop hook waits long enough for removal-from-load-balancer to finish.
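
In practice this workaround is a `preStop` exec hook in the Pod spec. A minimal sketch for illustration only (the container name, image, and 10-second sleep are placeholder guesses, and guessing the sleep duration correctly is exactly the weakness discussed below):

```yaml
spec:
  terminationGracePeriodSeconds: 60  # must cover the preStop sleep plus request draining
  containers:
    - name: app            # illustrative name
      image: example/app   # illustrative image
      lifecycle:
        preStop:
          exec:
            # Kubernetes sends SIGTERM only after this hook returns, but
            # removal from the load balancer proceeds concurrently.
            command: ["sleep", "10"]
```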

Problems with this approach:

  • It's not guaranteed that the Pod has actually been removed from the load balancer by the time the pre-stop hook finishes (sleep too short).
  • Conversely, the Pod may be removed from the load balancer well before the pre-stop hook finishes (sleep too long). This wastes cluster resources.

Proposed solution

Enterprise-only feature.

Modify the shutdown behavior as follows. There are two parts that run sequentially:

Part 1 (optional): Wait until the Pod is actually removed from the load balancer

Keep serving requests as normal until we're actually removed from the load balancer.

This can probably be checked by querying the Kubernetes API and waiting until the corresponding EndpointSlice is either gone, or at least no longer references our Pod. Needs more investigation.
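
As a rough illustration of what such a check could look like, here is a sketch using client-go that polls until no EndpointSlice belonging to our Service still references our Pod. The package and function names, the one-second polling interval, and the RBAC assumption (permission to list EndpointSlices in our namespace) are all hypothetical, not a committed design:

```go
// Package lbdrain sketches Part 1: block until the Kubernetes API no longer
// lists this Pod in any EndpointSlice of its Service. Assumes in-cluster
// credentials; all names here are illustrative.
package lbdrain

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// WaitUntilRemovedFromLoadBalancer polls until no EndpointSlice that belongs
// to the given Service still references the given Pod.
func WaitUntilRemovedFromLoadBalancer(ctx context.Context, namespace, service, podName string) error {
	config, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		return err
	}
	for {
		slices, err := client.DiscoveryV1().EndpointSlices(namespace).List(ctx, metav1.ListOptions{
			// EndpointSlices carry a label naming the Service that owns them.
			LabelSelector: "kubernetes.io/service-name=" + service,
		})
		if err != nil {
			return err
		}
		stillReferenced := false
		for _, slice := range slices.Items {
			for _, ep := range slice.Endpoints {
				if ep.TargetRef != nil && ep.TargetRef.Name == podName {
					stillReferenced = true
				}
			}
		}
		if !stillReferenced {
			return nil // gone from every slice; safe to stop serving
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Second):
		}
	}
}
```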

Part 2: Incoming-request-based shutdown delay

Even when we're removed from the load balancer, there may still be in-flight requests from two places:

  • The kernel socket backlog may still have new connections that we haven't accept()ed yet.
  • Already accept()ed sockets may still have unread requests due to HTTP pipelining.

To deal with the kernel socket backlog: don't immediately stop accepting new socket connections. Instead, only do so after a configurable amount of time has passed without any requests appearing (on either new or existing sockets).
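
A minimal sketch of this idea using Go's standard HTTP stack, assuming a plain `net.Listener` (the tracker type, the 5-second quiet period, and the helper names are made up for illustration; an actual implementation would live in the server's own I/O layer):

```go
// Sketch of an incoming-request-based shutdown delay: after SIGTERM we keep
// accepting connections, and only close the listener once no request has
// arrived for a configurable quiet period.
package main

import (
	"net"
	"net/http"
	"os"
	"os/signal"
	"sync"
	"syscall"
	"time"
)

// quietPeriodTracker records when the last request was seen.
type quietPeriodTracker struct {
	mu       sync.Mutex
	lastSeen time.Time
}

func (t *quietPeriodTracker) touch() {
	t.mu.Lock()
	t.lastSeen = time.Now()
	t.mu.Unlock()
}

func (t *quietPeriodTracker) idleFor(d time.Duration) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return time.Since(t.lastSeen) >= d
}

// drainThenClose keeps the listener open until no request has arrived for
// the full quiet period, then stops accepting. Connections sitting in the
// kernel backlog get accepted and served in the meantime.
func drainThenClose(ln net.Listener, t *quietPeriodTracker, quiet time.Duration) {
	for !t.idleFor(quiet) {
		time.Sleep(100 * time.Millisecond)
	}
	ln.Close()
}

func main() {
	tracker := &quietPeriodTracker{lastSeen: time.Now()}
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		tracker.touch() // every request resets the quiet period
		w.Write([]byte("ok\n"))
	})

	ln, err := net.Listen("tcp", ":8080")
	if err != nil {
		panic(err)
	}

	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM)
	go func() {
		<-sigs
		// Instead of closing the listener immediately on SIGTERM, wait for
		// a quiet period first (the value would be configurable).
		drainThenClose(ln, tracker, 5*time.Second)
	}()

	http.Serve(ln, mux) // returns once the listener is closed
}
```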

We don't need to worry about unread pipelined requests. According to the HTTP 1.1 spec, "Clients MUST also be prepared to resend their requests if the server closes the connection before sending all of the corresponding responses".
