Better cooperation with Kubernetes zero-downtime restarts #2532

Open
FooBarWidget opened this issue Mar 16, 2024 · 0 comments

Problem statement

Upon shutdown, we stop listening for new requests while allowing existing requests to finish. The first part, "stop listening for new requests", does not play well with the way Kubernetes handles rolling upgrades.

During a Kubernetes Deployment rolling upgrade, Kubernetes performs two actions concurrently:

  1. Initiating removal of the Pod from the load balancer.
  2. Sending SIGTERM to the container.

Problem: removal of the Pod from the load balancer can take an arbitrary amount of time. Until it's done, we should continue to listen for new requests. Otherwise, requests may still be routed to the (terminating) Pod, resulting in HTTP errors.

See also: Delaying Shutdown to Wait for Pod Deletion Propagation

Existing solutions

A common existing solution is to introduce a pre-stop hook that sleeps for a short amount of time. The termination flow then looks like this:

Concurrently:

  • Initiating removal of the Pod from the load balancer.
  • Sequentially:
    1. Running the pre-stop hook and waiting until it finishes.
    2. Sending SIGTERM to the container.

The hope is that the pre-stop hook waits long enough for removal-from-load-balancer to finish.
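
In practice this workaround is a `preStop` exec hook in the Pod spec. A minimal sketch for illustration only (the container name, image, and 10-second sleep are placeholder guesses, and guessing the sleep duration correctly is exactly the weakness discussed below):

```yaml
spec:
  terminationGracePeriodSeconds: 60  # must cover the preStop sleep plus request draining
  containers:
    - name: app            # illustrative name
      image: example/app   # illustrative image
      lifecycle:
        preStop:
          exec:
            # Kubernetes sends SIGTERM only after this hook returns, but
            # removal from the load balancer proceeds concurrently.
            command: ["sleep", "10"]
```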

Problems with this approach:

  • It's not guaranteed that the Pod has actually been removed from the load balancer by the time the pre-stop hook finishes (sleep too short).
  • Conversely, the Pod may be removed from the load balancer well before the pre-stop hook finishes (sleep too long). This wastes cluster resources.

Proposed solution

Enterprise-only feature.

Modify the shutdown behavior as follows. There are two parts that run sequentially:

Part 1 (optional): Wait until the Pod is actually removed from the load balancer

Keep serving requests as normal until we're actually removed from the load balancer.

This can probably be checked by querying the Kubernetes API and waiting until the corresponding EndpointSlice is either gone, or at least no longer references our Pod. Needs more investigation.
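
As a rough illustration of what such a check could look like, here is a sketch using client-go that polls until no EndpointSlice belonging to our Service still references our Pod. The package and function names, the one-second polling interval, and the RBAC assumption (permission to list EndpointSlices in our namespace) are all hypothetical, not a committed design:

```go
// Package lbdrain sketches Part 1: block until the Kubernetes API no longer
// lists this Pod in any EndpointSlice of its Service. Assumes in-cluster
// credentials; all names here are illustrative.
package lbdrain

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// WaitUntilRemovedFromLoadBalancer polls until no EndpointSlice that belongs
// to the given Service still references the given Pod.
func WaitUntilRemovedFromLoadBalancer(ctx context.Context, namespace, service, podName string) error {
	config, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		return err
	}
	for {
		slices, err := client.DiscoveryV1().EndpointSlices(namespace).List(ctx, metav1.ListOptions{
			// EndpointSlices carry a label naming the Service that owns them.
			LabelSelector: "kubernetes.io/service-name=" + service,
		})
		if err != nil {
			return err
		}
		stillReferenced := false
		for _, slice := range slices.Items {
			for _, ep := range slice.Endpoints {
				if ep.TargetRef != nil && ep.TargetRef.Name == podName {
					stillReferenced = true
				}
			}
		}
		if !stillReferenced {
			return nil // gone from every slice; safe to stop serving
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Second):
		}
	}
}
```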

Part 2: Incoming-request-based shutdown delay

Even when we're removed from the load balancer, there may still be in-flight requests from two places:

  • The kernel socket backlog may still have new connections that we haven't accept()ed yet.
  • Already accept()ed sockets may still have unread requests due to HTTP pipelining.

To deal with the kernel socket backlog: don't immediately stop accepting new socket connections. Instead, only do so after a configurable amount of time has passed without any requests appearing (on either new or existing sockets).
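
A minimal sketch of this idea using Go's standard HTTP stack, assuming a plain `net.Listener` (the tracker type, the 5-second quiet period, and the helper names are made up for illustration; an actual implementation would live in the server's own I/O layer):

```go
// Sketch of an incoming-request-based shutdown delay: after SIGTERM we keep
// accepting connections, and only close the listener once no request has
// arrived for a configurable quiet period.
package main

import (
	"net"
	"net/http"
	"os"
	"os/signal"
	"sync"
	"syscall"
	"time"
)

// quietPeriodTracker records when the last request was seen.
type quietPeriodTracker struct {
	mu       sync.Mutex
	lastSeen time.Time
}

func (t *quietPeriodTracker) touch() {
	t.mu.Lock()
	t.lastSeen = time.Now()
	t.mu.Unlock()
}

func (t *quietPeriodTracker) idleFor(d time.Duration) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return time.Since(t.lastSeen) >= d
}

// drainThenClose keeps the listener open until no request has arrived for
// the full quiet period, then stops accepting. Connections sitting in the
// kernel backlog get accepted and served in the meantime.
func drainThenClose(ln net.Listener, t *quietPeriodTracker, quiet time.Duration) {
	for !t.idleFor(quiet) {
		time.Sleep(100 * time.Millisecond)
	}
	ln.Close()
}

func main() {
	tracker := &quietPeriodTracker{lastSeen: time.Now()}
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		tracker.touch() // every request resets the quiet period
		w.Write([]byte("ok\n"))
	})

	ln, err := net.Listen("tcp", ":8080")
	if err != nil {
		panic(err)
	}

	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM)
	go func() {
		<-sigs
		// Instead of closing the listener immediately on SIGTERM, wait for
		// a quiet period first (the value would be configurable).
		drainThenClose(ln, tracker, 5*time.Second)
	}()

	http.Serve(ln, mux) // returns once the listener is closed
}
```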

We don't need to worry about unread pipelined requests. According to the HTTP 1.1 spec, "Clients MUST also be prepared to resend their requests if the server closes the connection before sending all of the corresponding responses".
