Blog: Types of Probes in Kubernetes (#206)
* Draft

* Fix #204

* Image

* Edits

* Edits

* Cross-posting
emmercm committed Feb 20, 2022
1 parent c88727f commit d5f1d3b
Showing 4 changed files with 177 additions and 4 deletions.
4 changes: 2 additions & 2 deletions index.js
@@ -665,10 +665,10 @@ tracer(Metalsmith(__dirname))
image: `${metalsmith.metadata().gravatar.main}?s=512`,
url: siteURL,
sameAs: [
- metalsmith.metadata().github.profile.user.html_url,
+ metalsmith.metadata().github ? metalsmith.metadata().github.profile.user.html_url : null,
'https://twitter.com/emmercm',
'https://www.linkedin.com/in/emmercm/'
- ]
+ ].filter(url=>url)
}
],
collections: {
58 changes: 58 additions & 0 deletions src/blog/types-of-probes-in-kubernetes.md
@@ -0,0 +1,58 @@
---

title: Types of Probes in Kubernetes
date: 2022-02-20T04:50:00
tags:
- kubernetes
- microservices

---

It's tempting to use the same health check endpoint for multiple probes in Kubernetes, but the kubelet uses each probe for very different purposes.

The Kubernetes [kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/) employs three different types of probes to help manage containers. Probes are typically requests to a health check endpoint (usually an HTTP GET, sometimes a TCP socket check), but they can also be a command executed inside a container.
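
To make that concrete, here is a minimal sketch of how a probe is attached to a container. The pod name, image, port, and `/healthz` path are all hypothetical, and the same probe could instead use an `exec` command or a `tcpSocket` check in place of `httpGet`.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod             # hypothetical pod name
spec:
  containers:
    - name: example-service     # hypothetical container and image
      image: example/service:1.0.0
      ports:
        - containerPort: 8080
      livenessProbe:
        httpGet:                # probe by sending an HTTP GET to the container
          path: /healthz        # hypothetical health check endpoint
          port: 8080
        periodSeconds: 10       # how often the kubelet runs the probe
```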

Probes are used to help detect and respond to:

- Containers that haven't started yet and can't successfully serve traffic
- Containers that are overwhelmed and can't successfully serve additional traffic
- Containers that are completely dead and aren't serving any traffic or processing any messages

## The liveness probe

The **liveness** probe is used to catch deadlocks or containers that have stopped processing: situations where restarting the container would solve the problem. In these situations the container is determined to be dead, and it will not revive no matter how long you wait. It is somewhat expected that, given enough time, long-running services will enter a broken state, and this probe is designed to fix that.

This is primarily what I talked about in "[Writing Meaningful Health Check Endpoints](/blog/writing-meaningful-health-check-endpoints/)" and is likely what most people think of with respect to service orchestrator probes.

Without a liveness probe defined, Kubernetes will take an action based on the pod's restart policy when a container's PID 1 stops, generally restarting the container.
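
As a rough sketch (the endpoint, port, and timings are illustrative assumptions, not recommendations), a liveness probe that restarts a deadlocked container might look like this:

```yaml
livenessProbe:
  httpGet:
    path: /healthz            # hypothetical liveness endpoint
    port: 8080
  initialDelaySeconds: 10     # give the process a head start before the first check
  periodSeconds: 10           # check every 10 seconds
  timeoutSeconds: 1           # the default; see the cautions below
  failureThreshold: 3         # restart the container after 3 consecutive failures
```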

## The readiness probe

The **readiness** probe is used to know if a container is ready to serve traffic. This applies to newly started containers that aren't ready to serve traffic yet as well as existing containers that are overwhelmed and can't handle additional traffic.

This may be extremely non-obvious from the plain English definition of the word "ready", but the readiness probe is run throughout the entire lifecycle of the container. That means containers can go in and out of "ready", which adds them to and removes them from the load balancer, respectively. This is easy to confuse with startup probes, which we will discuss next.

You don't always need a liveness probe: a server may be simple enough that crashes always result in PID 1 exiting. But almost _every_ service serving traffic should have a readiness probe.

Without a readiness probe defined, Kubernetes will route traffic to a container as soon as its PID 1 has started.

_Note: a pod is considered "ready" when all of its containers are ready._

_Note: the liveness probe doesn't wait for the readiness probe to succeed, so you may want to configure a delay or a startup probe._
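
For illustration only (the endpoint and numbers are assumptions), a readiness probe that keeps re-evaluating the container for its entire lifetime could look like this; the `initialDelaySeconds` is one blunt way to handle the startup delay mentioned in the note above when no startup probe is defined:

```yaml
readinessProbe:
  httpGet:
    path: /ready              # hypothetical readiness endpoint
    port: 8080
  initialDelaySeconds: 5      # crude guess at startup time; a startup probe is usually better
  periodSeconds: 5            # evaluated for the whole life of the container
  failureThreshold: 3         # stop routing traffic to the pod after 3 failures
  successThreshold: 1         # resume routing traffic after a single success
```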

## The startup probe

The **startup** probe is used to know when a container has started. If a startup probe is configured, the liveness and readiness checks are disabled until the startup check succeeds _once_, so that those probes won't interfere with the application starting. Startup probes are primarily used for slow-starting containers so the kubelet won't kill them because of a failing liveness check.

Startup probes graduated from beta to stable with [Kubernetes v1.20](https://kubernetes.io/blog/2020/12/08/kubernetes-1-20-release-announcement/) in December 2020, so you may find older guides on the internet using readiness probes for the startup check, but you shouldn't anymore.

Without a startup probe defined, Kubernetes will start running the liveness and readiness probes as soon as the container's PID 1 has started.
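
As a hedged sketch for a slow-starting container (all values are assumptions): with `failureThreshold: 30` and `periodSeconds: 10` the container gets roughly 300 seconds to come up before the kubelet kills it, and the liveness and readiness probes only begin once this probe has succeeded.

```yaml
startupProbe:
  httpGet:
    path: /healthz            # hypothetical endpoint, often the same one the liveness probe hits
    port: 8080
  failureThreshold: 30        # tolerate up to 30 failed checks...
  periodSeconds: 10           # ...spaced 10 seconds apart, ~300 seconds total
```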

## Cautions

Here are a few cautions on probes:

- **Availability**: if you design your readiness probe such that it's possible for every container to fail it at the same time (e.g. depending on connectivity to a flaky downstream service), then it's possible to have zero pods in your load balancer
- **Overwhelming**: be careful that your readiness and liveness probes aren't run so often that they overwhelm a container
- **Timeouts**: the default probe timeout is 1 second, so make sure your health checks are fast, or you will get false negatives (see the sketch after this list)
- **False positives**: make sure you use the same web server for your health check endpoints that you do for your production traffic; using a different server may falsely report that the service is working
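
As a sketch of the **Overwhelming** and **Timeouts** cautions (the endpoint and values are assumptions), loosening the probe period and timeout looks like this:

```yaml
readinessProbe:
  httpGet:
    path: /ready              # hypothetical endpoint
    port: 8080
  periodSeconds: 10           # don't hammer the container with checks
  timeoutSeconds: 3           # raise the 1-second default if the check can't always respond that fast
```
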
5 changes: 3 additions & 2 deletions src/blog/writing-meaningful-health-check-endpoints.md
@@ -24,7 +24,7 @@ Service orchestrators and monitors (e.g. the Kubernetes [kubelet](https://kubern

Polling a health check endpoint is a form of _black box monitoring_ where the requester has no visibility into the internals of the service, it only knows if it received a success or error response.

- _For the purpose of this article we'll treat service liveness and readiness as the same thing, but know that they are separate and may require different health check endpoints._
+ _For the purpose of this article we'll treat liveness and readiness probes as the same thing, but see "[Types of Probes in Kubernetes](/blog/types-of-probes-in-kubernetes)" for an explanation of the differences and why you may want separate health check endpoints for each._

## Having no health check is bad

@@ -96,8 +96,9 @@ Here's an example response you could use:

Here are some deeper considerations when designing your health check endpoints:

- **Availability**: if you have your health check depend on a database, it could have similar availability problems that depending on external services have (laid out above), your service orchestrator could determine all instances to be unhealthy and then there won't be any to handle any traffic
+ - **Authentication**: if your service requires authentication you may need to exclude the health check endpoint so your service orchestrator can work.
- - **Availability**: you likely don't want to expose your internal state via publicly available health check endpoints unless you rely on a synthetics SaaS vendor, and if you do then you probably want authentication.
+ - **Security**: you likely don't want to expose your internal state via publicly available health check endpoints unless you rely on a synthetics SaaS vendor, and if you do then you probably want authentication.
- **Throughput**: the health check may be called very frequently so service instances can fail and recover fast, you may need to consider this.
- **Response time**: you'll want to be mindful that the health check doesn't take too long to calculate and respond, service orchestrators may have a request timeout such as Kubernetes' default of 1 second.
- **Caching**: it could be expensive to calculate a holistic status of a service, you could potentially cache some parts of the health check in-memory.
