trusch/deadman-switch

deadman-switch

This tool implements a deadman switch for software systems. It expects regular pings from each configured service and calls one or more webhooks when it has not heard from a service for a configurable amount of time.

Features

  • alerts you when your services are down
  • alerts you when your services are up again
  • notifications can be sent to any webhook or to Slack
    • use a custom URL, headers, and body for webhooks
    • use custom key/value pairs in the Slack message
  • configurable message debouncing
  • dynamic configuration of services and notifications via HTTP API
    • secured with basic auth
  • scalable in both directions
    • from a small container with <32MB RAM
    • to a cluster that can handle thousands of pings and notifications per second
  • leader election in the cluster, so only one node checks deadlines and triggers notifications
  • notifications are queued, so they can be executed by the whole cluster
  • optionally supply a secret token when configuring your services, so the ping messages can't be spoofed easily

Quickstart

Up and running in less than 1 minute:

# start deadman-switch
docker run --name deadman-switch -d --rm -p 8080:8080 trusch/deadman-switch:latest

# configure service
curl -u admin:admin -XPOST --data-binary @- localhost:8080/config <<EOF
{
  "id": "service-1",
  "timeout": "30s",
  "debounce": "1m",
  "alertNotifications": [
    {
      "type": "webhook",
      "config": {
        "method": "GET",
        "url": "http://localhost:8080/log?service-1-alert"
      }
    }
  ]
}
EOF

# call the ping endpoint
curl http://localhost:8080/ping/service-1

# look at the logs
docker logs -f deadman-switch
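
A common way to use a deadman switch is to ping only after a job has finished successfully, so a failing or hanging job lets the timeout expire and triggers the alert. The following is a minimal sketch of that pattern, not part of this repository: `run_and_ping`, `PING_URL`, and `PING_CMD` are hypothetical names, and the default URL assumes the `service-1` configuration from the quickstart above.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with deadman-switch.
# PING_URL: assumed ping endpoint of the service configured in the quickstart.
# PING_CMD: command used to send the ping (overridable, e.g. for dry runs).
PING_URL=${PING_URL:-http://localhost:8080/ping/service-1}
PING_CMD=${PING_CMD:-curl -fsS -o /dev/null}

# run_and_ping CMD [ARGS...]
# Runs CMD and sends a ping only when CMD exits with status 0.
# On failure the exit status of CMD is returned and no ping is sent.
run_and_ping() {
  "$@" || return $?
  # Intentionally unquoted so PING_CMD can carry its own arguments.
  $PING_CMD "$PING_URL"
}
```

For example, `run_and_ping ./nightly-backup.sh` (a placeholder script name) would ping only when the backup succeeds. The job should run more often than the configured `timeout`, otherwise the switch fires between runs.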

Build and run

Dependencies

This repo requires podman and buildah as its development toolset.

Ubuntu install commands:

. /etc/os-release
echo "deb https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_${VERSION_ID}/ /" | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list
curl -L https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_${VERSION_ID}/Release.key | sudo apt-key add -
sudo apt-get update
sudo apt-get -y upgrade
sudo apt-get -y install podman buildah skopeo
echo "${USER}:100000:65535" | sudo tee -a /etc/subuid
echo "${USER}:100000:65535" | sudo tee -a /etc/subgid

Build image

make image

Run a local test deployment

make run

This brings up a pod with etcd as the storage backend, caddy as the ingress router, and two instances of deadman-switch. The pod exposes port 8080 to serve the HTTP API. You can now, for example, list all configured services like this:

curl -u admin:admin http://localhost:8080/config | jq .

You can also POST or DELETE service config objects using this endpoint:

curl -XPOST -u admin:admin -d '{"id":"new_service", "timeout":"10s", "notifications":[{"webhook": {"url": "https://google.com", "method": "GET"}}]}' http://localhost:8080/config
curl -XDELETE -u admin:admin http://localhost:8080/config/new_service

To actually send a ping to the deadman switch, do something like this (the URL is quoted so the shell does not interpret the `?`):

curl 'http://localhost:8080/ping/svc1?token=secret1'

If no pings arrive, the application starts calling its configured webhooks after 30 seconds. You can see that in the logs: podman logs -fn deadman-switch-1 deadman-switch-2. Note that only one of the two nodes checks the deadlines, but both nodes are used to send out the actual notification webhooks.
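
For a long-running service, the ping usually has to be repeated at an interval shorter than the configured timeout. The sketch below shows one way to do that from a shell. `ping_url` and `heartbeat` are hypothetical helper names. The URL shape (`/ping/<id>` with an optional `?token=` parameter) is taken from the examples above. The token is appended as-is, so it is assumed to contain no characters that need URL encoding.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with deadman-switch.

# ping_url BASE SERVICE [TOKEN]
# Prints the ping URL for SERVICE, appending the token only when one is given.
ping_url() {
  base=$1
  service=$2
  token=${3:-}
  if [ -n "$token" ]; then
    printf '%s/ping/%s?token=%s\n' "$base" "$service" "$token"
  else
    printf '%s/ping/%s\n' "$base" "$service"
  fi
}

# heartbeat BASE SERVICE INTERVAL_SECONDS [TOKEN]
# Pings forever, once per interval. A failed ping is logged to stderr
# and retried on the next round instead of aborting the loop.
heartbeat() {
  url=$(ping_url "$1" "$2" "${4:-}")
  while :; do
    curl -fsS -o /dev/null "$url" || echo "ping failed: $url" >&2
    sleep "$3"
  done
}
```

With the test deployment running, `heartbeat http://localhost:8080 svc1 10 secret1 &` would keep `svc1` alive in the background. Stopping that loop should lead to an alert once the timeout has passed.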
