
Error Messages And What To Do About Them


Overview

This page lists error messages and possible remedies. It is assumed that standard issues have been checked and resolved, such as:

  • lack of disk space
  • running out of memory or swap

failed to construct client connection: ssh: handshake failed: remote host public key mismatch

Overview

The workers connect to the master over an SSH-like protocol, authenticated with private/public key pairs. The worker knows the master's public key in advance, to prevent a MITM attack, and the master holds a list of the workers' public keys.
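
For reference, this key relationship shows up directly in the start-up configuration. A minimal sketch (flag names as documented by Concourse; file paths are illustrative, and the host name and port are taken from the example further down):

  # On the master (web node): the TSA private host key plus the list of
  # authorized worker public keys (other flags omitted).
  concourse web \
    --tsa-host-key ./tsa_host_key \
    --tsa-authorized-keys ./authorized_worker_keys

  # On the worker: the TSA host to dial, the master's *public* key
  # (must match the master's tsa_host_key), and the worker's own private key.
  concourse worker \
    --tsa-host cicd-master:2222 \
    --tsa-public-key ./tsa_host_key.pub \
    --tsa-worker-private-key ./worker_key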

Cause

This is seen on the Concourse worker when the tsa_host_key.pub file on the worker, which contains the SSH public key, does not match the tsa_host_key file on the Concourse master, which contains the SSH private key.

Check this by running "ssh-keygen -elf tsa_host_key" on the master and comparing the output with "ssh-keygen -elf tsa_host_key.pub" on the worker. Also try running "ssh -p 2222 -v cicd-master" on the worker and check what the output shows for "debug1: Server host key:".
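
For example (host name and port as in the ssh command above; adjust for your environment):

  # On the master: inspect the TSA private host key
  ssh-keygen -elf tsa_host_key

  # On the worker: inspect the public key the worker trusts
  ssh-keygen -elf tsa_host_key.pub
  # The two outputs must match.

  # From the worker: see which host key the master actually presents
  ssh -p 2222 -v cicd-master 2>&1 | grep "Server host key"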

Resolution

Put the correct public key on the worker, or the correct private key on the master. If necessary, generate a new key pair: https://concourse-ci.org/concourse-generate-key.html
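
Per the linked docs, a fresh key pair can be generated with the concourse binary, roughly as follows (paths are illustrative):

  # Creates tsa_host_key (private) and tsa_host_key.pub (public)
  concourse generate-key -t ssh -f ./tsa_host_key

  # Keep tsa_host_key on the master, copy tsa_host_key.pub to every worker,
  # then restart both sides so they pick up the new keys.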

rootfs_linux.go preparing rootfs caused "permission denied"

The actual log line might look like this:

May 28 13:53:02 cicd-worker-5 concourse[31512]: {"timestamp":"1559051582.143739462","source":"guardian","message":"guardian.create.create-failed-cleaningup.start","log_level":1,"data":{"cause":"runc run: exit status 1: container_linux.go:344: starting container process caused "process_linux.go:424: container init caused \"rootfs_linux.go:46: preparing rootfs caused \\\"permission denied\\\"\""\n","handle":"aaaaaaaaa-aaaa-aaaa-aaaa-94420d6db781","session":"45.3"}}

max containers reached

This is seen when the container count on a worker reaches the maximum value (255).

To resolve

You have to manually remove the worker and its volumes; keep in mind this will fail any other builds running on that worker. Alternatively, you can land the worker and wait for its containers to finish their assigned tasks and get cleaned up, which frees space for new containers.
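
A sketch of both options with fly (the target name ci is a placeholder; cicd-worker-5 is an example worker name):

  # See per-worker container counts
  fly -t ci workers

  # Option 1: land the worker so it stops accepting new containers and
  # drains the existing ones
  fly -t ci land-worker -w cicd-worker-5

  # Option 2: once the worker is stalled or landed, remove it and its
  # volumes from the cluster (in-flight builds on it will fail)
  fly -t ci prune-worker -w cicd-worker-5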

unknown handle

This is seen when...

To resolve do...

unknown Process

This is seen when...

To resolve do...

ProcessNotFoundError

Backend error: Exit status: 404, message: {"Type":"ProcessNotFoundError","Message":"unknown process: task","Handle":"","ProcessID":"task","Binary":""}

This is seen when a worker restarts while a pipeline is running.

To resolve, check why your worker restarted. If Concourse is deployed on Kubernetes, kubectl describe pod pod-name will show why the pod restarted. In my experience this is usually caused by probe failures due to OOM, so adding more memory will help. I would recommend checking the resources allocated to your Concourse instance and the limits of the underlying nodes it runs on. There is no single fix for OOM errors other than tuning resource values to match your requirements.
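
For a Kubernetes deployment, a sketch of the checks (the namespace concourse and pod name concourse-worker-0 are placeholders):

  # Why did the worker pod restart? Check Events and Last State
  # (e.g. OOMKilled, "Liveness probe failed").
  kubectl -n concourse describe pod concourse-worker-0

  # Restart counts and the last termination reason at a glance
  kubectl -n concourse get pods
  kubectl -n concourse get pod concourse-worker-0 \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'

  # Inspect the resource requests/limits currently set on the worker container
  kubectl -n concourse get pod concourse-worker-0 \
    -o jsonpath='{.spec.containers[0].resources}'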