This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Kubelet/Checkpointer not starting up after master node reboot #1576

Open
jonasclaes opened this issue Oct 27, 2021 · 1 comment
Labels
bug Something isn't working

Comments


Description

When I reboot my master node (bare-metal), the kubelet and etcd containers start up. However, nothing happens afterwards. The API server does not start up, checkpointer does not start up.

Impact

Complete loss of management plane. Cannot access cluster.

Environment and steps to reproduce

  1. Set-up:
    Set up a cluster on Proxmox following the bare-metal guide in the Lokomotive docs, booting via iPXE. I'm running 1 master node and 3 worker nodes on the stable channel of Flatcar, though the issue also occurs on the beta build. I'm running lokoctl 0.9.0. The following components are deployed on the cluster:
  • flatcar-linux-update-operator
  • openebs-operator
  • openebs-storage-class
  • metrics-server
  • cert-manager
  • contour
  • web-ui
  • httpbin
  2. Task:
    General cluster management, deploying services, etc.
  3. Action(s):
    a. Let the flatcar-linux-update-operator update the master node OS, or manually reboot the master node using sudo reboot
  4. Error:
    Kubelet and etcd start up.
    Kubelet produces the following error message in its logs:
    Oct 27 09:11:57 socrates001 docker[424152]: E1027 09:11:57.209231 424127 file.go:187] "Could not process manifest file" err="invalid pod: [spec.volumes[3].projected.sources[0].serviceAccountToken: Forbidden: must not be specified when serviceAccountName is not set]" path="/etc/kubernetes/manifests/kube-system-pod-checkpointer-pr6qb.json"
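As a hedged sketch (the manifest path is taken from the log line above; the jq filter is my own illustration, not from the issue), one way to confirm what the kubelet is complaining about is to check whether the checkpointed manifest contains a projected serviceAccountToken source while spec.serviceAccountName is unset:

```shell
# Path from the kubelet error above; the exact filename differs per cluster.
MANIFEST=/etc/kubernetes/manifests/kube-system-pod-checkpointer-pr6qb.json

# Print the serviceAccountName (null if unset) and any projected
# serviceAccountToken sources. The combination "null serviceAccountName plus
# a serviceAccountToken projection" is what recent kubelet validation rejects.
jq '{serviceAccountName: .spec.serviceAccountName,
    tokenSources: [.spec.volumes[]?.projected?.sources[]?
                   | select(.serviceAccountToken != null)]}' "$MANIFEST"
```

If tokenSources is non-empty while serviceAccountName is null, the manifest matches the "Forbidden" validation error from the log.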

Expected behavior

I expected the checkpointer to start properly and bring up the kube API server as well as all the other required Kubernetes components.

Additional information

My cluster is named socrates, cluster domain is cluster.local.
I have 1 master node named socrates001 and 3 worker nodes, named socrates002, socrates003, socrates004.

Log file of kubelet.service, taken directly from the master node:
kubelet.log

If you need more information, please ask; I'll be happy to provide it.

@invidian invidian added the bug Something isn't working label Jan 4, 2022

invidian commented Jan 4, 2022

Thanks for reporting. I've definitely seen this before; it's a result of pod-checkpointer being incompatible with recent Kubernetes versions. I forgot to report it, as I no longer actively work on the project.
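For context, this is my own illustration rather than something from the issue: the checkpointed manifest ends up with roughly the shape below, and kubelet versions that enforce this validation reject it because a serviceAccountToken projection requires spec.serviceAccountName to be set on the pod.

```json
{
  "spec": {
    "volumes": [
      {
        "name": "kube-api-access",
        "projected": {
          "sources": [
            { "serviceAccountToken": { "path": "token" } }
          ]
        }
      }
    ]
  }
}
```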
