param --enable-worker not working #4272

Open
ondrej-m opened this issue Apr 11, 2024 · 11 comments
Labels
bug Something isn't working

Comments

ondrej-m commented Apr 11, 2024

Before creating an issue, make sure you've checked the following:

  • You are running the latest released version of k0s
  • Make sure you've searched for existing issues, both open and closed
  • Make sure you've searched for PRs too, a fix might've been merged already
  • You're looking at the docs for the released version; "main" branch docs are usually ahead of released versions.

Platform

Debian 12.5, docker image
Docker ce 26.0.0

Version

k0sproject/k0s:v1.29.3-k0s.0

Sysinfo

`k0s sysinfo`
Machine ID: "90abf23a7d335a1763ee8504fe9811be9517a882bd6eb8c38dad79fa3e2dceec" (from machine) (pass)
Total memory: 3.8 GiB (pass)
Disk space available for /var/lib/k0s: 1.2 GiB (warning: 1.8 GiB recommended)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 6.1.0-18-amd64 (pass)
  Max. file descriptors per process: current: 524288 / max: 524288 (pass)
  AppArmor: unavailable (pass)
  Executable in PATH: modprobe: /sbin/modprobe (pass)
  Executable in PATH: mount: /bin/mount (pass)
  Executable in PATH: umount: /bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (is a listed root controller) (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (is a listed root controller) (pass)
    cgroup controller "memory": available (is a listed root controller) (pass)
    cgroup controller "devices": available (device filters attachable) (pass)
    cgroup controller "freezer": available (cgroup.freeze exists) (pass)
    cgroup controller "pids": available (is a listed root controller) (pass)
    cgroup controller "hugetlb": available (is a listed root controller) (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: no kernel config found (warning)
  CONFIG_NAMESPACES: Namespaces support: no kernel config found (warning)
  CONFIG_NET: Networking support: no kernel config found (warning)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: no kernel config found (warning)
  CONFIG_PROC_FS: /proc file system support: no kernel config found (warning)

What happened?

No response

Steps to reproduce

  1. docker run -d --name k0s --hostname k0s --privileged -v /var/lib/k0s -p 6443:6443 --cgroupns=host docker.io/k0sproject/k0s:v1.29.3-k0s.0 -- k0s controller --enable-worker
  2. docker exec -it k0s k0s status
    Version: v1.29.3+k0s.0
    Process ID: 8
    Role: controller
    Workloads: true
    SingleNode: false
    Kube-api probing successful: true
    Kube-api probing last error:
  3. $ docker exec -it k0s k0s kubectl get nodes --show-labels
    NAME STATUS ROLES AGE VERSION LABELS
    k0s Ready control-plane 4m42s v1.29.3+k0s beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k0s,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node.k0sproject.io/role=control-plane

Expected behavior

  1. docker exec -it k0s k0s status
    Version: v1.29.3+k0s.0
    Process ID: 8
    Role: controller +worker
    Workloads: true
    SingleNode: false
    Kube-api probing successful: true
    Kube-api probing last error:

Actual behavior

No response

Screenshots and logs

No response

Additional context

No response

ondrej-m added the bug label Apr 11, 2024
twz123 (Member) commented Apr 12, 2024

You mean Role: controller? That's expected, as this is a controller node. The difference that --enable-worker makes is that it also starts the worker components (mainly kubelet and containerd). You can see that as Workloads: true.

If you want to run a worker-only node (that needs to join an existing cluster using a join token), have a look at the worker subcommand.
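
For reference, a minimal sketch of that join flow (both subcommands are part of k0s; the token file name here is just illustrative):

k0s token create --role=worker > worker.token   # run on an existing controller

k0s worker --token-file ./worker.token          # run on the machine joining as a worker-only node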

hztsm commented May 11, 2024

I also encountered the same problem. I want to set up a single host that is both a management (controller) node and a worker node. I executed the following installation command:

k0s install controller --single --enable-worker

When I run my application, the pod is always Pending.

twz123 (Member) commented May 13, 2024

@hztsm This seems like a separate problem. Would you mind filing another issue and providing logs?

jiridanek commented May 24, 2024

@hztsm Please provide the output of kubectl describe pod your-pod -n your-namespace; this should clarify why the pod is Pending.

Without the logs, one can only guess at the common causes.

Pod cannot be scheduled on tainted node

(this should not be your problem, but it was mine, so I'll just post the logs and resolution for this case)

The describe output would look something like this (irrelevant parts left out):

Status:           Pending
Conditions:
  Type           Status
  PodScheduled   False 
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  9m55s  default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  4m54s  default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.

This is because your node is tainted.

$ kubectl describe nodes
Name:               k0s
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=k0s
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=true
                    node.k0sproject.io/role=control-plane
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 24 May 2024 08:28:30 +0200
Taints:             node-role.kubernetes.io/master:NoSchedule
                    node.kubernetes.io/disk-pressure:NoSchedule
Unschedulable:      false

Run this to remove the taint (notice the - at the end):

kubectl taint nodes --all node-role.kubernetes.io/master:NoSchedule-

Or start k0s next time with the --single flag; adding only --enable-worker leaves the taint in place.
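
A quick way to check that the taint is actually gone (assuming the node is named k0s, as in the output above):

kubectl describe node k0s | grep -i taints

Once the node-role.kubernetes.io/master taint has been removed and no others remain, this prints Taints: <none>. Note that node.kubernetes.io/disk-pressure is added and removed by the kubelet itself, so freeing up disk space is the fix for that one.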

@jiridanek

AND, when I do --single, I get

 Events:
  Type     Reason                  Age               From               Message
  ----     ------                  ----              ----               -------
  Normal   Scheduled               86s               default-scheduler  Successfully assigned workspace-controller-system/workspace-controller-controller-manager-86576f98dc-w88sp to k0s
  Warning  FailedCreatePodSandBox  86s               kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "d86e2136499d2b8b76bfbf34ef9f4ca3971bb7aa422948bb88833a2d28e15e46": plugin type="bridge" name="kubernetes" failed (add): no IP ranges specified
  Normal   SandboxChanged          3s (x7 over 86s)  kubelet            Pod sandbox changed, it will be killed and re-created.

so what works for me is --enable-worker and removing the taint with kubectl.

twz123 (Member) commented May 24, 2024

> AND, when I do --single, I get
>
>  Events:
>   Type     Reason                  Age               From               Message
>   ----     ------                  ----              ----               -------
>   Normal   Scheduled               86s               default-scheduler  Successfully assigned workspace-controller-system/workspace-controller-controller-manager-86576f98dc-w88sp to k0s
>   Warning  FailedCreatePodSandBox  86s               kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "d86e2136499d2b8b76bfbf34ef9f4ca3971bb7aa422948bb88833a2d28e15e46": plugin type="bridge" name="kubernetes" failed (add): no IP ranges specified
>   Normal   SandboxChanged          3s (x7 over 86s)  kubelet            Pod sandbox changed, it will be killed and re-created.
>
> so what works for me is --enable-worker and removing the taint with kubectl.

That is somewhat surprising. There shouldn't be any differences concerning CNI between --single and --enable-worker. Is that reproducible? Does it still work when you specify --enable-worker --disable-components=konnectivity-server?

@jiridanek

> Is that reproducible?

Yes, here are a few GHA runs for the various scenarios:

k0s controller --enable-worker

https://github.com/jiridanek/notebooks-v2/actions/runs/9223828595/job/25377867756

k0s controller --single

https://github.com/jiridanek/notebooks-v2/actions/runs/9223888724/job/25378050139

k0s controller --single --disable-components=konnectivity-server

k0s controller --enable-worker --disable-components=konnectivity-server

> Does it still work when you specify --enable-worker --disable-components=konnectivity-server?

Yes, pod still runs (if I untaint). See above.

@jiridanek

> SingleNode: false

That's in the logs in the original issue report. Shouldn't this be correctly set to true?

twz123 (Member) commented May 24, 2024

> SingleNode: false
>
> That's in the logs in the original issue report. Shouldn't this be correctly set to true?

In the original issue, k0s wasn't started with the --single flag, so why would one expect this to be true?

twz123 (Member) commented May 24, 2024

I'm sooo oblivious 😬

That's why v1.30.0 is not starting with --single:

So, to use --single, you might want to wait for v1.30.1 (which should ship next week, I think), provide a custom config to v1.30.0 that changes the kube-router metrics port, or use v1.29.4.
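
For anyone hitting this on v1.30.0, a minimal sketch of the custom-config workaround (assuming spec.network.kuberouter.metricsPort is the relevant field and that the chosen port is free on your host; not verified against v1.30.0):

cat > /etc/k0s/k0s.yaml <<'EOF'
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  name: k0s
spec:
  network:
    kuberouter:
      metricsPort: 9091   # move kube-router metrics off the conflicting port
EOF

k0s controller --single --config /etc/k0s/k0s.yaml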

johbo commented Jun 2, 2024

Just got here; using --enable-worker gives me a controller node with the taint, as described in #4272 (comment).

Based on the discussion above, I think this is the intended behavior: --enable-worker only means that the worker components are started, nothing beyond that.

I noticed that there is also a --no-taints flag, which I should probably use as well to end up with a controller that also runs regular workloads.

Excerpt from k0s controller --help:

      --enable-worker                                  enable worker (default false)
      --no-taints                                      disable default taints for controller node

I think it could help to tweak the help text of --enable-worker so that it is explicit that it only enables the worker components (i.e. kubelet and containerd) and does not change the taints.
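
For completeness, the combined invocation this discussion converges on (flags taken verbatim from the help text above) would look like:

k0s controller --enable-worker --no-taints

The same flags should also apply to k0s install controller when running k0s as a service.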

johbo added a commit to johbo/k0s-nix that referenced this issue Jun 2, 2024
Realized that no Pods were scheduled due to taints on the Node object. Just
using "--enable-worker" is not enough.

See: k0sproject/k0s#4272