param --enable-worker not working #4272

Open
ondrej-m opened this issue Apr 11, 2024 · 11 comments
Labels
bug Something isn't working

Comments

ondrej-m commented Apr 11, 2024

Before creating an issue, make sure you've checked the following:

  • You are running the latest released version of k0s
  • Make sure you've searched for existing issues, both open and closed
  • Make sure you've searched for PRs too, a fix might've been merged already
  • You're looking at the docs for the released version; "main" branch docs are usually ahead of released versions.

Platform

Debian 12.5, docker image
Docker ce 26.0.0

Version

k0sproject/k0s:v1.29.3-k0s.0

Sysinfo

`k0s sysinfo`
Machine ID: "90abf23a7d335a1763ee8504fe9811be9517a882bd6eb8c38dad79fa3e2dceec" (from machine) (pass)
Total memory: 3.8 GiB (pass)
Disk space available for /var/lib/k0s: 1.2 GiB (warning: 1.8 GiB recommended)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 6.1.0-18-amd64 (pass)
  Max. file descriptors per process: current: 524288 / max: 524288 (pass)
  AppArmor: unavailable (pass)
  Executable in PATH: modprobe: /sbin/modprobe (pass)
  Executable in PATH: mount: /bin/mount (pass)
  Executable in PATH: umount: /bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (is a listed root controller) (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (is a listed root controller) (pass)
    cgroup controller "memory": available (is a listed root controller) (pass)
    cgroup controller "devices": available (device filters attachable) (pass)
    cgroup controller "freezer": available (cgroup.freeze exists) (pass)
    cgroup controller "pids": available (is a listed root controller) (pass)
    cgroup controller "hugetlb": available (is a listed root controller) (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: no kernel config found (warning)
  CONFIG_NAMESPACES: Namespaces support: no kernel config found (warning)
  CONFIG_NET: Networking support: no kernel config found (warning)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: no kernel config found (warning)
  CONFIG_PROC_FS: /proc file system support: no kernel config found (warning)

What happened?

No response

Steps to reproduce

  1. docker run -d --name k0s --hostname k0s --privileged -v /var/lib/k0s -p 6443:6443 --cgroupns=host docker.io/k0sproject/k0s:v1.29.3-k0s.0 -- k0s controller --enable-worker
  2. docker exec -it k0s k0s status
    Version: v1.29.3+k0s.0
    Process ID: 8
    Role: controller
    Workloads: true
    SingleNode: false
    Kube-api probing successful: true
    Kube-api probing last error:
  3. $ docker exec -it k0s k0s kubectl get nodes --show-labels
    NAME STATUS ROLES AGE VERSION LABELS
    k0s Ready control-plane 4m42s v1.29.3+k0s beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k0s,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node.k0sproject.io/role=control-plane

Expected behavior

  1. docker exec -it k0s k0s status
    Version: v1.29.3+k0s.0
    Process ID: 8
    Role: controller +worker
    Workloads: true
    SingleNode: false
    Kube-api probing successful: true
    Kube-api probing last error:

Actual behavior

No response

Screenshots and logs

No response

Additional context

No response

ondrej-m added the bug label Apr 11, 2024
twz123 (Member) commented Apr 12, 2024

You mean Role: controller? That's expected, as this is a controller node. The difference that --enable-worker makes is that it also starts the worker components (mainly kubelet and containerd). You can see that as Workloads: true.

If you want to run a worker-only node (that needs to join an existing cluster using a join token), have a look at the worker subcommand.
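
For reference, a minimal sketch of that join flow (both subcommands are part of k0s; the token file name here is just illustrative):

k0s token create --role=worker > worker.token   # run on an existing controller

k0s worker --token-file ./worker.token          # run on the machine joining as a worker-only node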

hztsm commented May 11, 2024

I also encountered the same problem. I want to set up a single host that is both a management (controller) node and a worker node. I executed the following installation command:

k0s install controller --single --enable-worker

When I run my application, the pod is always Pending.

twz123 (Member) commented May 13, 2024

@hztsm This seems like a separate problem. Would you mind filing another issue and providing logs?

jiridanek commented May 24, 2024

@hztsm Please provide the output of kubectl describe pod your-pod -n your-namespace; this should clarify why the pod is Pending.

Without the logs, one can only guess at the common causes.

Pod cannot be scheduled on tainted node

(this should not be your problem, but it was mine, so I'll just post the logs and resolution for this case)

The describe output would look something like this (irrelevant parts left out):

Status:           Pending
Conditions:
  Type           Status
  PodScheduled   False 
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  9m55s  default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  4m54s  default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.

This is because your node is tainted.

$ kubectl describe nodes
Name:               k0s
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=k0s
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=true
                    node.k0sproject.io/role=control-plane
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 24 May 2024 08:28:30 +0200
Taints:             node-role.kubernetes.io/master:NoSchedule
                    node.kubernetes.io/disk-pressure:NoSchedule
Unschedulable:      false

Run this to remove the taint (notice the - at the end):

kubectl taint nodes --all node-role.kubernetes.io/master:NoSchedule-

Or start k0s next time with the --single flag; adding only --enable-worker leaves the taint in place.
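
A quick way to check that the taint is actually gone (assuming the node is named k0s, as in the output above):

kubectl describe node k0s | grep -i taints

Once the node-role.kubernetes.io/master taint has been removed and no others remain, this prints Taints: <none>. Note that node.kubernetes.io/disk-pressure is added and removed by the kubelet itself, so freeing up disk space is the fix for that one.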

@jiridanek

AND, when I do --single, I get

 Events:
  Type     Reason                  Age               From               Message
  ----     ------                  ----              ----               -------
  Normal   Scheduled               86s               default-scheduler  Successfully assigned workspace-controller-system/workspace-controller-controller-manager-86576f98dc-w88sp to k0s
  Warning  FailedCreatePodSandBox  86s               kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "d86e2136499d2b8b76bfbf34ef9f4ca3971bb7aa422948bb88833a2d28e15e46": plugin type="bridge" name="kubernetes" failed (add): no IP ranges specified
  Normal   SandboxChanged          3s (x7 over 86s)  kubelet            Pod sandbox changed, it will be killed and re-created.

so what works for me is --enable-worker and removing the taint with kubectl.

twz123 (Member) commented May 24, 2024

> AND, when I do --single, I get
>
>  Events:
>   Type     Reason                  Age               From               Message
>   ----     ------                  ----              ----               -------
>   Normal   Scheduled               86s               default-scheduler  Successfully assigned workspace-controller-system/workspace-controller-controller-manager-86576f98dc-w88sp to k0s
>   Warning  FailedCreatePodSandBox  86s               kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "d86e2136499d2b8b76bfbf34ef9f4ca3971bb7aa422948bb88833a2d28e15e46": plugin type="bridge" name="kubernetes" failed (add): no IP ranges specified
>   Normal   SandboxChanged          3s (x7 over 86s)  kubelet            Pod sandbox changed, it will be killed and re-created.
>
> so what works for me is --enable-worker and removing the taint with kubectl.

That is somewhat surprising. There shouldn't be any differences concerning CNI between --single and --enable-worker. Is that reproducible? Does it still work when you specify --enable-worker --disable-components=konnectivity-server?

@jiridanek

> Is that reproducible?

Yes, here are a few GHA runs for the various scenarios:

k0s controller --enable-worker

https://github.com/jiridanek/notebooks-v2/actions/runs/9223828595/job/25377867756

k0s controller --single

https://github.com/jiridanek/notebooks-v2/actions/runs/9223888724/job/25378050139

k0s controller --single --disable-components=konnectivity-server

k0s controller --enable-worker --disable-components=konnectivity-server

> Does it still work when you specify --enable-worker --disable-components=konnectivity-server?

Yes, pod still runs (if I untaint). See above.

@jiridanek

> SingleNode: false

That's in the logs in the original issue report. Shouldn't this be correctly set to true?

twz123 (Member) commented May 24, 2024

> SingleNode: false
>
> That's in the logs in the original issue report. Shouldn't this be correctly set to true?

In the original issue, k0s wasn't started with the --single flag, so why would one expect this to be true?

twz123 (Member) commented May 24, 2024

I'm sooo oblivious 😬

That's why v1.30.0 is not starting with --single:

So, to use --single, you might want to wait for v1.30.1 (which should ship next week, I think), provide a custom config to v1.30.0 that changes the kube-router metrics port, or use v1.29.4.
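
For anyone hitting this on v1.30.0, a minimal sketch of the custom-config workaround (assuming spec.network.kuberouter.metricsPort is the relevant field and that the chosen port is free on your host; not verified against v1.30.0):

cat > /etc/k0s/k0s.yaml <<'EOF'
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  name: k0s
spec:
  network:
    kuberouter:
      metricsPort: 9091   # move kube-router metrics off the conflicting port
EOF

k0s controller --single --config /etc/k0s/k0s.yaml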

johbo commented Jun 2, 2024

Just got here; using --enable-worker gives me a controller node with the taint, as described in #4272 (comment).

Based on the discussion above, I think this is the intended behavior: --enable-worker only means that the worker components are started, nothing beyond that.

I noticed that there is also a --no-taints flag, which I should probably use as well to end up with a controller that also runs regular workloads.

Excerpt from k0s controller --help:

      --enable-worker                                  enable worker (default false)
      --no-taints                                      disable default taints for controller node

I think it could help to tweak the help text of --enable-worker so that it is explicit that it only enables the worker components (i.e. kubelet and containerd) and does not change the taints.
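
For completeness, the combined invocation this discussion converges on (flags taken verbatim from the help text above) would look like:

k0s controller --enable-worker --no-taints

The same flags should also apply to k0s install controller when running k0s as a service.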

johbo added a commit to johbo/k0s-nix that referenced this issue Jun 2, 2024
Realized that no Pods were scheduled due to taints on the Node object. Just
using "--enable-worker" is not enough.

See: k0sproject/k0s#4272