
k3s: Service starts up before network; and doesn't shutdown properly #103158

Closed
ThinkChaos opened this issue Nov 8, 2020 · 16 comments · Fixed by #223823

Comments

@ThinkChaos
Contributor

Describe the bug

  1. systemd.services.k3s doesn't depend on network-online.target, which introduces a race and causes spurious k3s.service errors (see the sketch after this list).

  2. Stopping the service doesn't actually stop the containers, so when shutting the system down we get systemd-shutdown[1]: Waiting for: containerd-shim. This causes the shutdown time to go through the roof.
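
To make point 1 concrete, here's a minimal sketch of the missing ordering, written as a local NixOS override (assuming the usual module arguments are in scope):

```nix
# Sketch: make k3s wait until the network is actually up.
# network-online.target is only reached when something wants it,
# hence Wants= in addition to After=.
systemd.services.k3s = {
  wants = [ "network-online.target" ];
  after = [ "network-online.target" ];
};
```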

Additional context
Kind of related to #98090, since k3s-killall.sh could be useful for point 2.

Notify maintainers
@euank


Nonetheless, thanks for maintaining this!

@freezeboy
Contributor

freezeboy commented Nov 8, 2020

For number 2, wouldn't KillMode=control-group be enough in the service unit?
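
In NixOS terms that suggestion would be a one-line override, something like this sketch (untested; lib.mkForce is needed because the module already sets KillMode=process):

```nix
# Sketch: let systemd kill the whole control group (containerd-shims
# included) when the service stops, instead of only the main process.
systemd.services.k3s.serviceConfig.KillMode = lib.mkForce "control-group";
```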

@ThinkChaos
Contributor Author

systemd.services.k3s was most likely copied from rancher/k3s's k3s.service, and since control-group is the default (according to the docs), I believe the explicit KillMode=process is there for a reason.
I still gave it a shot and defined my own k3s_custom service using control-group. Instead of systemd-shutdown complaining about containerd-shim, this gives us the standard systemd 1m30s timeout countdown.

I'll make a PR with the network part for 1 as that does work.

@euank
Member

euank commented Nov 9, 2020

Yup, it was copied from k3s.service.

I think the reason it's there is that often, if you just do something like systemctl restart k3s.service, it's desirable for the containers to keep running since k8s can "re-adopt" those orphaned containers.

It does sound like it interacts poorly with shutdowns, but for things like k3s upgrades, it seems desirable.
I'm not sure what the right solution is there.

For the networking thing, I definitely think we want After=network-online.target. Probably a simple miss on my part.

I use the module with k3s.docker = true, which sets After=docker.service; as a side effect that orders it after network.target, which is why I think I haven't been impacted by this yet.

One other thing that sort of falls under this issue: k3s should also be ordered after firewalld.service on NixOS, to ensure the kube-proxy iptables rules don't race with the NixOS firewall rules on bootup. This is something I realized a few days ago and hadn't gotten around to fixing.
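
A sketch of that extra ordering, on top of the network-online fix above (firewalld.service only exists when firewalld is enabled):

```nix
# Sketch: start k3s only after the firewall rules are in place, so
# kube-proxy's iptables rules don't race the NixOS firewall on boot.
systemd.services.k3s.after = [ "firewalld.service" ];
```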

@ThinkChaos
Contributor Author

It does sound like it interacts poorly with shutdowns, but for things like k3s upgrades, it seems desirable.

I agree. I noticed that when I reverted back to KillMode=process just now, nixos-rebuild switch waited for the containers to stop.

k3s should also be ordered after firewalld.service

I added that to the PR. I only saw your comment after submitting it.

I'm not sure what the right solution is there.

How about adding k3s-killall.sh and using it in ExecStopPost (ExecStop is called for restarts too)?
It's not perfect, since it means one more file to keep in sync with upstream, but I don't see a better solution.
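
Roughly what I have in mind, as a sketch (the k3s-killall.sh path is hypothetical; the NixOS k3s package doesn't currently ship that script):

```nix
# Sketch: after the unit has fully stopped, clean up leftover containers.
# The script and its install location are hypothetical, for illustration.
systemd.services.k3s.serviceConfig.ExecStopPost =
  "${pkgs.k3s}/bin/k3s-killall.sh";
```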

@jnetod
Contributor

jnetod commented Jun 3, 2021

Issue 2 is affecting Docker too on 21.05.

containerd/containerd#5502

@wamserma
Member

Issue 2 is affecting Docker too on 21.05.

containerd/containerd#5502

Yes, this is pretty annoying. I wish that issue would get more attention (by the containerd folks).

@codygman
Contributor

codygman commented Nov 5, 2021

Issue 2 is affecting Docker too on 21.05.

containerd/containerd#5502

Came here to link this; it costs 20-30s per reboot 😔

@rgoulter

rgoulter commented Nov 5, 2021

I ran into the "shutdown blocked on containerd-shim" message by way of forgetting about containers I'd previously run.

I had a container which I'd run and then forgotten about. IIRC, even stopped containers cause containerd-shim to time out on shutdown/reboot.

In my case I didn't need to keep the container around between reboots, so I was able to work around the timeout by just removing it. But if you've forgotten about a container, you wouldn't know to look.

@Baughn
Contributor

Baughn commented Dec 7, 2021

I ran into the "shutdown blocked on containerd-shim" message by way of forgetting about containers I'd previously run.

I ran into it by having a server I intended to reboot get stuck waiting for containerd-shim forever. It did not time out. This happened after the network had been torn down, so there was no way to fix it until I got back from vacation. RIP all my planning.

Destroying containers prior to shutdown isn't a good fix. The user might forget, or -- quite possibly -- might still want those containers to exist after the reboot.

This is arguably a bug in the docker service, but also in how reboots are configured. Nothing should be able to block a reboot indefinitely.

@mohe2015
Contributor

mohe2015 commented Dec 7, 2021

I assume containerd/containerd#5828 will fix this?

@wamserma
Member

wamserma commented Dec 7, 2021

I'm wondering if this gets merged to containerd before we release 22.05. ;)

@dustinlacewell

I'm wondering if this gets merged to containerd before we release 22.05. ;)

The second workaround in this SO answer is just a systemd unit, so I wonder if we could integrate that if this doesn't get merged by then.

https://unix.stackexchange.com/questions/666963/docker-20-10-x-keeps-system-waiting-for-several-minutes-before-shutdown-or-reboo
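
I haven't copied that answer verbatim; the general shape of such a unit, translated to NixOS as a sketch (the pkill pattern is my assumption about what it targets), would be:

```nix
# Sketch: a unit whose only job is its ExecStop, which systemd runs at
# shutdown. Ordering it Before=docker.service means it is *stopped* after
# docker, i.e. once leftover shims would otherwise start blocking shutdown.
# The "-" prefix makes systemd ignore pkill's exit code (nonzero when
# nothing matched).
systemd.services.kill-containerd-shims = {
  description = "Kill leftover containerd-shim processes at shutdown";
  wantedBy = [ "multi-user.target" ];
  before = [ "docker.service" ];
  serviceConfig = {
    Type = "oneshot";
    RemainAfterExit = true;
    ExecStop = "-${pkgs.procps}/bin/pkill -9 -f containerd-shim";
  };
};
```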

@wamserma
Member

I'm wondering if this gets merged to containerd before we release 22.05. ;)

Merged! containerd/containerd#5828

Unfortunately the diff does not apply cleanly to 1.5.9.

The stale bot added the "2.status: stale" label on Jul 30, 2022.
@Scrumplex
Member

Looks like this has been backported to the 1.5 branch: containerd/containerd#6509

The stale bot removed the "2.status: stale" label on Oct 29, 2022.
@wamserma
Member

At least the shutdown issue seems gone on 22.05. The other issue seems to be solved by #103228; closing.

@Scrumplex
Member

From what I can tell, I still have this issue, running on 22.05.3891.7269939a5d5 (Quokka). I can confirm that my k3s.service file is the same as in #103228.
