Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apparmor: Allow confined runc to kill containers #47749

Merged
merged 1 commit into from May 2, 2024

Conversation

woky
Copy link
Contributor

@woky woky commented Apr 23, 2024

/usr/sbin/runc is confined with "runc" profile[1] introduced in AppArmor
v4.0.0. This change breaks stopping of containers, because the profile
assigned to containers doesn't accept signals from the "runc" peer.
AppArmor >= v4.0.0 is currently part of Ubuntu Mantic (23.10) and later.

In the case of Docker, this regression is hidden by the fact that
dockerd itself sends SIGKILL to the running container after runc fails
to stop it. It is still a regression, because graceful shutdowns of
containers via "docker stop" are no longer possible, as SIGTERM from
runc is not delivered to them. This can be seen in logs from dockerd
when run with debug logging enabled and also from tracing signals with
killsnoop utility from bcc[2] (in bpfcc-tools package in Debian/Ubuntu):

Test commands:

root@cloudimg:~# docker run -d --name test redis
ba04c137827df8468358c274bc719bf7fc291b1ed9acf4aaa128ccc52816fe46
root@cloudimg:~# docker stop test

Relevant syslog messages (with wrapped long lines):

Apr 23 20:45:26 cloudimg kernel: audit:
  type=1400 audit(1713905126.444:253): apparmor="DENIED"
  operation="signal" class="signal" profile="docker-default" pid=9289
  comm="runc" requested_mask="receive" denied_mask="receive"
  signal=kill peer="runc"
Apr 23 20:45:36 cloudimg dockerd[9030]:
  time="2024-04-23T20:45:36.447016467Z"
  level=warning msg="Container failed to exit within 10s of kill - trying direct SIGKILL"
  container=ba04c137827df8468358c274bc719bf7fc291b1ed9acf4aaa128ccc52816fe46
  error="context deadline exceeded"

Killsnoop output after "docker stop ...":

root@cloudimg:~# killsnoop-bpfcc
TIME      PID      COMM             SIG  TPID     RESULT
20:51:00  9631     runc             3    9581     -13
20:51:02  9637     runc             9    9581     -13
20:51:12  9030     dockerd          9    9581     0

This change extends the docker-default profile with rules that allow
receiving signals from processes that run confined with either runc or
crun profile (crun[4] is an alternative OCI runtime that's also confined
in AppArmor >= v4.0.0, see [1]). It is backward compatible because the
peer value is a regular expression (AARE) so the referenced profile
doesn't have to exist for this profile to successfully compile and load.

Note that the runc profile has an attachment to /usr/sbin/runc. This is
the path where the runc package in Debian/Ubuntu puts the binary. When
the docker-ce package is installed from the upstream repository[3], runc
is installed as part of the containerd.io package at /usr/bin/runc.
Therefore it's still running unconfined and has no issues sending
signals to containers.

[1] https://gitlab.com/apparmor/apparmor/-/commit/2594d936
[2] https://github.com/iovisor/bcc/blob/master/tools/killsnoop.py
[3] https://download.docker.com/linux/ubuntu
[4] https://github.com/containers/crun

Signed-off-by: Tomáš Virtus nechtom@gmail.com

- Description for the changelog

apparmor: Allow confined runc to kill containers

@woky
Copy link
Contributor Author

woky commented Apr 23, 2024

Relevant PR in containerd: containerd/containerd#10123

@samuelkarp
Copy link
Member

Similar to containerd/containerd#10123 (comment), I can't reproduce this using Docker either.

@vvoland
Copy link
Contributor

vvoland commented Apr 24, 2024

Looks like related to: #47720

It was closed after the reporter mentioned that using our upstream packages fixes the issue, which seems to be explained by:

Note that the runc profile has an attachment to /usr/sbin/runc. This is
the path where the runc package in Debian/Ubuntu puts the binary. When
the docker-ce package is installed from the upstream repository[3], runc
is installed as part of the containerd.io package at /usr/bin/runc.
Therefore it's still running unconfined and has no issues sending
signals to containers.

Comment on lines 28 to 31
# runc may send signals to container processes (for "docker stop").
signal (send,receive) peer=runc,
# crun may send signals to container processes (for "docker stop" when used with crun OCI runtime).
signal (send,receive) peer=crun,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't receive rule alone be sufficient? Why do we need the send?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I've fixed it to be receive only rules.

@woky
Copy link
Contributor Author

woky commented Apr 24, 2024

Similar to containerd/containerd#10123 (comment), I can't reproduce this using Docker either.

Can you please be more specific? You don't see the DENIED messages in journal? Or do you just observe that docker stop ... kills containers? Because if you observe only the latter, then this isn't what this fix is about. The problem is that containers are always killed with SIGKILL.

/usr/sbin/runc is confined with "runc" profile[1] introduced in AppArmor
v4.0.0. This change breaks stopping of containers, because the profile
assigned to containers doesn't accept signals from the "runc" peer.
AppArmor >= v4.0.0 is currently part of Ubuntu Mantic (23.10) and later.

In the case of Docker, this regression is hidden by the fact that
dockerd itself sends SIGKILL to the running container after runc fails
to stop it. It is still a regression, because graceful shutdowns of
containers via "docker stop" are no longer possible, as SIGTERM from
runc is not delivered to them. This can be seen in logs from dockerd
when run with debug logging enabled and also from tracing signals with
killsnoop utility from bcc[2] (in bpfcc-tools package in Debian/Ubuntu):

  Test commands:

    root@cloudimg:~# docker run -d --name test redis
    ba04c137827df8468358c274bc719bf7fc291b1ed9acf4aaa128ccc52816fe46
    root@cloudimg:~# docker stop test

  Relevant syslog messages (with wrapped long lines):

    Apr 23 20:45:26 cloudimg kernel: audit:
      type=1400 audit(1713905126.444:253): apparmor="DENIED"
      operation="signal" class="signal" profile="docker-default" pid=9289
      comm="runc" requested_mask="receive" denied_mask="receive"
      signal=kill peer="runc"
    Apr 23 20:45:36 cloudimg dockerd[9030]:
      time="2024-04-23T20:45:36.447016467Z"
      level=warning msg="Container failed to exit within 10s of kill - trying direct SIGKILL"
      container=ba04c137827df8468358c274bc719bf7fc291b1ed9acf4aaa128ccc52816fe46
      error="context deadline exceeded"

  Killsnoop output after "docker stop ...":

    root@cloudimg:~# killsnoop-bpfcc
    TIME      PID      COMM             SIG  TPID     RESULT
    20:51:00  9631     runc             3    9581     -13
    20:51:02  9637     runc             9    9581     -13
    20:51:12  9030     dockerd          9    9581     0

This change extends the docker-default profile with rules that allow
receiving signals from processes that run confined with either runc or
crun profile (crun[4] is an alternative OCI runtime that's also confined
in AppArmor >= v4.0.0, see [1]). It is backward compatible because the
peer value is a regular expression (AARE) so the referenced profile
doesn't have to exist for this profile to successfully compile and load.

Note that the runc profile has an attachment to /usr/sbin/runc. This is
the path where the runc package in Debian/Ubuntu puts the binary. When
the docker-ce package is installed from the upstream repository[3], runc
is installed as part of the containerd.io package at /usr/bin/runc.
Therefore it's still running unconfined and has no issues sending
signals to containers.

[1] https://gitlab.com/apparmor/apparmor/-/commit/2594d936
[2] https://github.com/iovisor/bcc/blob/master/tools/killsnoop.py
[3] https://download.docker.com/linux/ubuntu
[4] https://github.com/containers/crun

Signed-off-by: Tomáš Virtus <nechtom@gmail.com>
@thaJeztah
Copy link
Member

thaJeztah commented Apr 24, 2024

I have yet to read up on all the links, but is this due to how runc is packaged? I know the version of runc installed by packages from download.docker.com (which bundles runc in the containerd.io package) installs runc in /usr/bin/runc. But I know some distro packages ship it as a standalone package and/or install it in /usr/sbin/runc

Also previous PRs and tickets related to this (but for unconfined);

@woky
Copy link
Contributor Author

woky commented Apr 24, 2024

I have yet to read up on all the links, but is this due to how runc is packaged?

As stated in the last paragraph of the commit message, quoted by Paweł above, the reason it works with runc from download.docker.com is due to how it is packaged, but packaging of runc is not the cause of the problem, only a factor. The cause is the linked change in AppArmor v4.0.0.

@thaJeztah
Copy link
Member

So, perhaps a very silly question, as I don't know what the common approach is for this, but those profiles are now added to the apparmor package, on behalf of other packages (like runc and crun). Are such changes considered to be temporary solutions (i.e., runc itself doesn't install a profile, so the apparmor package creates a stub), and should the "northern star" here be for runc itself to be installing the profile?

Also asking that, because the profile added is packaging-specific; canonical packaging it separate and using a different location (sbin vs bin), which to some extent would make it feel more natural to have the profile also included in the runc (and crun) package; allowing it to define the profile with the correct location of the binary (if installed).

In that situation, Docker (and other packagers) can ship the appropriate profile for their packages (with the location they choose to use for the binary), without apparmor trying to guess / detect if it's installed.

To some extent it could even be considered for the canonical packages of docker to have the profile adjusted to match the other changes, but mostly curious what common procedure is for these

More generally, I wonder if a generic peer is something that would be possible (e.g. oci-runtime); based on the linked changes, it looks like it's possible that "other" runtimes will also get assigned their own, and I think we also want to avoid having to adjust the profile in this repository with an unbounded list of "possible" runtimes that exist.

also /cc @neersighted @tianon if you have thoughts / ideas / suggestions.

@thaJeztah
Copy link
Member

FWIW; I'm not strongly against these changes, but trying to have a good insight in this matter, and I want to avoid getting on a sliding slope if there's better alternatives possible, and "now" may be the right moment to think about / discuss those.

@samuelkarp
Copy link
Member

Can you please be more specific? You don't see the DENIED messages in journal? Or do you just observe that docker stop ... kills containers? Because if you observe only the latter, then this isn't what this fix is about. The problem is that containers are always killed with SIGKILL.

I don't see any messages and docker stop kills the container immediately (rather than waiting for the SIGTERM timeout first).

Note that the runc profile has an attachment to /usr/sbin/runc. This is the path where the runc package in Debian/Ubuntu puts the binary. When the docker-ce package is installed from the upstream repository[3], runc is installed as part of the containerd.io package at /usr/bin/runc. Therefore it's still running unconfined and has no issues sending signals to containers.

I think this explains what I was seeing.

If the upstream packages are working (which they are), this seems like a bug in either the installed AppArmor profile or in the Debian and Ubuntu packages. Like what @thaJeztah is asking, I think this would make sense to fix there rather than upstream (since the upstream packages + profile are not being used). We can also take this upstream too, but it seems the wrong place to "fix" the bug.

@jrjohansen
Copy link

If the upstream packages are working (which they are), this seems like a bug in either the installed AppArmor profile or in the Debian and Ubuntu packages. Like what @thaJeztah is asking, I think this would make sense to fix there rather than upstream (since the upstream packages + profile are not being used). We can also take this upstream too, but it seems the wrong place to "fix" the bug.

So not a bug in the AppArmor profile. AppArmor requires a cross check on communication, which means the rules must be added to the generated profile. Which is what the patch is doing,

Using a generic peer like
@{oci_runtime}
is an option. Ubuntu/AppArmor upsteam would have to define the variable but then the generated rule wouldn't need to change.

@samuelkarp
Copy link
Member

So not a bug in the AppArmor profile. AppArmor requires a cross check on communication, which means the rules must be added to the generated profile. Which is what the patch is doing,

Why do the upstream packages work without this change?

@thaJeztah
Copy link
Member

Why do the upstream packages work without this change?

I think that's because the profile set by the AppArmor package only looks if it finds runc in /usr/sbin (and if so, it applies the profile)

For the upstream package it's in /usr/bin, so it doesn't get found by AppArmor, and is running unconfined.

woky added a commit to woky/common that referenced this pull request Apr 28, 2024
…op, kill)")

This makes projects using AppArmor bits from
golang-github-containers-common (notably podman) work with AppArmor
v4.0.0.

There is a similar issue with containerd clients and docker. The fix to
containerd was merged in upstream[1]. The fix to moby (docker) was
submitted but seems to have stalled[2]. Upstream notes we should fix
regressions we introduced in Ubuntu or perhaps at least introduce a
generic way to refer to OCI runtimes under a single peer name. I suspect
we would get similar objections in containers/common. That's why I
haven't yet submitted a patch to upstream.

In the meantime, patch this library so that podman can work with OCI
runtimes we currently confine.

[1] containerd/containerd#10123
[2] moby/moby#47749

Bug: https://bugs.launchpad.net/ubuntu/+source/libpod/+bug/2040483
woky added a commit to woky/common that referenced this pull request Apr 28, 2024
…op, kill)")

This makes projects using AppArmor bits from
golang-github-containers-common (notably podman) work with AppArmor
v4.0.0.

There is a similar issue with containerd clients and docker. The fix was
merged to the containerd upstream[1]. The fix to moby (docker) was
submitted but seems to have stalled[2]. Upstream notes we should fix
regressions we introduced in Ubuntu or perhaps at least introduce a
generic way to refer to OCI runtimes under a single peer name. I suspect
we would get similar objections in containers/common. That's why I
haven't yet submitted the patch to the upstream.

In the meantime, patch this library so that podman can work with OCI
runtimes we currently confine.

[1] containerd/containerd#10123
[2] moby/moby#47749

Bug: https://bugs.launchpad.net/ubuntu/+source/libpod/+bug/2040483
woky added a commit to woky/common that referenced this pull request Apr 28, 2024
…op, kill)")

This makes projects using AppArmor bits from
golang-github-containers-common (notably podman) work with AppArmor
v4.0.0.

There is a similar issue with containerd clients and docker. The fix was
merged to the containerd upstream[1]. The fix to moby (docker) was
submitted but seems to have stalled[2]. Upstream notes we should fix
regressions we introduced in Ubuntu or perhaps at least introduce a
generic way to refer to OCI runtimes under a single peer name. I suspect
we would get similar objections in containers/common. That's why I
haven't yet submitted the patch to the upstream.

In the meantime, patch this library so that podman can work with OCI
runtimes we currently confine.

[1] containerd/containerd#10123
[2] moby/moby#47749

Bug: https://bugs.launchpad.net/ubuntu/+source/libpod/+bug/2040483
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@tianon tianon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's kind of nuts to me that AppArmor upstream is adding profiles for things they don't maintain (and then we get to keep the [broken] pieces), but 🤷

@thaJeztah thaJeztah merged commit 5d03db2 into moby:master May 2, 2024
127 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants