Provide mechanism to trivially re-enable AF_VSOCK in the default seccomp profile #44670

Open
gaby opened this issue Dec 19, 2022 · 8 comments
Labels
area/kernel area/security/seccomp area/security kind/feature Functionality or other elements that the project doesn't currently have. Features are new and shiny

Comments

@gaby

gaby commented Dec 19, 2022

Maintainer edit: Won't be accepted as is, but alternative mechanisms not involving --cap-add should be supported.


Description

Several PRs that block AF_VSOCK calls were recently merged. They were backported without warning and shipped in a patch release on a Friday, which caused a major headache: multiple client systems went down.

There's currently no documentation on how to enable these calls again, and no way to give a container the ability to use them other than running it as privileged. Forcing users to write a custom seccomp profile as the only way to enable this is not user friendly.

Proposing the addition of a CAP for enabling AF_VSOCK on a container basis.

Example:

docker run -it --cap-add NET_VSOCK alpine:latest

Potential names:

  • SYS_VSOCK
  • NET_SOCKET
  • NET_VSOCK
  • VSOCK

Valid use case:

  • AF_VSOCK lets networks built on Libvirt and KubeVirt create management tunnels into their Libvirt domains. For example, we deploy core VMs with a systemd service that forwards VSOCK connections over TCP to the local SSH daemon. The domains run on a segregated network, so the only way to communicate with them is via VSOCK in combination with socat. On the host, we dynamically enable these tunnels using socat running inside Docker containers.

Related PRs: #44562 #44563 #44564

Cc. @thaJeztah @AkihiroSuda @GabrielNicolasAvellaneda

@gaby gaby added kind/feature Functionality or other elements that the project doesn't currently have. Features are new and shiny status/0-triage labels Dec 19, 2022
@AkihiroSuda
Member

If this is really expected to be a capability, it should be proposed to the kernel. Getting it accepted upstream might be difficult, though, as the kernel already uses 41 of the 64 available capability bits.

Perhaps just adding a new seccomp option like --security-opt seccomp=+vsock might suffice?

@AkihiroSuda
Member

We could also consider defining a "virtual capability" such as --cap-add VCAP_NET_VSOCK to control the seccomp profile, but that might be rather confusing.

@gaby
Author

gaby commented Dec 21, 2022

@AkihiroSuda Trying to use --security-opt was my first thought. After 10-20 minutes I realized support for it would have to be baked into the engine for it to work. The VCAP approach would make sense if there are other functions that could also benefit from having a virtual capability assigned to them.

@neersighted
Member

I strongly object to the concept of a virtual capability -- capabilities in the Linux kernel are well-known and well-defined, and inventing our own is overloading the meaning of the API field and going to create confusion in the ecosystem.

@neersighted
Member

Also, as there was no update here: this was discussed in the maintainers call. The consensus was that no new flags will be added; instead, the VSOCK change will be reverted in the 20.10 branch. Keep in mind that AF_VSOCK is not namespaced, so your containers will be able to talk to, for instance, the Hypervisor Guest Services socket. On 23.0 and newer, it will be blocked by default and you will need to pass a custom seccomp filter if you want to re-allow it.

There are patches upstream that add namespacing to AF_VSOCK, but none have been accepted yet or look likely to land in the next several cycles, so this will likely remain the state of the art for some time.
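For readers wondering what "pass a custom seccomp filter" could look like in practice, here is a minimal sketch in Python. It assumes the default profile guards socket() with a rule that allows the call only when the first argument is not 40 (AF_VSOCK), expressed as an `SCMP_CMP_NE` condition; check the profile shipped with your engine version before relying on that. The function name `allow_vsock` and the tiny `sample` profile are hypothetical and only illustrate the transformation.

```python
import copy
import json

AF_VSOCK = 40  # numeric value of AF_VSOCK on Linux


def allow_vsock(profile: dict) -> dict:
    """Return a copy of `profile` whose socket() rule no longer excludes AF_VSOCK."""
    patched = copy.deepcopy(profile)
    for rule in patched.get("syscalls", []):
        if "socket" not in rule.get("names", []):
            continue
        # Drop only the "first argument != AF_VSOCK" condition; keep any others.
        rule["args"] = [
            arg
            for arg in (rule.get("args") or [])
            if not (
                arg.get("index") == 0
                and arg.get("value") == AF_VSOCK
                and arg.get("op") == "SCMP_CMP_NE"
            )
        ]
    return patched


# Hypothetical stand-in for the real default profile (assumed structure).
sample = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {"names": ["read", "write"], "action": "SCMP_ACT_ALLOW"},
        {
            "names": ["socket"],
            "action": "SCMP_ACT_ALLOW",
            "args": [{"index": 0, "value": AF_VSOCK, "op": "SCMP_CMP_NE"}],
        },
    ],
}

patched = allow_vsock(sample)
print(json.dumps(patched["syscalls"][1]))
```

Applied to a real profile loaded with `json.load`, the result could be written to a file and passed to a container with `docker run --security-opt seccomp=<path-to-file> ...`. Note that this re-opens the unnamespaced AF_VSOCK family for that container, with the caveats described above.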

@gaby
Author

gaby commented Dec 22, 2022

@neersighted Totally agree, I think the best route is to add a seccomp filter instead of a Virtual Capability.

neersighted added a commit to neersighted/moby that referenced this issue Dec 29, 2022
This reverts commit 57b2290.

This change, while favorable from a security standpoint, caused a
regression for users of the 20.10 branch of Moby. As such, we are
reverting it to ensure stability and compatibility for the affected
users.

However, users of AF_VSOCK in containers should recognize that this
(special) address family is not currently namespaced in any version of
the Linux kernel, and may result in unexpected behavior, like VMs
communicating directly with host hypervisors.

Future branches, including the 23.0 branch, will continue to filter
AF_VSOCK. Users who need to allow containers to communicate over the
unnamespaced AF_VSOCK will need to turn off seccomp confinement or set a
custom seccomp profile.

It is our hope that future mechanisms will make this more
ergonomic/maintainable for end users, and that future kernels will
support namespacing of AF_VSOCK.

Closes moby#44670.

Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
@neersighted
Member

neersighted commented Apr 3, 2024

Re-opening this; while it will not be implemented as-is, #47663 has me thinking of other ways to answer this ask.

Alternatively, #22109 is more involved but a more holistic way to solve this in the long term.

@neersighted neersighted reopened this Apr 3, 2024
@neersighted neersighted changed the title Provide CAP for enabling AF_VSOCK support Provide mechanism to trivially re-enable AF_VSOCK in the default seccomp profile Apr 3, 2024
@gaby
Author

gaby commented Apr 3, 2024

@neersighted Agreed. As of today, with v26.x, using a custom seccomp profile is the only way to re-enable VSOCK support.
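One way to check whether a given container is affected, or whether a custom profile took effect, is a small probe run inside the container. The sketch below is an illustration, not part of the project. It assumes the seccomp filter rejects the call with EPERM, which Python surfaces as `PermissionError`; a kernel without vsock support would more likely report EAFNOSUPPORT. The function name `probe_vsock` is hypothetical.

```python
import errno
import socket

# AF_VSOCK is 40 on Linux; fall back to the raw number if this Python
# build does not expose the constant.
AF_VSOCK = getattr(socket, "AF_VSOCK", 40)


def probe_vsock() -> str:
    """Try to create an AF_VSOCK socket and classify the outcome."""
    try:
        sock = socket.socket(AF_VSOCK, socket.SOCK_STREAM)
    except PermissionError:
        # Assumed to be how a seccomp rule returning EPERM appears from Python.
        return "blocked"
    except OSError as exc:
        if exc.errno == errno.EAFNOSUPPORT:
            return "unsupported"
        return f"error: {errno.errorcode.get(exc.errno, exc.errno)}"
    sock.close()
    return "allowed"


print(probe_vsock())
```

The probe only creates and closes a socket; it does not connect anywhere, so it says nothing about whether a peer is reachable over VSOCK.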

3 participants