Provide mechanism to trivially re-enable AF_VSOCK in the default seccomp profile #44670
Comments
If this is really meant to be a capability, it should be proposed to the kernel. But getting it accepted upstream might be difficult, as the kernel already uses 41 of the 64 available capability bits. Perhaps just adding a new seccomp option like |
We can also consider defining the "virtual capability" like |
@AkihiroSuda Trying to use |
I strongly object to the concept of a virtual capability -- capabilities in the Linux kernel are well-known and well-defined, and inventing our own is overloading the meaning of the API field and going to create confusion in the ecosystem. |
Also, as there was no update here: This was discussed in the maintainers call. The consensus was no new flags will be added; instead the VSOCK change will be reverted in the 20.10 branch (keep in mind that it is not namespaced so your containers will be able to talk to the Hypervisor Guest Services socket for instance). On 23.0 and newer, it will be blocked by default and you will need to pass a custom seccomp filter if you want to re-allow it. There are patches upstream that add namespacing to AF_VSOCK, but none have been accepted yet/look likely to land in the next several cycles, so this will likely remain the state of the art for some time. |
@neersighted Totally agree, I think the best route is to add a seccomp filter instead of a Virtual Capability. |
This reverts commit 57b2290. This change, while favorable from a security standpoint, caused a regression for users of the 20.10 branch of Moby. As such, we are reverting it to ensure stability and compatibility for the affected users. However, users of AF_VSOCK in containers should recognize that this (special) address family is not currently namespaced in any version of the Linux kernel, and may result in unexpected behavior, like VMs communicating directly with host hypervisors. Future branches, including the 23.0 branch, will continue to filter AF_VSOCK. Users who need to allow containers to communicate over the unnamespaced AF_VSOCK will need to turn off seccomp confinement or set a custom seccomp profile. It is our hope that future mechanisms will make this more ergonomic/maintainable for end users, and that future kernels will support namespacing of AF_VSOCK. Closes moby#44670. Signed-off-by: Bjorn Neergaard <bneergaard@mirantis.com>
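For readers who need the custom-profile route mentioned above, here is a minimal sketch of how such a profile could be derived from an existing one. It assumes the profile blocks AF_VSOCK through a `socket` rule whose argument filter is `index 0 != 40` with op `SCMP_CMP_NE`; check your actual profile before relying on it, since that layout is an assumption here. The helper name `allow_vsock` is hypothetical.

```python
import json

AF_VSOCK = 40  # numeric value of AF_VSOCK on Linux


def allow_vsock(profile):
    """Return a copy of a seccomp profile whose socket() rules no longer exclude AF_VSOCK.

    Assumption: the block is expressed as an argument filter
    {"index": 0, "value": 40, "op": "SCMP_CMP_NE"} on a rule naming "socket".
    """
    patched = json.loads(json.dumps(profile))  # cheap deep copy of JSON-like data
    for rule in patched.get("syscalls", []):
        if "socket" not in rule.get("names", []):
            continue
        args = rule.get("args") or []
        # Drop only the filter that excludes AF_VSOCK; keep any other filters.
        kept = [
            a for a in args
            if not (
                a.get("index") == 0
                and a.get("value") == AF_VSOCK
                and a.get("op") == "SCMP_CMP_NE"
            )
        ]
        if len(kept) != len(args):
            rule["args"] = kept
    return patched
```

The patched profile can be written to a file with `json.dump` and passed to a container via `--security-opt seccomp=<path>`. Keep in mind the caveat above: AF_VSOCK is not namespaced, so a container started with such a profile can reach the host's vsock endpoints.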
@neersighted Agree, as of today with |
Maintainer edit: Won't be accepted as is, but alternative mechanisms not involving --cap-add should be supported.
Description
Recently several PRs were merged that block calls to AF_VSOCK. They were backported without warning and shipped as a patch release on a Friday. This caused a major headache: multiple client systems went down.
There's currently no documentation on how to re-enable these calls, and no way to give a container the ability to use them short of running it privileged. Forcing users to write a custom seccomp profile as the only way to enable this is not user-friendly.
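As a quick way to tell whether a given container is affected, the following sketch tries to create an AF_VSOCK socket and reports the outcome. It assumes the seccomp filter rejects the call with a permission error (EPERM or EACCES); other policies could produce the same error, so treat "blocked" as a hint rather than proof. The function name `probe_vsock` is hypothetical.

```python
import socket


def probe_vsock():
    """Try to open an AF_VSOCK socket and classify the result.

    Returns "allowed", "blocked" (permission error, e.g. from a seccomp
    filter), or "unavailable" (any other OS error, such as an unsupported
    address family).
    """
    # Fall back to the Linux numeric value if this Python build lacks the constant.
    family = getattr(socket, "AF_VSOCK", 40)
    try:
        sock = socket.socket(family, socket.SOCK_STREAM)
    except PermissionError:
        return "blocked"
    except OSError:
        return "unavailable"
    sock.close()
    return "allowed"
```

Running `print(probe_vsock())` inside the container before and after an engine upgrade shows whether the default profile changed underneath the workload.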
Proposing the addition of a capability (CAP) for enabling AF_VSOCK on a per-container basis.
Example:
Potential names:
Valid use case:
Related PRs: #44562 #44563 #44564
Cc. @thaJeztah @AkihiroSuda @GabrielNicolasAvellaneda