Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for user namespaces phase 1 (KEP 127) #111090

Merged
merged 12 commits into from Aug 3, 2022
6 changes: 5 additions & 1 deletion api/openapi-spec/swagger.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 5 additions & 1 deletion api/openapi-spec/v3/api__v1_openapi.json
Expand Up @@ -5032,6 +5032,10 @@
"description": "Use the host's pid namespace. Optional: Default to false.",
"type": "boolean"
},
"hostUsers": {
"description": "Use the host's user namespace. Optional: Default to true. If set to true or not present, the pod will be run in the host user namespace, useful for when the pod needs a feature only available to the host user namespace, such as loading a kernel module with CAP_SYS_MODULE. When set to false, a new userns is created for the pod. Setting false is useful for mitigating container breakout vulnerabilities even allowing users to run their containers as root without actually having root privileges on the host. This field is alpha-level and is only honored by servers that enable the UserNamespacesSupport feature.",
rata marked this conversation as resolved.
Show resolved Hide resolved
"type": "boolean"
},
"hostname": {
"description": "Specifies the hostname of the Pod If not specified, the pod's hostname will be set to a system-defined value.",
"type": "string"
Expand Down Expand Up @@ -5083,7 +5087,7 @@
"$ref": "#/components/schemas/io.k8s.api.core.v1.PodOS"
}
],
"description": "Specifies the OS of the containers in the pod. Some pod and container fields are restricted if this is set.\n\nIf the OS field is set to linux, the following fields must be unset: -securityContext.windowsOptions\n\nIf the OS field is set to windows, following fields must be unset: - spec.hostPID - spec.hostIPC - spec.securityContext.seLinuxOptions - spec.securityContext.seccompProfile - spec.securityContext.fsGroup - spec.securityContext.fsGroupChangePolicy - spec.securityContext.sysctls - spec.shareProcessNamespace - spec.securityContext.runAsUser - spec.securityContext.runAsGroup - spec.securityContext.supplementalGroups - spec.containers[*].securityContext.seLinuxOptions - spec.containers[*].securityContext.seccompProfile - spec.containers[*].securityContext.capabilities - spec.containers[*].securityContext.readOnlyRootFilesystem - spec.containers[*].securityContext.privileged - spec.containers[*].securityContext.allowPrivilegeEscalation - spec.containers[*].securityContext.procMount - spec.containers[*].securityContext.runAsUser - spec.containers[*].securityContext.runAsGroup"
"description": "Specifies the OS of the containers in the pod. Some pod and container fields are restricted if this is set.\n\nIf the OS field is set to linux, the following fields must be unset: -securityContext.windowsOptions\n\nIf the OS field is set to windows, following fields must be unset: - spec.hostPID - spec.hostIPC - spec.hostUsers - spec.securityContext.seLinuxOptions - spec.securityContext.seccompProfile - spec.securityContext.fsGroup - spec.securityContext.fsGroupChangePolicy - spec.securityContext.sysctls - spec.shareProcessNamespace - spec.securityContext.runAsUser - spec.securityContext.runAsGroup - spec.securityContext.supplementalGroups - spec.containers[*].securityContext.seLinuxOptions - spec.containers[*].securityContext.seccompProfile - spec.containers[*].securityContext.capabilities - spec.containers[*].securityContext.readOnlyRootFilesystem - spec.containers[*].securityContext.privileged - spec.containers[*].securityContext.allowPrivilegeEscalation - spec.containers[*].securityContext.procMount - spec.containers[*].securityContext.runAsUser - spec.containers[*].securityContext.runAsGroup"
},
"overhead": {
"additionalProperties": {
Expand Down
6 changes: 5 additions & 1 deletion api/openapi-spec/v3/apis__apps__v1_openapi.json
Expand Up @@ -3455,6 +3455,10 @@
"description": "Use the host's pid namespace. Optional: Default to false.",
"type": "boolean"
},
"hostUsers": {
"description": "Use the host's user namespace. Optional: Default to true. If set to true or not present, the pod will be run in the host user namespace, useful for when the pod needs a feature only available to the host user namespace, such as loading a kernel module with CAP_SYS_MODULE. When set to false, a new userns is created for the pod. Setting false is useful for mitigating container breakout vulnerabilities even allowing users to run their containers as root without actually having root privileges on the host. This field is alpha-level and is only honored by servers that enable the UserNamespacesSupport feature.",
"type": "boolean"
},
"hostname": {
"description": "Specifies the hostname of the Pod If not specified, the pod's hostname will be set to a system-defined value.",
"type": "string"
Expand Down Expand Up @@ -3506,7 +3510,7 @@
"$ref": "#/components/schemas/io.k8s.api.core.v1.PodOS"
}
],
"description": "Specifies the OS of the containers in the pod. Some pod and container fields are restricted if this is set.\n\nIf the OS field is set to linux, the following fields must be unset: -securityContext.windowsOptions\n\nIf the OS field is set to windows, following fields must be unset: - spec.hostPID - spec.hostIPC - spec.securityContext.seLinuxOptions - spec.securityContext.seccompProfile - spec.securityContext.fsGroup - spec.securityContext.fsGroupChangePolicy - spec.securityContext.sysctls - spec.shareProcessNamespace - spec.securityContext.runAsUser - spec.securityContext.runAsGroup - spec.securityContext.supplementalGroups - spec.containers[*].securityContext.seLinuxOptions - spec.containers[*].securityContext.seccompProfile - spec.containers[*].securityContext.capabilities - spec.containers[*].securityContext.readOnlyRootFilesystem - spec.containers[*].securityContext.privileged - spec.containers[*].securityContext.allowPrivilegeEscalation - spec.containers[*].securityContext.procMount - spec.containers[*].securityContext.runAsUser - spec.containers[*].securityContext.runAsGroup"
"description": "Specifies the OS of the containers in the pod. Some pod and container fields are restricted if this is set.\n\nIf the OS field is set to linux, the following fields must be unset: -securityContext.windowsOptions\n\nIf the OS field is set to windows, following fields must be unset: - spec.hostPID - spec.hostIPC - spec.hostUsers - spec.securityContext.seLinuxOptions - spec.securityContext.seccompProfile - spec.securityContext.fsGroup - spec.securityContext.fsGroupChangePolicy - spec.securityContext.sysctls - spec.shareProcessNamespace - spec.securityContext.runAsUser - spec.securityContext.runAsGroup - spec.securityContext.supplementalGroups - spec.containers[*].securityContext.seLinuxOptions - spec.containers[*].securityContext.seccompProfile - spec.containers[*].securityContext.capabilities - spec.containers[*].securityContext.readOnlyRootFilesystem - spec.containers[*].securityContext.privileged - spec.containers[*].securityContext.allowPrivilegeEscalation - spec.containers[*].securityContext.procMount - spec.containers[*].securityContext.runAsUser - spec.containers[*].securityContext.runAsGroup"
},
"overhead": {
"additionalProperties": {
Expand Down
6 changes: 5 additions & 1 deletion api/openapi-spec/v3/apis__batch__v1_openapi.json
Expand Up @@ -2534,6 +2534,10 @@
"description": "Use the host's pid namespace. Optional: Default to false.",
"type": "boolean"
},
"hostUsers": {
"description": "Use the host's user namespace. Optional: Default to true. If set to true or not present, the pod will be run in the host user namespace, useful for when the pod needs a feature only available to the host user namespace, such as loading a kernel module with CAP_SYS_MODULE. When set to false, a new userns is created for the pod. Setting false is useful for mitigating container breakout vulnerabilities even allowing users to run their containers as root without actually having root privileges on the host. This field is alpha-level and is only honored by servers that enable the UserNamespacesSupport feature.",
"type": "boolean"
},
"hostname": {
"description": "Specifies the hostname of the Pod If not specified, the pod's hostname will be set to a system-defined value.",
"type": "string"
Expand Down Expand Up @@ -2585,7 +2589,7 @@
"$ref": "#/components/schemas/io.k8s.api.core.v1.PodOS"
}
],
"description": "Specifies the OS of the containers in the pod. Some pod and container fields are restricted if this is set.\n\nIf the OS field is set to linux, the following fields must be unset: -securityContext.windowsOptions\n\nIf the OS field is set to windows, following fields must be unset: - spec.hostPID - spec.hostIPC - spec.securityContext.seLinuxOptions - spec.securityContext.seccompProfile - spec.securityContext.fsGroup - spec.securityContext.fsGroupChangePolicy - spec.securityContext.sysctls - spec.shareProcessNamespace - spec.securityContext.runAsUser - spec.securityContext.runAsGroup - spec.securityContext.supplementalGroups - spec.containers[*].securityContext.seLinuxOptions - spec.containers[*].securityContext.seccompProfile - spec.containers[*].securityContext.capabilities - spec.containers[*].securityContext.readOnlyRootFilesystem - spec.containers[*].securityContext.privileged - spec.containers[*].securityContext.allowPrivilegeEscalation - spec.containers[*].securityContext.procMount - spec.containers[*].securityContext.runAsUser - spec.containers[*].securityContext.runAsGroup"
"description": "Specifies the OS of the containers in the pod. Some pod and container fields are restricted if this is set.\n\nIf the OS field is set to linux, the following fields must be unset: -securityContext.windowsOptions\n\nIf the OS field is set to windows, following fields must be unset: - spec.hostPID - spec.hostIPC - spec.hostUsers - spec.securityContext.seLinuxOptions - spec.securityContext.seccompProfile - spec.securityContext.fsGroup - spec.securityContext.fsGroupChangePolicy - spec.securityContext.sysctls - spec.shareProcessNamespace - spec.securityContext.runAsUser - spec.securityContext.runAsGroup - spec.securityContext.supplementalGroups - spec.containers[*].securityContext.seLinuxOptions - spec.containers[*].securityContext.seccompProfile - spec.containers[*].securityContext.capabilities - spec.containers[*].securityContext.readOnlyRootFilesystem - spec.containers[*].securityContext.privileged - spec.containers[*].securityContext.allowPrivilegeEscalation - spec.containers[*].securityContext.procMount - spec.containers[*].securityContext.runAsUser - spec.containers[*].securityContext.runAsGroup"
},
"overhead": {
"additionalProperties": {
Expand Down
18 changes: 18 additions & 0 deletions pkg/api/pod/util.go
Expand Up @@ -539,6 +539,15 @@ func dropDisabledFields(
})
}

// If the feature is disabled and not in use, drop the hostUsers field.
if !utilfeature.DefaultFeatureGate.Enabled(features.UserNamespacesStatelessPodsSupport) && !hostUsersInUse(oldPodSpec) {
// Drop the field in podSpec only if SecurityContext is not nil.
// If it is nil, there is no need to set hostUsers=nil (it will be nil too).
if podSpec.SecurityContext != nil {
podSpec.SecurityContext.HostUsers = nil
}
}

dropDisabledProcMountField(podSpec, oldPodSpec)

dropDisabledCSIVolumeSourceAlphaFields(podSpec, oldPodSpec)
Expand Down Expand Up @@ -672,6 +681,15 @@ func nodeTaintsPolicyInUse(podSpec *api.PodSpec) bool {
return false
}

// hostUsersInUse returns true if the pod spec has spec.hostUsers field set.
func hostUsersInUse(podSpec *api.PodSpec) bool {
if podSpec != nil && podSpec.SecurityContext != nil && podSpec.SecurityContext.HostUsers != nil {
return true
}

return false
}

// procMountInUse returns true if the pod spec is non-nil and has a SecurityContext's ProcMount field set to a non-default value
func procMountInUse(podSpec *api.PodSpec) bool {
if podSpec == nil {
Expand Down
97 changes: 97 additions & 0 deletions pkg/api/pod/util_test.go
Expand Up @@ -1949,3 +1949,100 @@ func TestDropDisabledMatchLabelKeysField(t *testing.T) {
})
}
}

func TestDropHostUsers(t *testing.T) {
falseVar := false
trueVar := true

podWithoutHostUsers := func() *api.Pod {
return &api.Pod{
Spec: api.PodSpec{
SecurityContext: &api.PodSecurityContext{}},
}
}
podWithHostUsersFalse := func() *api.Pod {
return &api.Pod{
Spec: api.PodSpec{
SecurityContext: &api.PodSecurityContext{
HostUsers: &falseVar,
},
},
}
}
podWithHostUsersTrue := func() *api.Pod {
return &api.Pod{
Spec: api.PodSpec{
SecurityContext: &api.PodSecurityContext{
HostUsers: &trueVar,
},
},
}
}

podInfo := []struct {
description string
hasHostUsers bool
pod func() *api.Pod
}{
{
description: "with hostUsers=true",
hasHostUsers: true,
pod: podWithHostUsersTrue,
},
{
description: "with hostUsers=false",
hasHostUsers: true,
pod: podWithHostUsersFalse,
},
{
description: "with hostUsers=nil",
pod: func() *api.Pod { return nil },
},
}

for _, enabled := range []bool{true, false} {
for _, oldPodInfo := range podInfo {
for _, newPodInfo := range podInfo {
oldPodHasHostUsers, oldPod := oldPodInfo.hasHostUsers, oldPodInfo.pod()
newPodHasHostUsers, newPod := newPodInfo.hasHostUsers, newPodInfo.pod()
if newPod == nil {
continue
}

t.Run(fmt.Sprintf("feature enabled=%v, old pod %v, new pod %v", enabled, oldPodInfo.description, newPodInfo.description), func(t *testing.T) {
defer featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, features.UserNamespacesStatelessPodsSupport, enabled)()

DropDisabledPodFields(newPod, oldPod)

// old pod should never be changed
if !reflect.DeepEqual(oldPod, oldPodInfo.pod()) {
t.Errorf("old pod changed: %v", cmp.Diff(oldPod, oldPodInfo.pod()))
}

switch {
case enabled || oldPodHasHostUsers:
// new pod should not be changed if the feature is enabled, or if the old pod had hostUsers
if !reflect.DeepEqual(newPod, newPodInfo.pod()) {
t.Errorf("new pod changed: %v", cmp.Diff(newPod, newPodInfo.pod()))
}
case newPodHasHostUsers:
// new pod should be changed
if reflect.DeepEqual(newPod, newPodInfo.pod()) {
t.Errorf("new pod was not changed")
}
// new pod should not have hostUsers
if exp := podWithoutHostUsers(); !reflect.DeepEqual(newPod, exp) {
t.Errorf("new pod had hostUsers: %v", cmp.Diff(newPod, exp))
}
default:
// new pod should not need to be changed
if !reflect.DeepEqual(newPod, newPodInfo.pod()) {
t.Errorf("new pod changed: %v", cmp.Diff(newPod, newPodInfo.pod()))
}
}
})
}
}
}

}
13 changes: 13 additions & 0 deletions pkg/apis/core/types.go
Expand Up @@ -2976,6 +2976,7 @@ type PodSpec struct {
// If the OS field is set to windows, following fields must be unset:
// - spec.hostPID
// - spec.hostIPC
// - spec.hostUsers
// - spec.securityContext.seLinuxOptions
// - spec.securityContext.seccompProfile
// - spec.securityContext.fsGroup
Expand Down Expand Up @@ -3078,6 +3079,18 @@ type PodSecurityContext struct {
// +k8s:conversion-gen=false
// +optional
ShareProcessNamespace *bool
// Use the host's user namespace.
// Optional: Default to true.
thockin marked this conversation as resolved.
Show resolved Hide resolved
// If set to true or not present, the pod will be run in the host user namespace, useful
// for when the pod needs a feature only available to the host user namespace, such as
// loading a kernel module with CAP_SYS_MODULE.
// When set to false, a new user namespace is created for the pod. Setting false is useful
// for mitigating container breakout vulnerabilities even allowing users to run their
// containers as root without actually having root privileges on the host.
// Note that this field cannot be set when spec.os.name is windows.
rata marked this conversation as resolved.
Show resolved Hide resolved
// +k8s:conversion-gen=false
// +optional
HostUsers *bool
// The SELinux context to be applied to all containers.
// If unspecified, the container runtime will allocate a random SELinux context for each
// container. May also be set in SecurityContext. If set in
Expand Down
2 changes: 2 additions & 0 deletions pkg/apis/core/v1/conversion.go
Expand Up @@ -303,6 +303,7 @@ func Convert_core_PodSpec_To_v1_PodSpec(in *core.PodSpec, out *v1.PodSpec, s con
out.HostNetwork = in.SecurityContext.HostNetwork
out.HostIPC = in.SecurityContext.HostIPC
out.ShareProcessNamespace = in.SecurityContext.ShareProcessNamespace
out.HostUsers = in.SecurityContext.HostUsers
}

return nil
Expand Down Expand Up @@ -358,6 +359,7 @@ func Convert_v1_PodSpec_To_core_PodSpec(in *v1.PodSpec, out *core.PodSpec, s con
out.SecurityContext.HostPID = in.HostPID
out.SecurityContext.HostIPC = in.HostIPC
out.SecurityContext.ShareProcessNamespace = in.ShareProcessNamespace
out.SecurityContext.HostUsers = in.HostUsers

return nil
}
Expand Down