Kubelet needs SELinux mounts to allow auto volume relabeling #1123
I completely forgot to submit a PR for this, which I had planned to do after figuring out the issue with rook/ceph dynamic PV mounts as detailed in my comment.
@sedlund Can you provide a clear, minimal repro? Not one involving a whole Rook, Ceph, or other system. AWS CSI volumes work fine, for example. @log1cb0mb read more about
@dghubble I've had to switch to a kubeadm-based install because of containerd/containerd#6767, so I can't carry the torch for this one, sorry.
@dghubble - I've gone ahead and built you a minimal reproducer.

`tempest.tf`:

```hcl
provider "aws" {
  region = "us-east-2"
  shared_credentials_files = [
    "/home/<your user>/.aws/credentials"
  ]
}

provider "ct" {}

terraform {
  required_providers {
    ct = {
      source  = "poseidon/ct"
      version = "0.10.0"
    }
    aws = {
      source  = "hashicorp/aws"
      version = "4.5.0"
    }
  }
}

module "tempest" {
  source = "git::https://github.com/poseidon/typhoon//aws/fedora-coreos/kubernetes?ref=v1.23.6"

  # AWS
  cluster_name = "tempest"
  dns_zone     = "<your zone>"
  dns_zone_id  = "<your zone id>"

  # configuration
  ssh_authorized_key = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMP2QHu8XD6z4OOftE9J6z9CIc3lhnE1yKI460mzmCB3 jharmison@gmail.com"

  # optional
  worker_count = 2
  worker_type  = "t3.small"
}

resource "local_file" "kubeconfig-tempest" {
  content  = module.tempest.kubeconfig-admin
  filename = "/home/<your user>/.kube/tempest.config"
}
```

`aws-iam-secret.yaml`:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: aws-secret
  namespace: kube-system
stringData:
  key_id: "<redacted>"
  access_key: "<redacted>"
```

`storageclass.yaml`:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  csi.storage.k8s.io/fstype: xfs
  type: io1
  iopsPerGB: "50"
  encrypted: "true"
```

Commands:

```shell
# deploy cluster
terraform apply -auto-approve
export KUBECONFIG=~/.kube/tempest.config
kubectl get nodes -w
# wait for nodes to come ready

# apply ebs CSI
kubectl apply -f aws-iam-secret.yaml
kubectl apply -k "github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.5"
kubectl rollout status -w deploy/ebs-csi-controller -n kube-system
kubectl apply -f storageclass.yaml

# run a simple test using the default CSI-backed storageclass
kubectl apply -f https://raw.githubusercontent.com/yasker/kbench/main/deploy/fio.yaml
kubectl logs -l kbench=fio -f
```

My output:

```
$ kubectl logs -l kbench=fio -f
TEST_FILE: /volume/test
TEST_OUTPUT_PREFIX: test_device
TEST_SIZE: 30G
Benchmarking iops.fio into test_device-iops.json
fio: pid=0, err=13/file:filesetup.c:162, func=open, error=Permission denied
fio: pid=0, err=13/file:filesetup.c:162, func=open, error=Permission denied
fio: pid=0, err=13/file:filesetup.c:162, func=open, error=Permission denied
fio: pid=0, err=13/file:filesetup.c:162, func=open, error=Permission denied
```
I'll update my PR with a test of the exact same set of steps, simply using my branch, once I finish running through it.

Edited to add: @sedlund I've been using a fork of Typhoon in which I've swapped containerd for CRI-O on FCOS. There were some road bumps to get there (especially around Cilium and CNI) but they weren't insurmountable. I figure it goes against the ideas of Typhoon proper to add that much complexity to its options, but if there's willingness to look it over and consider an alternative runtime implementation, I'd be willing to contribute it.
You're seeing that AWS CSI volumes show permission denied for file access? For simplicity, you should be able to remove kbench from the equation: just an alpine pod with the same mount, then touch a file to see the same symptom. To dig into why, can you inspect the container's volume from the host? Something doesn't add up. AWS CSI volumes are used on Typhoon regularly, and they get relabeled as expected.
I've written about why
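A sketch of that simpler repro might look like the following. The PVC/pod names are hypothetical, and the `ebs-sc` StorageClass name is an assumption carried over from the reproducer config earlier in the thread:

```yaml
# repro.yaml -- illustrative minimal repro: an alpine pod touching a file
# on a dynamically provisioned volume. All names here are hypothetical.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: repro-claim
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ebs-sc   # assumes the StorageClass defined above
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: repro-alpine
spec:
  containers:
    - name: alpine
      image: alpine:3.15
      # touch is expected to fail with "Permission denied" if the volume
      # mount is left unlabeled on an SELinux-enforcing host
      command: ["sh", "-c", "touch /volume/test; sleep 3600"]
      volumeMounts:
        - name: vol
          mountPath: /volume
  volumes:
    - name: vol
      persistentVolumeClaim:
        claimName: repro-claim
```

After `kubectl apply -f repro.yaml`, checking the volume path with `ls -lZ` from the host (and the pod's logs) should surface the same symptom without kbench involved.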
Yes, the configs and commands I added here are exactly what I ran.
Sure, I can reprovision and do this tomorrow. It will be that the mount is lacking the
Not in the outputs linked above. Again, I ran exactly what I pasted, including an official Typhoon release. I'm using
I have not been explicitly setting mountOptions on StorageClasses, and I did not expect to have to do so. The kubelet recognizes that SELinux relabelling is required on other systems and does so automatically (as Typhoon does in the branch I PR'd yesterday). This is particularly useful in the case of, for example,
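For contrast, the explicit-mountOptions approach being avoided here would look roughly like this on a StorageClass. This is a sketch, not something from this thread: the StorageClass name is hypothetical, and the context value shown is the stock container policy label, so treat the exact value and quoting as assumptions to verify against your policy:

```yaml
# Illustrative: statically pin the SELinux context at mount time instead
# of relying on kubelet/CRI relabeling. Name and context are assumptions.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc-explicit
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
mountOptions:
  - context="system_u:object_r:container_file_t:s0"
parameters:
  csi.storage.k8s.io/fstype: xfs
```

The trade-off, as discussed above, is that every volume from such a class shares one fixed label rather than receiving pod-specific labels from the runtime.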
I see no reason to revisit this, and I understand your points there. I'm using a oneshot unit with the experimental module layering support and am happy to keep doing so, delaying releases to align with CRI-O as you mentioned. I'm also happy to maintain my own (private) fork and self-support in doing so. Typhoon's systems have worked well for me, and I appreciate the work you've put into it. I'm going to continue to maintain my patch set for my own infrastructure.
@dghubble The relabel flag does solve or help with issues like #1142, but that particular issue is containerd's relabelling being broken; I already opened an issue about that: containerd/containerd#6767. As mentioned, relabelling the whole directory is not only harmful from a security-hardening perspective, since it removes container-specific context labels, but also invites issues like kubernetes/kubernetes#69799. The security-hardening details I mentioned are in this comment: rook/rook#7575 (comment)
From the original commit:
Isn't this part actually done by the container runtime, which relabels with pod-specific context labels? So I'm not sure why it would need to be relabelled by the kubelet. More importantly, configmap and similar volume mounts are generated at runtime and then labelled with the appropriate pod-specific context labels, so that volume is basically already accessible to the container anyway. The relabel from the kubelet is only applied once the kubelet service reloads. In short, the kubelet itself should not perform any relabelling; it should simply pass relabelling info to the container runtime, as it's supposed to, and the container runtime should take care of appropriate relabelling. Assuming kubelet has those SELinux bind mounts so that it can set
Some background on how I discovered this behaviour: it was with the NetApp ONTAP/Trident CSI and its volume mounts; the ONTAP version I was using did not support
This led to the kubelet service failing to start completely, as the relabel operation kept failing. The workaround was that I had to remove those mounts so that the kubelet wouldn't have to relabel any directory/mount without
Alright, closing in on a concrete rationale. Using an AWS CSI StorageClass without an explicit mount option, the mount will have the following context (and will not be accessible from within the container):
You expect the Kubelet to automatically relabel a volume. With the mount flags,
The mount will have the following context (labels randomized, of course) and be accessible from the container.
And on recreate, the volume is relabeled again. This is a better technical rationale than the various mentions of adding the flags to get apps or vendor products to work. @solacelost can you update your commit message with this info? Or I can formulate it so it's a good record. `/var/lib/kubelet` Kubelet mounts: FCOS nodes are still SELinux enforcing and
For the remainder of this issue, I'll focus on the OP's issue.
fixes #1123

Enables the use of CSI drivers with a StorageClass that lacks an explicit context mount option. In cases where the kubelet lacks mounts for `/etc/selinux` and `/sys/fs/selinux`, it is unable to set the `:Z` option for the CRI volume definition automatically. See [KEP 1710](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1710-selinux-relabeling/README.md#volume-mounting) for more information on how SELinux state is passed to the CRI by the Kubelet.

Prior to this change, a not-explicitly-labelled mount would have an `unlabeled_t` SELinux type on the host. Following this change, the Kubelet and CRI work together to dynamically relabel mounts that lack an explicit context specification, each time the volume is rebound to a pod, with the SELinux type `container_file_t` and context labels matching the pod it is bound to. This enables applications running in containers to consume dynamically provisioned storage on SELinux-enforcing systems without explicitly setting the context on the StorageClass or PersistentVolume.
Description
The Kubelet provides a mechanism for SELinux context relabeling, but Typhoon does not supply the bind mounts needed to allow it.
Also see: #935
Steps to Reproduce
Install Rook Ceph storage. Pods created with a PVC will attach storage, but upon accessing the volume you receive:
Expected behavior
SELinux relabeling to work properly.
Environment
Possible Solution
From: rook/rook#7575 (comment)
It explains the issue well. Adding two bind mounts to the kubelet allows it to do the relabeling.
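Concretely, the proposed change amounts to something like the following in the kubelet's container invocation. This is an abbreviated, illustrative excerpt only; the real unit has many more flags, and the exact form depends on the Typhoon version:

```shell
# kubelet.service ExecStart excerpt (illustrative, not the actual unit) --
# the two added SELinux bind mounts let the containerized kubelet see the
# host's SELinux policy and enforcement state:
podman run \
  --volume /etc/selinux:/etc/selinux:ro \
  --volume /sys/fs/selinux:/sys/fs/selinux \
  ... # remaining kubelet volumes and flags unchanged
```

With those mounts present, the kubelet can detect that SELinux is enforcing and pass the relabel request through to the CRI as described in KEP 1710.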