
Crashes during CreateVolume: Pods failing with "unexpected encryption status" #3402

Closed
irq0 opened this issue Sep 22, 2022 · 15 comments · Fixed by #3422
Labels
bug (Something isn't working) · component/rbd (Issues related to RBD)

Comments

@irq0
Contributor

irq0 commented Sep 22, 2022

Describe the bug

With block-encrypted RBD volumes, interrupting CreateVolume is very likely to leave behind broken volumes / RBD images.
Broken in the sense that the encryption metadata ends up in an invalid state and Ceph CSI is unable to attach the volume.
Pods will get stuck in ContainerCreating as a result.

Environment details

  • Image/version of Ceph CSI driver: seen on 3.5.1 and on current devel (+ RBD fscrypt patches); the logs below are from the latter
  • Kubernetes cluster version: probably many; last reproduced on Kubernetes 1.23.8 with minikube 1.26.1
  • Ceph cluster version: Pacific and current main branch

Steps to reproduce

  1. Create an RBD block-encrypted storage class
  2. Create the csi-rbdplugin-provisioner deployment with a memory resource limit on the csi-rbdplugin container that is likely to invoke the OOM killer (e.g. 128Mi)
  3. Create ~10 PVCs and pods
  4. csi-rbdplugin-provisioner is likely to OOM while in CreateVolume
  5. Most pods will end up in state ContainerCreating with FailedMount "rbd image … found mounted with unexpected encryption status"

Tools and configs I used are in https://github.com/irq0/dev-ceph-csi-fscrypt-config
In my minikube setup I can reproduce with close to 100% probability.

Actual results

Pods in state ContainerCreating with warning:

Warning  FailedMount             3m52s (x19 over 26m)  kubelet                  MountVolume.MountDevice failed for volume "pvc-d1d79e84-3d25-4fd5-9cd3-0f57331fa47d" : rpc error: code = Internal desc = rbd image replicapool/csi-vol-b945bb0d-ed93-40ce-bb4d-32df69cbef71 found mounted with unexpected encryption status

Expected behavior

Successful volume creation, even when the provisioner crashes.

Logs

The logs below follow image b60b8ccc-776e-4343-993f-1e2a3669595a through the sequence of events:

csi-rbdplugin-provisioner first CreateVolume request

[pod/csi-rbdplugin-provisioner-75dd4f948c-2l56f/csi-rbdplugin] 2022-09-22T10:30:23.435162195Z I0922 10:30:23.434774       1 utils.go:195] ID: 71 Req-ID: pvc-e691b358-31c1-43a9-8ffd-da499eded20e GRPC call: /csi.v1.Controller/CreateVolume

[pod/csi-rbdplugin-provisioner-75dd4f948c-2l56f/csi-rbdplugin] 2022-09-22T10:30:23.435183954Z I0922 10:30:23.434942       1 utils.go:206] ID: 71 Req-ID: pvc-e691b358-31c1-43a9-8ffd-da499eded20e GRPC request: {"capacity_range":{"required_bytes":104857600},"name":"pvc-e691b358-31c1-43a9-8ffd-da499eded20e","parameters":{"clusterID":"rook-ceph","csi.storage.k8s.io/pv/name":"pvc-e691b358-31c1-43a9-8ffd-da499eded20e","csi.storage.k8s.io/pvc/name":"pvc-bomb-4879-2","csi.storage.k8s.io/pvc/namespace":"irq0","encrypted":"true","encryptionKMSID":"user-ns-secrets-metadata","encryptionKMSType":"metadata","encryptionType":"block","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","secretName":"cephfs-storage-encryption-secret","secretNamespace":"irq0"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}

[pod/csi-rbdplugin-provisioner-75dd4f948c-2l56f/csi-rbdplugin] 2022-09-22T10:30:23.435194494Z I0922 10:30:23.435033       1 rbd_util.go:1308] ID: 71 Req-ID: pvc-e691b358-31c1-43a9-8ffd-da499eded20e setting disableInUseChecks: false image features: [layering] mounter: rbd

[pod/csi-rbdplugin-provisioner-75dd4f948c-2l56f/csi-rbdplugin] 2022-09-22T10:30:23.501732087Z I0922 10:30:23.500929       1 omap.go:88] ID: 71 Req-ID: pvc-e691b358-31c1-43a9-8ffd-da499eded20e got omap values: (pool="replicapool", namespace="", name="csi.volumes.default"): map[]

[pod/csi-rbdplugin-provisioner-75dd4f948c-2l56f/csi-rbdplugin] 2022-09-22T10:30:23.612007908Z I0922 10:30:23.611620       1 omap.go:158] ID: 71 Req-ID: pvc-e691b358-31c1-43a9-8ffd-da499eded20e set omap keys (pool="replicapool", namespace="", name="csi.volumes.default"): map[csi.volume.pvc-e691b358-31c1-43a9-8ffd-da499eded20e:b60b8ccc-776e-4343-993f-1e2a3669595a])

[pod/csi-rbdplugin-provisioner-75dd4f948c-2l56f/csi-rbdplugin] 2022-09-22T10:30:23.656886202Z I0922 10:30:23.656455       1 omap.go:158] ID: 71 Req-ID: pvc-e691b358-31c1-43a9-8ffd-da499eded20e set omap keys (pool="replicapool", namespace="", name="csi.volume.b60b8ccc-776e-4343-993f-1e2a3669595a"): map[csi.imagename:csi-vol-b60b8ccc-776e-4343-993f-1e2a3669595a csi.volname:pvc-e691b358-31c1-43a9-8ffd-da499eded20e csi.volume.encryptKMS:user-ns-secrets-metadata csi.volume.encryptionType:block csi.volume.owner:irq0])

[pod/csi-rbdplugin-provisioner-75dd4f948c-2l56f/csi-rbdplugin] 2022-09-22T10:30:23.656896461Z I0922 10:30:23.656491       1 rbd_journal.go:490] ID: 71 Req-ID: pvc-e691b358-31c1-43a9-8ffd-da499eded20e generated Volume ID (0001-0009-rook-ceph-0000000000000001-b60b8ccc-776e-4343-993f-1e2a3669595a) and image name (csi-vol-b60b8ccc-776e-4343-993f-1e2a3669595a) for request name (pvc-e691b358-31c1-43a9-8ffd-da499eded20e)

[pod/csi-rbdplugin-provisioner-75dd4f948c-2l56f/csi-rbdplugin] 2022-09-22T10:30:23.657051487Z I0922 10:30:23.656557       1 rbd_util.go:423] ID: 71 Req-ID: pvc-e691b358-31c1-43a9-8ffd-da499eded20e rbd: create replicapool/csi-vol-b60b8ccc-776e-4343-993f-1e2a3669595a size 100M (features: [layering]) using mon 10.105.203.132:6789,10.105.203.132:3300

[pod/csi-rbdplugin-provisioner-75dd4f948c-2l56f/csi-rbdplugin] 2022-09-22T10:30:23.657054247Z I0922 10:30:23.656627       1 rbd_util.go:1555] ID: 71 Req-ID: pvc-e691b358-31c1-43a9-8ffd-da499eded20e setting image options on replicapool/csi-vol-b60b8ccc-776e-4343-993f-1e2a3669595a

# No indication in my logs that ID 71 returns

csi-rbdplugin-provisioner crashes

[pod/csi-rbdplugin-provisioner-75dd4f948c-2l56f/csi-resizer] 2022-09-22T10:30:23.772794991Z E0922 10:30:23.771960       1 connection.go:132] Lost connection to unix:///csi/csi-provisioner.sock.
[pod/csi-rbdplugin-provisioner-75dd4f948c-2l56f/csi-snapshotter] 2022-09-22T10:30:23.773338465Z E0922 10:30:23.773083       1 connection.go:132] Lost connection to unix:///csi/csi-provisioner.sock.
[pod/csi-rbdplugin-provisioner-75dd4f948c-2l56f/csi-snapshotter] 2022-09-22T10:30:23.773363124Z F0922 10:30:23.773170       1 connection.go:87] Lost connection to CSI driver, exiting
[pod/csi-rbdplugin-provisioner-75dd4f948c-2l56f/csi-resizer] 2022-09-22T10:30:23.772829329Z F0922 10:30:23.772074       1 connection.go:87] Lost connection to CSI driver, exiting
[pod/csi-rbdplugin-provisioner-75dd4f948c-2l56f/csi-attacher] 2022-09-22T10:30:23.773577248Z E0922 10:30:23.773180       1 connection.go:132] Lost connection to unix:///csi/csi-provisioner.sock.
[pod/csi-rbdplugin-provisioner-75dd4f948c-2l56f/csi-attacher] 2022-09-22T10:30:23.773598498Z F0922 10:30:23.773255       1 connection.go:87] Lost connection to CSI driver, exiting

Crashes / OOM Events

[Thu Sep 22 10:30:23 2022] Memory cgroup out of memory: Killed process 44062 (cephcsi) total-vm:3891476kB, anon-rss:127084kB, file-rss:58728kB, shmem-rss:0kB, UID:0 pgtables:932kB oom_score_adj:994
[Thu Sep 22 10:30:46 2022] Memory cgroup out of memory: Killed process 61167 (cephcsi) total-vm:3874400kB, anon-rss:127612kB, file-rss:58640kB, shmem-rss:0kB, UID:0 pgtables:948kB oom_score_adj:994
[Thu Sep 22 10:31:27 2022] Memory cgroup out of memory: Killed process 62192 (cephcsi) total-vm:3957276kB, anon-rss:127096kB, file-rss:58740kB, shmem-rss:0kB, UID:0 pgtables:980kB oom_score_adj:994

csi-rbdplugin-provisioner restart and second CreateVolume

[pod/csi-rbdplugin-provisioner-75dd4f948c-2l56f/csi-rbdplugin] 2022-09-22T10:31:24.190129537Z I0922 10:31:24.185936       1 utils.go:195] ID: 25 Req-ID: pvc-e691b358-31c1-43a9-8ffd-da499eded20e GRPC call: /csi.v1.Controller/CreateVolume

[pod/csi-rbdplugin-provisioner-75dd4f948c-2l56f/csi-rbdplugin] 2022-09-22T10:31:24.190257743Z I0922 10:31:24.186948       1 utils.go:206] ID: 25 Req-ID: pvc-e691b358-31c1-43a9-8ffd-da499eded20e GRPC request: {"capacity_range":{"required_bytes":104857600},"name":"pvc-e691b358-31c1-43a9-8ffd-da499eded20e","parameters":{"clusterID":"rook-ceph","csi.storage.k8s.io/pv/name":"pvc-e691b358-31c1-43a9-8ffd-da499eded20e","csi.storage.k8s.io/pvc/name":"pvc-bomb-4879-2","csi.storage.k8s.io/pvc/namespace":"irq0","encrypted":"true","encryptionKMSID":"user-ns-secrets-metadata","encryptionKMSType":"metadata","encryptionType":"block","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","secretName":"cephfs-storage-encryption-secret","secretNamespace":"irq0"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}

[pod/csi-rbdplugin-provisioner-75dd4f948c-2l56f/csi-rbdplugin] 2022-09-22T10:31:24.190317421Z I0922 10:31:24.187358       1 rbd_util.go:1308] ID: 25 Req-ID: pvc-e691b358-31c1-43a9-8ffd-da499eded20e setting disableInUseChecks: false image features: [layering] mounter: rbd

[pod/csi-rbdplugin-provisioner-75dd4f948c-2l56f/csi-rbdplugin] 2022-09-22T10:31:24.356612037Z I0922 10:31:24.355969       1 omap.go:88] ID: 25 Req-ID: pvc-e691b358-31c1-43a9-8ffd-da499eded20e got omap values: (pool="replicapool", namespace="", name="csi.volumes.default"): map[csi.volume.pvc-e691b358-31c1-43a9-8ffd-da499eded20e:b60b8ccc-776e-4343-993f-1e2a3669595a]

[pod/csi-rbdplugin-provisioner-75dd4f948c-2l56f/csi-rbdplugin] 2022-09-22T10:31:24.364283491Z I0922 10:31:24.363399       1 omap.go:88] ID: 25 Req-ID: pvc-e691b358-31c1-43a9-8ffd-da499eded20e got omap values: (pool="replicapool", namespace="", name="csi.volume.b60b8ccc-776e-4343-993f-1e2a3669595a"): map[csi.imageid:151a3e041142 csi.imagename:csi-vol-b60b8ccc-776e-4343-993f-1e2a3669595a csi.volname:pvc-e691b358-31c1-43a9-8ffd-da499eded20e csi.volume.encryptKMS:user-ns-secrets-metadata csi.volume.encryptionType:block csi.volume.owner:irq0]

[pod/csi-rbdplugin-provisioner-75dd4f948c-2l56f/csi-rbdplugin] 2022-09-22T10:31:24.437017450Z I0922 10:31:24.436520       1 rbd_journal.go:345] ID: 25 Req-ID: pvc-e691b358-31c1-43a9-8ffd-da499eded20e found existing volume (0001-0009-rook-ceph-0000000000000001-b60b8ccc-776e-4343-993f-1e2a3669595a) with image name (csi-vol-b60b8ccc-776e-4343-993f-1e2a3669595a) for request (pvc-e691b358-31c1-43a9-8ffd-da499eded20e)

[pod/csi-rbdplugin-provisioner-75dd4f948c-2l56f/csi-rbdplugin] 2022-09-22T10:31:24.437901675Z I0922 10:31:24.437024       1 utils.go:212] ID: 25 Req-ID: pvc-e691b358-31c1-43a9-8ffd-da499eded20e GRPC response: {"volume":{"capacity_bytes":104857600,"volume_context":{"clusterID":"rook-ceph","encrypted":"true","encryptionKMSID":"user-ns-secrets-metadata","encryptionKMSType":"metadata","encryptionType":"block","imageFeatures":"layering","imageFormat":"2","imageName":"csi-vol-b60b8ccc-776e-4343-993f-1e2a3669595a","journalPool":"replicapool","pool":"replicapool","secretName":"cephfs-storage-encryption-secret","secretNamespace":"irq0"},"volume_id":"0001-0009-rook-ceph-0000000000000001-b60b8ccc-776e-4343-993f-1e2a3669595a"}}

rbdplugin NodeStageVolume unexpected encryption status

[pod/csi-rbdplugin-zxxqw/csi-rbdplugin] 2022-09-22T10:31:40.625211126Z I0922 10:31:40.624357   20745 utils.go:195] ID: 1324 Req-ID: 0001-0009-rook-ceph-0000000000000001-b60b8ccc-776e-4343-993f-1e2a3669595a GRPC call: /csi.v1.Node/NodeStageVolume
[pod/csi-rbdplugin-zxxqw/csi-rbdplugin] 2022-09-22T10:31:40.625335302Z I0922 10:31:40.624847   20745 utils.go:206] ID: 1324 Req-ID: 0001-0009-rook-ceph-0000000000000001-b60b8ccc-776e-4343-993f-1e2a3669595a GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-e691b358-31c1-43a9-8ffd-da499eded20e/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":7}},"volume_context":{"clusterID":"rook-ceph","encrypted":"true","encryptionKMSID":"user-ns-secrets-metadata","encryptionKMSType":"metadata","encryptionType":"block","imageFeatures":"layering","imageFormat":"2","imageName":"csi-vol-b60b8ccc-776e-4343-993f-1e2a3669595a","journalPool":"replicapool","pool":"replicapool","secretName":"cephfs-storage-encryption-secret","secretNamespace":"irq0","storage.kubernetes.io/csiProvisionerIdentity":"1663842664262-8081-rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-b60b8ccc-776e-4343-993f-1e2a3669595a"}
[pod/csi-rbdplugin-zxxqw/csi-rbdplugin] 2022-09-22T10:31:40.626030652Z I0922 10:31:40.625647   20745 rbd_util.go:1308] ID: 1324 Req-ID: 0001-0009-rook-ceph-0000000000000001-b60b8ccc-776e-4343-993f-1e2a3669595a setting disableInUseChecks: false image features: [layering] mounter: rbd
[pod/csi-rbdplugin-zxxqw/csi-rbdplugin] 2022-09-22T10:31:40.631305326Z I0922 10:31:40.630967   20745 omap.go:88] ID: 1324 Req-ID: 0001-0009-rook-ceph-0000000000000001-b60b8ccc-776e-4343-993f-1e2a3669595a got omap values: (pool="replicapool", namespace="", name="csi.volume.b60b8ccc-776e-4343-993f-1e2a3669595a"): map[csi.imageid:151a3e041142 csi.imagename:csi-vol-b60b8ccc-776e-4343-993f-1e2a3669595a csi.volname:pvc-e691b358-31c1-43a9-8ffd-da499eded20e csi.volume.encryptKMS:user-ns-secrets-metadata csi.volume.encryptionType:block csi.volume.owner:irq0]
[pod/csi-rbdplugin-zxxqw/csi-rbdplugin] 2022-09-22T10:31:40.715172282Z I0922 10:31:40.714957   20745 rbd_util.go:352] ID: 1324 Req-ID: 0001-0009-rook-ceph-0000000000000001-b60b8ccc-776e-4343-993f-1e2a3669595a checking for ImageFeatures: [layering]
[pod/csi-rbdplugin-zxxqw/csi-rbdplugin] 2022-09-22T10:31:40.793816462Z I0922 10:31:40.793618   20745 cephcmds.go:105] ID: 1324 Req-ID: 0001-0009-rook-ceph-0000000000000001-b60b8ccc-776e-4343-993f-1e2a3669595a command succeeded: rbd [device list --format=json --device-type krbd]
[pod/csi-rbdplugin-zxxqw/csi-rbdplugin] 2022-09-22T10:31:40.837838219Z I0922 10:31:40.837636   20745 rbd_attach.go:420] ID: 1324 Req-ID: 0001-0009-rook-ceph-0000000000000001-b60b8ccc-776e-4343-993f-1e2a3669595a rbd: map mon 10.105.203.132:6789,10.105.203.132:3300
[pod/csi-rbdplugin-zxxqw/csi-rbdplugin] 2022-09-22T10:31:40.918487370Z I0922 10:31:40.918356   20745 cephcmds.go:105] ID: 1324 Req-ID: 0001-0009-rook-ceph-0000000000000001-b60b8ccc-776e-4343-993f-1e2a3669595a command succeeded: rbd [--id admin -m 10.105.203.132:6789,10.105.203.132:3300 --keyfile=***stripped*** map replicapool/csi-vol-b60b8ccc-776e-4343-993f-1e2a3669595a --device-type krbd --options noudev]
[pod/csi-rbdplugin-zxxqw/csi-rbdplugin] 2022-09-22T10:31:40.918517219Z I0922 10:31:40.918412   20745 nodeserver.go:415] ID: 1324 Req-ID: 0001-0009-rook-ceph-0000000000000001-b60b8ccc-776e-4343-993f-1e2a3669595a rbd image: replicapool/csi-vol-b60b8ccc-776e-4343-993f-1e2a3669595a was successfully mapped at /dev/rbd0
[pod/csi-rbdplugin-zxxqw/csi-rbdplugin] 2022-09-22T10:31:41.072878006Z I0922 10:31:41.072773   20745 encryption.go:87] ID: 1324 Req-ID: 0001-0009-rook-ceph-0000000000000001-b60b8ccc-776e-4343-993f-1e2a3669595a image replicapool/csi-vol-b60b8ccc-776e-4343-993f-1e2a3669595a encrypted state metadata reports ""
[pod/csi-rbdplugin-zxxqw/csi-rbdplugin] 2022-09-22T10:31:41.130138100Z I0922 10:31:41.129862   20745 cephcmds.go:105] ID: 1324 Req-ID: 0001-0009-rook-ceph-0000000000000001-b60b8ccc-776e-4343-993f-1e2a3669595a command succeeded: rbd [unmap /dev/rbd0 --device-type krbd --options noudev]
[pod/csi-rbdplugin-zxxqw/csi-rbdplugin] 2022-09-22T10:31:41.130195878Z E0922 10:31:41.129953   20745 utils.go:210] ID: 1324 Req-ID: 0001-0009-rook-ceph-0000000000000001-b60b8ccc-776e-4343-993f-1e2a3669595a GRPC error: rpc error: code = Internal desc = rbd image replicapool/csi-vol-b60b8ccc-776e-4343-993f-1e2a3669595a found mounted with unexpected encryption status 
@Madhu-1
Collaborator

Madhu-1 commented Sep 26, 2022

@Rakshith-R PTAL

@Rakshith-R
Contributor

@irq0 can you confirm again that this same issue does not occur with just the canary image?
Both the OOM kill and the encryption issue.

And can you manually check the created image metadata?

@humblec
Collaborator

humblec commented Oct 6, 2022

@irq0 yeah, please give it a try with the canary or 3.7.1 image.

@irq0
Contributor Author

irq0 commented Oct 10, 2022

Ran the steps again on the current devel branch (71e5b3f), with the csi-rbdplugin container limited to 256MiB of memory. With 42 pods starting, csi-rbdplugin is crash-looping without making much progress. Same result as above: pods failing with "unexpected encryption status".

For a random pod:

Events:                                                                                                                             
  Type     Reason                  Age                    From                     Message                                          
  ----     ------                  ----                   ----                     -------                                          
  Warning  FailedScheduling        8m54s (x2 over 9m58s)  default-scheduler        0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.                                                                                                         
  Normal   Scheduled               8m51s                  default-scheduler        Successfully assigned irq0/bomb-28437-15 to minikube                                                                                                                                 
  Normal   SuccessfulAttachVolume  8m51s                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-b0a0360e-3d5e-4def-b298-572765b64a66"                                                                                                  
  Warning  FailedMount             3m54s (x7 over 8m42s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-b0a0360e-3d5e-4def-b298-572765b64a66" : rpc error: code = Internal desc = rbd image replicapool/csi-vol-5d357f46-9b75-4c53-b62c-ba0357273f86 found mounted with unexpected encryption status                                                                              
  Warning  FailedMount             2m14s (x3 over 6m48s)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[vol], unattached volumes=[vol kube-api-access-mwdl9]: timed out waiting for the condition                                      
  Warning  FailedMount             112s (x4 over 7m59s)   kubelet                  MountVolume.MountDevice failed for volume "pvc-b0a0360e-3d5e-4def-b298-572765b64a66" : rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/lib/kubelet/plugins/rbd.csi.ceph.com/csi.sock: connect: connection refused"  

Image metadata:

[root@rook-ceph-tools-9b86d8888-x4l5q /]# rbd --pool replicapool image-meta list csi-vol-5d357f46-9b75-4c53-b62c-ba0357273f86 
There is 1 metadatum on this image:

Key                         Value
rbd.csi.ceph.com/encrypted       

Rakshith-R added a commit to Rakshith-R/ceph-csi that referenced this issue Oct 11, 2022
This commit adds code to setup encryption on a rbdVol
being repaired in a followup CreateVolume request.
This fixes a bug wherein encryption metadata may not
have been set in the previous request due to a container restart.

Fixes: ceph#3402

Signed-off-by: Rakshith R <rar@redhat.com>
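To sketch the idea in the commit message (this is not the actual #3422 patch; the type and helper names below are hypothetical): when CreateVolume finds an existing journal entry for a request, it should re-run the encryption setup if the volume is supposed to be encrypted but the metadata was never written.

package main

import (
	"context"
	"fmt"
)

// rbdVolume is a stand-in for ceph-csi's internal volume type; all names
// here are made up for illustration.
type rbdVolume struct {
	encrypted bool
	encState  string // value of the rbd.csi.ceph.com/encrypted image metadata
}

func (v *rbdVolume) isBlockEncrypted() bool { return v.encrypted }

func (v *rbdVolume) getEncryptionState(_ context.Context) (string, error) {
	return v.encState, nil
}

func (v *rbdVolume) setupBlockEncryption(_ context.Context) error {
	// In the real driver this would generate and store the DEK and mark the
	// image metadata; here we only record that the state has been set.
	v.encState = "prepared"
	return nil
}

// repairExistingVolume re-runs the encryption setup when a previous
// CreateVolume created the image and journal entry but died before writing
// the encryption metadata, leaving the state empty.
func repairExistingVolume(ctx context.Context, v *rbdVolume) error {
	if !v.isBlockEncrypted() {
		return nil
	}
	state, err := v.getEncryptionState(ctx)
	if err != nil {
		return err
	}
	if state == "" {
		return v.setupBlockEncryption(ctx)
	}
	return nil
}

func main() {
	v := &rbdVolume{encrypted: true} // half-provisioned: metadata never written
	if err := repairExistingVolume(context.Background(), v); err != nil {
		fmt.Println("repair failed:", err)
		return
	}
	fmt.Println("encryption state:", v.encState)
}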
@Rakshith-R
Contributor

hey @irq0,
Can you give ghcr.io/rakshith-r/cephcsi:enc-fix a try?
This contains the fix from #3422; it should fix this issue.
I'll try to test on my end too.

@irq0
Contributor Author

irq0 commented Oct 12, 2022

With ghcr.io/rakshith-r/cephcsi:enc-fix, I get a whole different error. Any ideas?

  Type     Reason                  Age                From                     Message                                                                                                                                                                                        
  ----     ------                  ----               ----                     -------                                                                                                                                                                                        
  Warning  FailedScheduling        77s (x2 over 93s)  default-scheduler        0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.                                                                                                                   
  Normal   Scheduled               74s                default-scheduler        Successfully assigned irq0/bomb-6723-4 to minikube                                                                                                                                             
  Normal   SuccessfulAttachVolume  73s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-de699a0c-1c7a-468e-ab8a-c6a9d71d78d5"                                                                                                            
  Warning  FailedMount             64s                kubelet                  MountVolume.MountDevice failed for volume "pvc-de699a0c-1c7a-468e-ab8a-c6a9d71d78d5" : rpc error: code = Unavailable desc = error reading from server: EOF
  Warning  FailedMount             64s                kubelet                  MountVolume.MountDevice failed for volume "pvc-de699a0c-1c7a-468e-ab8a-c6a9d71d78d5" : rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/lib/kubelet/plugins/rbd.csi.ceph.com/csi.sock: connect: connection refused"                                                                                                                                                                                              
  Warning  FailedMount             62s                kubelet                  MountVolume.MountDevice failed for volume "pvc-de699a0c-1c7a-468e-ab8a-c6a9d71d78d5" : rpc error: code = Internal desc = an error (exit status 1) occurred while running cryptsetup args: [luksOpen /dev/rbd0 luks-rbd-0001-0009-rook-ceph-0000000000000001-682047ba-864b-4948-ad63-3d746c07adbf --disable-keyring -d /dev/stdin]                                                                                                                                            
  Warning  FailedMount             54s (x2 over 59s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-de699a0c-1c7a-468e-ab8a-c6a9d71d78d5" : rpc error: code = Internal desc = an error (exit status 1) occurred while running cryptsetup args: [luksOpen /dev/rbd2 luks-rbd-0001-0009-rook-ceph-0000000000000001-682047ba-864b-4948-ad63-3d746c07adbf --disable-keyring -d /dev/stdin]                                                                                                                                            
  Warning  FailedMount             29s (x2 over 45s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-de699a0c-1c7a-468e-ab8a-c6a9d71d78d5" : rpc error: code = Internal desc = an error (exit status 1) occurred while running cryptsetup args: [luksOpen /dev/rbd1 luks-rbd-0001-0009-rook-ceph-0000000000000001-682047ba-864b-4948-ad63-3d746c07adbf --disable-keyring -d /dev/stdin]                                                                                                                                        

Image metadata:

[root@rook-ceph-tools-9b86d8888-jb552 /]# rbd --pool replicapool info csi-vol-43943e9c-873a-4a45-b5e2-b39c3a8a407f
rbd image 'csi-vol-43943e9c-873a-4a45-b5e2-b39c3a8a407f':
        size 100 MiB in 25 objects
        order 22 (4 MiB objects)
        snapshot_count: 0
        id: 1215c34f5c41
        block_name_prefix: rbd_data.1215c34f5c41
        format: 2
        features: layering
        op_features: 
        flags: 
        create_timestamp: Wed Oct 12 14:17:47 2022
        access_timestamp: Wed Oct 12 14:17:47 2022
        modify_timestamp: Wed Oct 12 14:17:47 2022
[root@rook-ceph-tools-9b86d8888-jb552 /]# rbd --pool replicapool image-meta list csi-vol-43943e9c-873a-4a45-b5e2-b39c3a8a407f
There are 2 metadata on this image:

Key                         Value                                                                                            
rbd.csi.ceph.com/dek        {"dek":"Z+9qjIMF23cARFyIWS9GMytgUbTe49WGqiEAv6vhmVGPT+bMa5rGOjfIAKM=","nonce":"faSEpaDyci/9EGw/"}
rbd.csi.ceph.com/encrypted  encrypted  

Didn't check the image in depth, but it does seem to at least have the LUKS magic set:

[root@rook-ceph-tools-9b86d8888-jb552 /]# rbd --pool replicapool export csi-vol-43943e9c-873a-4a45-b5e2-b39c3a8a407f /tmp/foo
Exporting image: 100% complete...done.
[root@rook-ceph-tools-9b86d8888-jb552 /]# file /tmp/foo 
/tmp/foo: LUKS encrypted file, ver 2 [, , sha256] UUID: 42fc9125-c3a2-4c28-bc3d-edf72870c81f

@Rakshith-R
Contributor

@irq0
Can you add csi-rbdplugin provisioner and nodeplugin logs please?

The encrypted PVC seems to be provisioned properly, with everything in place.

Is the issue occurring on all PVCs or only a few?

@irq0
Contributor Author

irq0 commented Oct 12, 2022

With csi-rbdplugin having a 256M limit and crashing, 5 out of 5 concurrently created PVCs / pods fail.

Logs

Provisioner

CreateVolume request:

[pod/csi-rbdplugin-provisioner-fc9bf66d9-6k2mv/csi-rbdplugin] 2022-10-12T16:02:18.145578630Z I1012 16:02:18.145495       1 utils.go:195] ID: 134 Req-ID: pvc-460dabd3-6085-4aa2-8c8a-2881a56d4b06 GRPC call: /csi.v1.Controller/CreateVolume
[pod/csi-rbdplugin-provisioner-fc9bf66d9-6k2mv/csi-rbdplugin] 2022-10-12T16:02:18.145742683Z I1012 16:02:18.145690       1 utils.go:206] ID: 134 Req-ID: pvc-460dabd3-6085-4aa2-8c8a-2881a56d4b06 GRPC request: {"capacity_range":{"required_bytes":104857600},"name":"pvc-460dabd3-6085-4aa2-8c8a-2881a56d4b06","parameters":{"clusterID":"rook-ceph","csi.storage.k8s.io/pv/name":"pvc-460dabd3-6085-4aa2-8c8a-2881a56d4b06","csi.storage.k8s.io/pvc/name":"pvc-bomb-21465-2","csi.storage.k8s.io/pvc/namespace":"irq0","encrypted":"true","encryptionKMSID":"user-ns-secrets-metadata","encryptionKMSType":"metadata","encryptionType":"block","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","secretName":"cephfs-storage-encryption-secret","secretNamespace":"irq0"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
[pod/csi-rbdplugin-provisioner-fc9bf66d9-6k2mv/csi-rbdplugin] 2022-10-12T16:02:18.146037965Z I1012 16:02:18.145989       1 rbd_util.go:1279] ID: 134 Req-ID: pvc-460dabd3-6085-4aa2-8c8a-2881a56d4b06 setting disableInUseChecks: false image features: [layering] mounter: rbd
[pod/csi-rbdplugin-provisioner-fc9bf66d9-6k2mv/csi-rbdplugin] 2022-10-12T16:02:18.166120835Z I1012 16:02:18.165944       1 omap.go:88] ID: 134 Req-ID: pvc-460dabd3-6085-4aa2-8c8a-2881a56d4b06 got omap values: (pool="replicapool", namespace="", name="csi.volumes.default"): map[]
[pod/csi-rbdplugin-provisioner-fc9bf66d9-6k2mv/csi-rbdplugin] 2022-10-12T16:02:18.195330252Z I1012 16:02:18.194724       1 omap.go:158] ID: 134 Req-ID: pvc-460dabd3-6085-4aa2-8c8a-2881a56d4b06 set omap keys (pool="replicapool", namespace="", name="csi.volumes.default"): map[csi.volume.pvc-460dabd3-6085-4aa2-8c8a-2881a56d4b06:55493816-bec7-49cf-8b70-d7ae675d85a7])
[pod/csi-rbdplugin-provisioner-fc9bf66d9-6k2mv/csi-rbdplugin] 2022-10-12T16:02:18.212844220Z I1012 16:02:18.212630       1 omap.go:158] ID: 134 Req-ID: pvc-460dabd3-6085-4aa2-8c8a-2881a56d4b06 set omap keys (pool="replicapool", namespace="", name="csi.volume.55493816-bec7-49cf-8b70-d7ae675d85a7"): map[csi.imagename:csi-vol-55493816-bec7-49cf-8b70-d7ae675d85a7 csi.volname:pvc-460dabd3-6085-4aa2-8c8a-2881a56d4b06 csi.volume.encryptKMS:user-ns-secrets-metadata csi.volume.owner:irq0])
[pod/csi-rbdplugin-provisioner-fc9bf66d9-6k2mv/csi-rbdplugin] 2022-10-12T16:02:18.212851347Z I1012 16:02:18.212692       1 rbd_journal.go:487] ID: 134 Req-ID: pvc-460dabd3-6085-4aa2-8c8a-2881a56d4b06 generated Volume ID (0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7) and image name (csi-vol-55493816-bec7-49cf-8b70-d7ae675d85a7) for request name (pvc-460dabd3-6085-4aa2-8c8a-2881a56d4b06)
[pod/csi-rbdplugin-provisioner-fc9bf66d9-6k2mv/csi-rbdplugin] 2022-10-12T16:02:18.213269608Z I1012 16:02:18.212929       1 rbd_util.go:414] ID: 134 Req-ID: pvc-460dabd3-6085-4aa2-8c8a-2881a56d4b06 rbd: create replicapool/csi-vol-55493816-bec7-49cf-8b70-d7ae675d85a7 size 100M (features: [layering]) using mon 10.104.239.91:6789,10.104.239.91:3300
[pod/csi-rbdplugin-provisioner-fc9bf66d9-6k2mv/csi-rbdplugin] 2022-10-12T16:02:18.213283229Z I1012 16:02:18.212957       1 rbd_util.go:1526] ID: 134 Req-ID: pvc-460dabd3-6085-4aa2-8c8a-2881a56d4b06 setting image options on replicapool/csi-vol-55493816-bec7-49cf-8b70-d7ae675d85a7
[pod/csi-rbdplugin-provisioner-fc9bf66d9-6k2mv/csi-rbdplugin] 2022-10-12T16:02:18.422434195Z I1012 16:02:18.422366       1 controllerserver.go:739] ID: 134 Req-ID: pvc-460dabd3-6085-4aa2-8c8a-2881a56d4b06 created image replicapool/csi-vol-55493816-bec7-49cf-8b70-d7ae675d85a7 backed for request name pvc-460dabd3-6085-4aa2-8c8a-2881a56d4b06
[pod/csi-rbdplugin-provisioner-fc9bf66d9-6k2mv/csi-rbdplugin] 2022-10-12T16:02:18.467124437Z I1012 16:02:18.467027       1 omap.go:158] ID: 134 Req-ID: pvc-460dabd3-6085-4aa2-8c8a-2881a56d4b06 set omap keys (pool="replicapool", namespace="", name="csi.volume.55493816-bec7-49cf-8b70-d7ae675d85a7"): map[csi.imageid:121544cd543c])
[pod/csi-rbdplugin-provisioner-fc9bf66d9-6k2mv/csi-rbdplugin] 2022-10-12T16:02:18.467414586Z I1012 16:02:18.467155       1 utils.go:212] ID: 134 Req-ID: pvc-460dabd3-6085-4aa2-8c8a-2881a56d4b06 GRPC response: {"volume":{"capacity_bytes":104857600,"volume_context":{"clusterID":"rook-ceph","encrypted":"true","encryptionKMSID":"user-ns-secrets-metadata","encryptionKMSType":"metadata","encryptionType":"block","imageFeatures":"layering","imageFormat":"2","imageName":"csi-vol-55493816-bec7-49cf-8b70-d7ae675d85a7","journalPool":"replicapool","pool":"replicapool","secretName":"cephfs-storage-encryption-secret","secretNamespace":"irq0"},"volume_id":"0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7"}}

Nodeplugin

Failing NodeStageVolume

[pod/csi-rbdplugin-pl5sh/csi-rbdplugin] 2022-10-12T16:02:29.839364535Z I1012 16:02:29.839298   35194 utils.go:195] ID: 1220 Req-ID: 0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7 GRPC call: /csi.v1.Node/NodeStageVolume
[pod/csi-rbdplugin-pl5sh/csi-rbdplugin] 2022-10-12T16:02:29.839444777Z I1012 16:02:29.839399   35194 utils.go:206] ID: 1220 Req-ID: 0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-460dabd3-6085-4aa2-8c8a-2881a56d4b06/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":7}},"volume_context":{"clusterID":"rook-ceph","encrypted":"true","encryptionKMSID":"user-ns-secrets-metadata","encryptionKMSType":"metadata","encryptionType":"block","imageFeatures":"layering","imageFormat":"2","imageName":"csi-vol-55493816-bec7-49cf-8b70-d7ae675d85a7","journalPool":"replicapool","pool":"replicapool","secretName":"cephfs-storage-encryption-secret","secretNamespace":"irq0","storage.kubernetes.io/csiProvisionerIdentity":"1665584247059-8081-rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7"}
[pod/csi-rbdplugin-pl5sh/csi-rbdplugin] 2022-10-12T16:02:29.839608570Z I1012 16:02:29.839572   35194 rbd_util.go:1279] ID: 1220 Req-ID: 0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7 setting disableInUseChecks: false image features: [layering] mounter: rbd
[pod/csi-rbdplugin-pl5sh/csi-rbdplugin] 2022-10-12T16:02:29.840557658Z I1012 16:02:29.840506   35194 omap.go:88] ID: 1220 Req-ID: 0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7 got omap values: (pool="replicapool", namespace="", name="csi.volume.55493816-bec7-49cf-8b70-d7ae675d85a7"): map[csi.imageid:121544cd543c csi.imagename:csi-vol-55493816-bec7-49cf-8b70-d7ae675d85a7 csi.volname:pvc-460dabd3-6085-4aa2-8c8a-2881a56d4b06 csi.volume.encryptKMS:user-ns-secrets-metadata csi.volume.owner:irq0]
[pod/csi-rbdplugin-pl5sh/csi-rbdplugin] 2022-10-12T16:02:29.867799631Z I1012 16:02:29.867672   35194 rbd_util.go:346] ID: 1220 Req-ID: 0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7 checking for ImageFeatures: [layering]
[pod/csi-rbdplugin-pl5sh/csi-rbdplugin] 2022-10-12T16:02:29.887883604Z I1012 16:02:29.887737   35194 cephcmds.go:105] ID: 1220 Req-ID: 0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7 command succeeded: rbd [device list --format=json --device-type krbd]
[pod/csi-rbdplugin-pl5sh/csi-rbdplugin] 2022-10-12T16:02:29.907541409Z I1012 16:02:29.907452   35194 rbd_attach.go:420] ID: 1220 Req-ID: 0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7 rbd: map mon 10.104.239.91:6789,10.104.239.91:3300
[pod/csi-rbdplugin-pl5sh/csi-rbdplugin] 2022-10-12T16:02:29.974777102Z I1012 16:02:29.974597   35194 cephcmds.go:105] ID: 1220 Req-ID: 0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7 command succeeded: rbd [--id admin -m 10.104.239.91:6789,10.104.239.91:3300 --keyfile=***stripped*** map replicapool/csi-vol-55493816-bec7-49cf-8b70-d7ae675d85a7 --device-type krbd --options noudev]
[pod/csi-rbdplugin-pl5sh/csi-rbdplugin] 2022-10-12T16:02:29.974820460Z I1012 16:02:29.974637   35194 nodeserver.go:414] ID: 1220 Req-ID: 0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7 rbd image: replicapool/csi-vol-55493816-bec7-49cf-8b70-d7ae675d85a7 was successfully mapped at /dev/rbd0
[pod/csi-rbdplugin-pl5sh/csi-rbdplugin] 2022-10-12T16:02:29.996517553Z I1012 16:02:29.996002   35194 encryption.go:80] ID: 1220 Req-ID: 0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7 image replicapool/csi-vol-55493816-bec7-49cf-8b70-d7ae675d85a7 encrypted state metadata reports "encrypted"
[pod/csi-rbdplugin-pl5sh/csi-rbdplugin] 2022-10-12T16:02:30.113408047Z I1012 16:02:30.113229   35194 crypto.go:258] ID: 1220 Req-ID: 0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7 "/dev/mapper/luks-rbd-0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7" is not an active LUKS device (an error (exit status 4) occurred while running cryptsetup args: [status luks-rbd-0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7]): 
[pod/csi-rbdplugin-pl5sh/csi-rbdplugin] 2022-10-12T16:02:30.113471733Z I1012 16:02:30.113252   35194 crypto.go:210] ID: 1220 Req-ID: 0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7 Opening device "/dev/rbd0" with LUKS on "luks-rbd-0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7"
[pod/csi-rbdplugin-pl5sh/csi-rbdplugin] 2022-10-12T16:02:30.127753872Z E1012 16:02:30.127455   35194 crypto.go:213] ID: 1220 Req-ID: 0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7 failed to open device "/dev/rbd0" (an error (exit status 1) occurred while running cryptsetup args: [luksOpen /dev/rbd0 luks-rbd-0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7 --disable-keyring -d /dev/stdin]): Keyslot open failed.
[pod/csi-rbdplugin-pl5sh/csi-rbdplugin] 2022-10-12T16:02:30.127784411Z E1012 16:02:30.127487   35194 encryption.go:247] ID: 1220 Req-ID: 0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7 failed to open device replicapool/csi-vol-55493816-bec7-49cf-8b70-d7ae675d85a7: an error (exit status 1) occurred while running cryptsetup args: [luksOpen /dev/rbd0 luks-rbd-0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7 --disable-keyring -d /dev/stdin]
[pod/csi-rbdplugin-pl5sh/csi-rbdplugin] 2022-10-12T16:02:30.223302683Z I1012 16:02:30.223114   35194 cephcmds.go:105] ID: 1220 Req-ID: 0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7 command succeeded: rbd [unmap /dev/rbd0 --device-type krbd --options noudev]
[pod/csi-rbdplugin-pl5sh/csi-rbdplugin] 2022-10-12T16:02:30.223391575Z E1012 16:02:30.223275   35194 utils.go:210] ID: 1220 Req-ID: 0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7 GRPC error: rpc error: code = Internal desc = an error (exit status 1) occurred while running cryptsetup args: [luksOpen /dev/rbd0 luks-rbd-0001-0009-rook-ceph-0000000000000001-55493816-bec7-49cf-8b70-d7ae675d85a7 --disable-keyring -d /dev/stdin]

@Rakshith-R
Contributor

@irq0
Is the csi-rbdplugin nodeplugin also being OOM-killed?
Can you try mounting just one of those same PVCs at a time, or removing the memory limit on the nodeplugin?

@irq0
Contributor Author

irq0 commented Oct 13, 2022

The last run had a 1GiB memory limit on the csi-rbdplugin nodeserver. I did not see any OOMs of that container.

@Madhu-1
Collaborator

Madhu-1 commented Oct 14, 2022

A 256MiB limit for the provisioner should be more than enough for this workload; we need to do profiling and check whether there is a memory leak in cephcsi when provisioning volumes.
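For the profiling part, a generic way to look for a leak in a Go binary like cephcsi is the standard net/http/pprof handler (a minimal sketch, assuming profiling is not already exposed in your deployment; the address and port are arbitrary):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on the default mux
)

func main() {
	// Heap profiles can then be inspected with, for example:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	// while a burst of encrypted CreateVolume requests is being processed.
	log.Println(http.ListenAndServe("localhost:6060", nil))
}

Comparing heap profiles taken before and after provisioning a batch of encrypted PVCs should show whether memory is being retained between requests.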

Madhu-1 added the bug (Something isn't working) and component/rbd (Issues related to RBD) labels on Oct 14, 2022
@Madhu-1
Collaborator

Madhu-1 commented Oct 14, 2022

I tested secret-based encryption with 3.7.1; I don't see any crash with the below limits:

    Limits:
      cpu:     500m
      memory:  256Mi
    Requests:
      cpu:     250m
      memory:  256Mi
[🎩︎]mrajanna@fedora rbd $]kubectl get pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
claim0    Bound    pvc-81cb47af-dfbd-4360-b444-7636e8a2c359   1Gi        RWO            rook-ceph-block   4m48s
claim1    Bound    pvc-c1c20239-db18-4c56-b9da-8745b0046428   1Gi        RWO            rook-ceph-block   4m47s
claim10   Bound    pvc-7d6e66d8-ceea-4e0c-9775-64aa84b1548b   1Gi        RWO            rook-ceph-block   4m46s
claim2    Bound    pvc-3a61744c-2d0e-46c1-9d8c-3b0f5f49574c   1Gi        RWO            rook-ceph-block   4m47s
claim3    Bound    pvc-aa2613fb-4db8-4254-801b-8d9d72e83979   1Gi        RWO            rook-ceph-block   4m47s
claim4    Bound    pvc-00fbe104-1809-4a11-8c39-9e3ceee9d5c9   1Gi        RWO            rook-ceph-block   4m47s
claim5    Bound    pvc-7ad36255-755b-4bdd-a88c-4bbf695e8b69   1Gi        RWO            rook-ceph-block   4m47s
claim6    Bound    pvc-bcd48a37-3d8d-47ce-b780-511020690397   1Gi        RWO            rook-ceph-block   4m47s
claim7    Bound    pvc-1fac5dcf-1668-489a-8799-16630b74e971   1Gi        RWO            rook-ceph-block   4m47s
claim8    Bound    pvc-949006fd-5b47-4e9c-acc1-3808894245f8   1Gi        RWO            rook-ceph-block   4m46s
claim9    Bound    pvc-05ab1ba5-4c8f-41a5-ab09-c042b6089b23   1Gi        RWO            rook-ceph-block   4m46s

But when I tested with metadata-type encryption I can see the crash. Does this confirm we have a memory leak?

@trociny

trociny commented Oct 26, 2022

Note, the current issue is not about the memory leak. The memory leak is an issue but I believe it is out of scope of this ticket. It may be reported as a separate ticket if needed. The memory leak is the thing that triggers the crash. And I think it is a bug that if the crash happens in the middle of "create encrypted volume" operation, after restart the volume is reported as properly prepared while actually it is not.
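A rough sketch of the ordering trociny describes, under the assumption (consistent with the metadata dump above, where rbd.csi.ceph.com/encrypted exists with an empty value) that CreateVolume writes the journal entry and image before the encryption metadata; all names are hypothetical and this is not the actual ceph-csi code:

package main

import (
	"errors"
	"fmt"
)

// image is a stub for the RBD image plus its journal entry and the
// rbd.csi.ceph.com/encrypted metadata.
type image struct {
	journalEntry bool
	created      bool
	encMeta      string
}

var errCrashed = errors.New("provisioner OOM-killed")

// createVolume sketches the ordering: journal entry and image first, the
// encryption metadata last. A crash in between leaves encMeta empty, and the
// retried request takes the "found existing volume" fast path without ever
// writing it.
func createVolume(img *image, crashMidway bool) error {
	if img.journalEntry && img.created {
		return nil // retry: existing volume found, returned as-is
	}
	img.journalEntry = true
	img.created = true
	if crashMidway {
		return errCrashed
	}
	img.encMeta = "prepared"
	return nil
}

// nodeStageVolume mirrors the node plugin's check that produces the
// "found mounted with unexpected encryption status" error from the logs.
func nodeStageVolume(img *image) error {
	if img.encMeta != "prepared" {
		return errors.New("unexpected encryption status")
	}
	return nil
}

func main() {
	img := &image{}
	_ = createVolume(img, true)       // first attempt dies mid-way
	_ = createVolume(img, false)      // retry "succeeds" without repairing metadata
	fmt.Println(nodeStageVolume(img)) // unexpected encryption status
}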

@Madhu-1
Collaborator

Madhu-1 commented Oct 26, 2022

The memory leak is an issue but I believe it is out of scope of this ticket. It may be reported as a separate ticket if needed. The memory leak is the thing that triggers the crash. And I think it is a bug that if the crash happens in the middle of "create encrypted volume" operation, after restart the volume is reported as properly prepared while actually it is not.

Yes, agreed. They are different issues, but the memory leak triggered the other one; it is good to track it as a separate issue.

Rakshith-R added a commit to Rakshith-R/ceph-csi that referenced this issue Nov 4, 2022
This commit adds code to setup encryption on a rbdVol
being repaired in a followup CreateVolume request.
This fixes a bug wherein encryption metadata may not
have been set in the previous request due to a container restart.

Fixes: ceph#3402

Signed-off-by: Rakshith R <rar@redhat.com>
@Rakshith-R
Contributor

@irq0,
I tested the fix; the OOMKill still occurs, but after lifting the memory limit the PVC got bound and I could mount the concerned PVC successfully.
The steps can be seen here: #3472 (comment)

mergify bot closed this as completed in #3422 on Nov 7, 2022
mergify bot pushed a commit that referenced this issue Nov 7, 2022
This commit adds code to setup encryption on a rbdVol
being repaired in a followup CreateVolume request.
This fixes a bug wherein encryption metadata may not
have been set in the previous request due to a container restart.

Fixes: #3402

Signed-off-by: Rakshith R <rar@redhat.com>