
RBD: OOMKills occur when the secrets-metadata encryption type is used with multiple PVC create requests #3472

Closed
Madhu-1 opened this issue Oct 26, 2022 · 23 comments
Labels: bug (Something isn't working), component/rbd (Issues related to RBD), Priority-0 (highest priority issue), wontfix (This will not be worked on)

Comments

Madhu-1 (Collaborator) commented Oct 26, 2022

I tested secret-based encryption with 3.7.1 and I don't see any crash with the limits below:

Limits:
      cpu:     500m
      memory:  256Mi
    Requests:
      cpu:     250m
      memory:  256Mi
[🎩︎]mrajanna@fedora rbd $]kubectl get pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
claim0    Bound    pvc-81cb47af-dfbd-4360-b444-7636e8a2c359   1Gi        RWO            rook-ceph-block   4m48s
claim1    Bound    pvc-c1c20239-db18-4c56-b9da-8745b0046428   1Gi        RWO            rook-ceph-block   4m47s
claim10   Bound    pvc-7d6e66d8-ceea-4e0c-9775-64aa84b1548b   1Gi        RWO            rook-ceph-block   4m46s
claim2    Bound    pvc-3a61744c-2d0e-46c1-9d8c-3b0f5f49574c   1Gi        RWO            rook-ceph-block   4m47s
claim3    Bound    pvc-aa2613fb-4db8-4254-801b-8d9d72e83979   1Gi        RWO            rook-ceph-block   4m47s
claim4    Bound    pvc-00fbe104-1809-4a11-8c39-9e3ceee9d5c9   1Gi        RWO            rook-ceph-block   4m47s
claim5    Bound    pvc-7ad36255-755b-4bdd-a88c-4bbf695e8b69   1Gi        RWO            rook-ceph-block   4m47s
claim6    Bound    pvc-bcd48a37-3d8d-47ce-b780-511020690397   1Gi        RWO            rook-ceph-block   4m47s
claim7    Bound    pvc-1fac5dcf-1668-489a-8799-16630b74e971   1Gi        RWO            rook-ceph-block   4m47s
claim8    Bound    pvc-949006fd-5b47-4e9c-acc1-3808894245f8   1Gi        RWO            rook-ceph-block   4m46s
claim9    Bound    pvc-05ab1ba5-4c8f-41a5-ab09-c042b6089b23   1Gi        RWO            rook-ceph-block   4m46s

But when I tested with the metadata encryption type I can see the crash. Does this confirm we have a memory leak?

Originally posted by @Madhu-1 in #3402 (comment)

Madhu-1 added the bug and component/rbd labels on Oct 26, 2022
Rakshith-R (Contributor) commented:

Based on testing with secrets-metadata and a 128Mi memory limit with the #3422 fix, it seems to crash continuously after this log line:

log.DebugLog(ctx, "found existing volume (%s) with image name (%s) for request (%s)",

The only important thing that occurs after this log line is rbdvol.Destroy(); I think this is causing the OOMKill.

Raising the memory limit leads to all PVCs reaching the Bound state, and they can be successfully mounted to a pod.

I1104 10:08:34.816009       1 cephcsi.go:192] Driver version: canary and Git version: e91b849ffcccafc8f374855e2537c38eaf2baeac
I1104 10:08:34.816188       1 cephcsi.go:210] Initial PID limit is set to 14167
I1104 10:08:34.816233       1 cephcsi.go:216] Reconfigured PID limit to -1 (max)
I1104 10:08:34.816405       1 cephcsi.go:241] Starting driver type: rbd with name: rook-ceph.rbd.csi.ceph.com
I1104 10:08:34.816555       1 driver.go:94] Enabling controller service capability: CREATE_DELETE_VOLUME
I1104 10:08:34.816562       1 driver.go:94] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
I1104 10:08:34.816565       1 driver.go:94] Enabling controller service capability: CLONE_VOLUME
I1104 10:08:34.816567       1 driver.go:94] Enabling controller service capability: EXPAND_VOLUME
I1104 10:08:34.816570       1 driver.go:107] Enabling volume access mode: SINGLE_NODE_WRITER
I1104 10:08:34.816577       1 driver.go:107] Enabling volume access mode: MULTI_NODE_MULTI_WRITER
I1104 10:08:34.816580       1 driver.go:107] Enabling volume access mode: SINGLE_NODE_SINGLE_WRITER
I1104 10:08:34.816582       1 driver.go:107] Enabling volume access mode: SINGLE_NODE_MULTI_WRITER
W1104 10:08:34.816589       1 driver.go:162] replication service running on controller server is deprecated and replaced by CSI-Addons, see https://github.com/ceph/ceph-csi/issues/3314 for more details
I1104 10:08:34.816675       1 server.go:126] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi-provisioner.sock", Net:"unix"}
I1104 10:08:34.816679       1 server.go:114] listening for CSI-Addons requests on address: &net.UnixAddr{Name:"/csi/csi-addons.sock", Net:"unix"}
I1104 10:08:35.450896       1 utils.go:195] ID: 1 GRPC call: /csi.v1.Identity/Probe
I1104 10:08:35.451802       1 utils.go:206] ID: 1 GRPC request: {}
I1104 10:08:35.451921       1 utils.go:212] ID: 1 GRPC response: {}
I1104 10:08:35.452453       1 utils.go:195] ID: 2 GRPC call: /csi.v1.Identity/GetPluginInfo
I1104 10:08:35.452476       1 utils.go:206] ID: 2 GRPC request: {}
I1104 10:08:35.452484       1 identityserver-default.go:39] ID: 2 Using default GetPluginInfo
I1104 10:08:35.452546       1 utils.go:212] ID: 2 GRPC response: {"name":"rook-ceph.rbd.csi.ceph.com","vendor_version":"canary"}
I1104 10:08:35.452814       1 utils.go:195] ID: 3 GRPC call: /csi.v1.Identity/GetPluginCapabilities
I1104 10:08:35.452844       1 utils.go:206] ID: 3 GRPC request: {}
I1104 10:08:35.453035       1 utils.go:212] ID: 3 GRPC response: {"capabilities":[{"Type":{"Service":{"type":1}}},{"Type":{"VolumeExpansion":{"type":1}}},{"Type":{"Service":{"type":2}}}]}
I1104 10:08:35.453540       1 utils.go:195] ID: 4 GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I1104 10:08:35.453571       1 utils.go:206] ID: 4 GRPC request: {}
I1104 10:08:35.453579       1 controllerserver-default.go:72] ID: 4 Using default ControllerGetCapabilities
I1104 10:08:35.453675       1 utils.go:212] ID: 4 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}}]}
I1104 10:10:57.438805       1 utils.go:195] ID: 5 Req-ID: pvc-3416c8ae-6eb7-484c-8824-272c5cea7bac GRPC call: /csi.v1.Controller/CreateVolume
I1104 10:10:57.439115       1 utils.go:195] ID: 6 Req-ID: pvc-21bab1d6-2e7e-4ec5-a6f1-ab5504487ee4 GRPC call: /csi.v1.Controller/CreateVolume
I1104 10:10:57.440471       1 utils.go:206] ID: 5 Req-ID: pvc-3416c8ae-6eb7-484c-8824-272c5cea7bac GRPC request: {"capacity_range":{"required_bytes":1073741824},"name":"pvc-3416c8ae-6eb7-484c-8824-272c5cea7bac","parameters":{"clusterID":"rook-ceph","csi.storage.k8s.io/pv/name":"pvc-3416c8ae-6eb7-484c-8824-272c5cea7bac","csi.storage.k8s.io/pvc/name":"rbd-pvc-2","csi.storage.k8s.io/pvc/namespace":"rook-ceph","encrypted":"true","encryptionKMSID":"user-secrets-metadata-test","imageFeatures":"layering","imageFormat":"2","pool":"replicapool"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
I1104 10:10:57.441397       1 rbd_util.go:1279] ID: 5 Req-ID: pvc-3416c8ae-6eb7-484c-8824-272c5cea7bac setting disableInUseChecks: false image features: [layering] mounter: rbd
I1104 10:10:57.445250       1 utils.go:206] ID: 6 Req-ID: pvc-21bab1d6-2e7e-4ec5-a6f1-ab5504487ee4 GRPC request: {"capacity_range":{"required_bytes":1073741824},"name":"pvc-21bab1d6-2e7e-4ec5-a6f1-ab5504487ee4","parameters":{"clusterID":"rook-ceph","csi.storage.k8s.io/pv/name":"pvc-21bab1d6-2e7e-4ec5-a6f1-ab5504487ee4","csi.storage.k8s.io/pvc/name":"rbd-pvc-1","csi.storage.k8s.io/pvc/namespace":"rook-ceph","encrypted":"true","encryptionKMSID":"user-secrets-metadata-test","imageFeatures":"layering","imageFormat":"2","pool":"replicapool"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
I1104 10:10:57.448371       1 utils.go:195] ID: 7 Req-ID: pvc-48c2c85c-a826-4755-9b9e-0e75df34c1f2 GRPC call: /csi.v1.Controller/CreateVolume
I1104 10:10:57.511231       1 rbd_util.go:1279] ID: 6 Req-ID: pvc-21bab1d6-2e7e-4ec5-a6f1-ab5504487ee4 setting disableInUseChecks: false image features: [layering] mounter: rbd
I1104 10:10:57.511792       1 utils.go:206] ID: 7 Req-ID: pvc-48c2c85c-a826-4755-9b9e-0e75df34c1f2 GRPC request: {"capacity_range":{"required_bytes":1073741824},"name":"pvc-48c2c85c-a826-4755-9b9e-0e75df34c1f2","parameters":{"clusterID":"rook-ceph","csi.storage.k8s.io/pv/name":"pvc-48c2c85c-a826-4755-9b9e-0e75df34c1f2","csi.storage.k8s.io/pvc/name":"rbd-pvc-5","csi.storage.k8s.io/pvc/namespace":"rook-ceph","encrypted":"true","encryptionKMSID":"user-secrets-metadata-test","imageFeatures":"layering","imageFormat":"2","pool":"replicapool"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
I1104 10:10:57.512114       1 rbd_util.go:1279] ID: 7 Req-ID: pvc-48c2c85c-a826-4755-9b9e-0e75df34c1f2 setting disableInUseChecks: false image features: [layering] mounter: rbd
I1104 10:10:57.450004       1 utils.go:195] ID: 8 Req-ID: pvc-e76394a0-409e-46b4-b1cd-b8eb676c54f0 GRPC call: /csi.v1.Controller/CreateVolume
I1104 10:10:57.513912       1 utils.go:206] ID: 8 Req-ID: pvc-e76394a0-409e-46b4-b1cd-b8eb676c54f0 GRPC request: {"capacity_range":{"required_bytes":1073741824},"name":"pvc-e76394a0-409e-46b4-b1cd-b8eb676c54f0","parameters":{"clusterID":"rook-ceph","csi.storage.k8s.io/pv/name":"pvc-e76394a0-409e-46b4-b1cd-b8eb676c54f0","csi.storage.k8s.io/pvc/name":"rbd-pvc-4","csi.storage.k8s.io/pvc/namespace":"rook-ceph","encrypted":"true","encryptionKMSID":"user-secrets-metadata-test","imageFeatures":"layering","imageFormat":"2","pool":"replicapool"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
I1104 10:10:57.450170       1 utils.go:195] ID: 9 Req-ID: pvc-179792d9-c723-4106-b3d1-b0532f1f96f1 GRPC call: /csi.v1.Controller/CreateVolume
I1104 10:10:57.514431       1 utils.go:206] ID: 9 Req-ID: pvc-179792d9-c723-4106-b3d1-b0532f1f96f1 GRPC request: {"capacity_range":{"required_bytes":1073741824},"name":"pvc-179792d9-c723-4106-b3d1-b0532f1f96f1","parameters":{"clusterID":"rook-ceph","csi.storage.k8s.io/pv/name":"pvc-179792d9-c723-4106-b3d1-b0532f1f96f1","csi.storage.k8s.io/pvc/name":"rbd-pvc-3","csi.storage.k8s.io/pvc/namespace":"rook-ceph","encrypted":"true","encryptionKMSID":"user-secrets-metadata-test","imageFeatures":"layering","imageFormat":"2","pool":"replicapool"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
I1104 10:10:57.514731       1 rbd_util.go:1279] ID: 9 Req-ID: pvc-179792d9-c723-4106-b3d1-b0532f1f96f1 setting disableInUseChecks: false image features: [layering] mounter: rbd
I1104 10:10:57.515382       1 rbd_util.go:1279] ID: 8 Req-ID: pvc-e76394a0-409e-46b4-b1cd-b8eb676c54f0 setting disableInUseChecks: false image features: [layering] mounter: rbd
I1104 10:10:58.512480       1 omap.go:88] ID: 7 Req-ID: pvc-48c2c85c-a826-4755-9b9e-0e75df34c1f2 got omap values: (pool="replicapool", namespace="", name="csi.volumes.default"): map[csi.volume.pvc-48c2c85c-a826-4755-9b9e-0e75df34c1f2:a6f79f2d-e921-49e0-ab3d-d3f8a99ec978]
I1104 10:10:58.515450       1 omap.go:88] ID: 9 Req-ID: pvc-179792d9-c723-4106-b3d1-b0532f1f96f1 got omap values: (pool="replicapool", namespace="", name="csi.volumes.default"): map[csi.volume.pvc-179792d9-c723-4106-b3d1-b0532f1f96f1:84db196d-ab12-4b53-a3e1-36e9e9dc4429]
I1104 10:10:58.515609       1 omap.go:88] ID: 7 Req-ID: pvc-48c2c85c-a826-4755-9b9e-0e75df34c1f2 got omap values: (pool="replicapool", namespace="", name="csi.volume.a6f79f2d-e921-49e0-ab3d-d3f8a99ec978"): map[csi.imagename:csi-vol-a6f79f2d-e921-49e0-ab3d-d3f8a99ec978 csi.volname:pvc-48c2c85c-a826-4755-9b9e-0e75df34c1f2 csi.volume.encryptKMS:user-secrets-metadata-test csi.volume.owner:rook-ceph]
I1104 10:10:58.515776       1 omap.go:88] ID: 5 Req-ID: pvc-3416c8ae-6eb7-484c-8824-272c5cea7bac got omap values: (pool="replicapool", namespace="", name="csi.volumes.default"): map[csi.volume.pvc-3416c8ae-6eb7-484c-8824-272c5cea7bac:28966dbc-5810-43b7-8381-a5f12a6a12fd]
I1104 10:10:58.517980       1 omap.go:88] ID: 9 Req-ID: pvc-179792d9-c723-4106-b3d1-b0532f1f96f1 got omap values: (pool="replicapool", namespace="", name="csi.volume.84db196d-ab12-4b53-a3e1-36e9e9dc4429"): map[csi.imagename:csi-vol-84db196d-ab12-4b53-a3e1-36e9e9dc4429 csi.volname:pvc-179792d9-c723-4106-b3d1-b0532f1f96f1 csi.volume.encryptKMS:user-secrets-metadata-test csi.volume.owner:rook-ceph]
I1104 10:10:58.519165       1 omap.go:88] ID: 5 Req-ID: pvc-3416c8ae-6eb7-484c-8824-272c5cea7bac got omap values: (pool="replicapool", namespace="", name="csi.volume.28966dbc-5810-43b7-8381-a5f12a6a12fd"): map[csi.imagename:csi-vol-28966dbc-5810-43b7-8381-a5f12a6a12fd csi.volname:pvc-3416c8ae-6eb7-484c-8824-272c5cea7bac csi.volume.encryptKMS:user-secrets-metadata-test csi.volume.owner:rook-ceph]
I1104 10:10:58.521710       1 omap.go:88] ID: 6 Req-ID: pvc-21bab1d6-2e7e-4ec5-a6f1-ab5504487ee4 got omap values: (pool="replicapool", namespace="", name="csi.volumes.default"): map[csi.volume.pvc-21bab1d6-2e7e-4ec5-a6f1-ab5504487ee4:790e02db-a332-4e10-8db6-25debf9dcd5c]
I1104 10:10:58.523374       1 omap.go:88] ID: 6 Req-ID: pvc-21bab1d6-2e7e-4ec5-a6f1-ab5504487ee4 got omap values: (pool="replicapool", namespace="", name="csi.volume.790e02db-a332-4e10-8db6-25debf9dcd5c"): map[csi.imagename:csi-vol-790e02db-a332-4e10-8db6-25debf9dcd5c csi.volname:pvc-21bab1d6-2e7e-4ec5-a6f1-ab5504487ee4 csi.volume.encryptKMS:user-secrets-metadata-test csi.volume.owner:rook-ceph]
I1104 10:10:58.611162       1 omap.go:88] ID: 8 Req-ID: pvc-e76394a0-409e-46b4-b1cd-b8eb676c54f0 got omap values: (pool="replicapool", namespace="", name="csi.volumes.default"): map[csi.volume.pvc-e76394a0-409e-46b4-b1cd-b8eb676c54f0:23f9efa6-adfb-49b0-be4c-b8bc8acb5658]
I1104 10:10:58.611729       1 omap.go:88] ID: 8 Req-ID: pvc-e76394a0-409e-46b4-b1cd-b8eb676c54f0 got omap values: (pool="replicapool", namespace="", name="csi.volume.23f9efa6-adfb-49b0-be4c-b8bc8acb5658"): map[csi.imagename:csi-vol-23f9efa6-adfb-49b0-be4c-b8bc8acb5658 csi.volname:pvc-e76394a0-409e-46b4-b1cd-b8eb676c54f0 csi.volume.encryptKMS:user-secrets-metadata-test csi.volume.owner:rook-ceph]
I1104 10:10:58.737545       1 omap.go:158] ID: 7 Req-ID: pvc-48c2c85c-a826-4755-9b9e-0e75df34c1f2 set omap keys (pool="replicapool", namespace="", name="csi.volume.a6f79f2d-e921-49e0-ab3d-d3f8a99ec978"): map[csi.imageid:11a6728409fa])
I1104 10:10:58.737545       1 omap.go:158] ID: 5 Req-ID: pvc-3416c8ae-6eb7-484c-8824-272c5cea7bac set omap keys (pool="replicapool", namespace="", name="csi.volume.28966dbc-5810-43b7-8381-a5f12a6a12fd"): map[csi.imageid:11a6a8672781])
I1104 10:10:58.737611       1 rbd_journal.go:337] ID: 7 Req-ID: pvc-48c2c85c-a826-4755-9b9e-0e75df34c1f2 found existing volume (0001-0009-rook-ceph-0000000000000001-a6f79f2d-e921-49e0-ab3d-d3f8a99ec978) with image name (csi-vol-a6f79f2d-e921-49e0-ab3d-d3f8a99ec978) for request (pvc-48c2c85c-a826-4755-9b9e-0e75df34c1f2)
I1104 10:10:58.737611       1 rbd_journal.go:337] ID: 5 Req-ID: pvc-3416c8ae-6eb7-484c-8824-272c5cea7bac found existing volume (0001-0009-rook-ceph-0000000000000001-28966dbc-5810-43b7-8381-a5f12a6a12fd) with image name (csi-vol-28966dbc-5810-43b7-8381-a5f12a6a12fd) for request (pvc-3416c8ae-6eb7-484c-8824-272c5cea7bac)
I1104 10:10:58.811046       1 omap.go:158] ID: 9 Req-ID: pvc-179792d9-c723-4106-b3d1-b0532f1f96f1 set omap keys (pool="replicapool", namespace="", name="csi.volume.84db196d-ab12-4b53-a3e1-36e9e9dc4429"): map[csi.imageid:11a6b1278881])
I1104 10:10:58.811481       1 rbd_journal.go:337] ID: 9 Req-ID: pvc-179792d9-c723-4106-b3d1-b0532f1f96f1 found existing volume (0001-0009-rook-ceph-0000000000000001-84db196d-ab12-4b53-a3e1-36e9e9dc4429) with image name (csi-vol-84db196d-ab12-4b53-a3e1-36e9e9dc4429) for request (pvc-179792d9-c723-4106-b3d1-b0532f1f96f1)
I1104 10:10:58.910697       1 omap.go:158] ID: 6 Req-ID: pvc-21bab1d6-2e7e-4ec5-a6f1-ab5504487ee4 set omap keys (pool="replicapool", namespace="", name="csi.volume.790e02db-a332-4e10-8db6-25debf9dcd5c"): map[csi.imageid:11a64b6bbbc6])
I1104 10:10:58.910804       1 rbd_journal.go:337] ID: 6 Req-ID: pvc-21bab1d6-2e7e-4ec5-a6f1-ab5504487ee4 found existing volume (0001-0009-rook-ceph-0000000000000001-790e02db-a332-4e10-8db6-25debf9dcd5c) with image name (csi-vol-790e02db-a332-4e10-8db6-25debf9dcd5c) for request (pvc-21bab1d6-2e7e-4ec5-a6f1-ab5504487ee4)
I1104 10:10:59.010592       1 omap.go:158] ID: 8 Req-ID: pvc-e76394a0-409e-46b4-b1cd-b8eb676c54f0 set omap keys (pool="replicapool", namespace="", name="csi.volume.23f9efa6-adfb-49b0-be4c-b8bc8acb5658"): map[csi.imageid:11a6e8058d08])
I1104 10:10:59.010793       1 rbd_journal.go:337] ID: 8 Req-ID: pvc-e76394a0-409e-46b4-b1cd-b8eb676c54f0 found existing volume (0001-0009-rook-ceph-0000000000000001-23f9efa6-adfb-49b0-be4c-b8bc8acb5658) with image name (csi-vol-23f9efa6-adfb-49b0-be4c-b8bc8acb5658) for request (pvc-e76394a0-409e-46b4-b1cd-b8eb676c54f0)

Rakshith-R (Contributor) commented:

> The only important thing that occurs after this log line is rbdvol.Destroy(); I think this is causing the OOMKill.

My fix will call setupEncryption() again in the repair-volume step. This is what causes the looping OOMKill.

Rakshith-R (Contributor) commented:

The OOMKill is due to this function:

key, err := scrypt.Key([]byte(passphrase), []byte(salt), 32768, 8, 1, 32)

https://pkg.go.dev/golang.org/x/crypto@v0.1.0/scrypt#Key

This issue is isolated to the secrets-metadata KMS alone.

Rakshith-R changed the title from "RBD: Test memory leak when metadata type encryption is used" to "RBD: OOMKills occur when the secrets-metadata encryption type is used with multiple PVC create requests" on Nov 9, 2022
Madhu-1 added the Priority-0 (highest priority issue) label on Nov 9, 2022
irq0 (Contributor) commented Dec 12, 2022

> scrypt.Key

Interesting!

There have been some reports in the Go crypto package issue tracker about scrypt memory:
golang/go#20000
golang/go#40687
golang/go#7168

Is this perhaps not a leak, but the GC not kicking in soon enough?

lentzi90 (Contributor) commented Jan 4, 2023

I have been looking into this a bit. Sharing my findings here hoping that they will be useful.

  1. Confirmed that scrypt.Key is responsible for allocating most of the memory.
  2. Why does it need so much? This line alone is responsible for roughly 32 MB if I'm not mistaken (see the sketch after this list):
    v := make([]uint32, 32*N*r)

    With the parameters we pass we get 32×32768×8 = 8,388,608 uint32 values, and at 4 bytes per uint32 that is about 32 MB per call. When serving multiple requests at the same time, this quickly adds up!
  3. I have checked the memory volumes also, but they don't play a significant part in this from what I can tell. I initially suspected that we may be writing something large there.
  4. Is GC slow? I'm not sure; with parallel requests I don't think it realistically gets a chance to keep up. But I saw something strange: the memory usage drops suddenly about 4-5 minutes after creating the PVCs. Any clue what could be causing that?
  5. Perhaps related to 4, there is an issue (kubelet counts active page cache against memory.available (maybe it shouldn't?) kubernetes/kubernetes#43916) about active page cache counted as "used memory". Not sure if it has anything to do with this though.
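
As a back-of-the-envelope check of point 2 above, here is a small illustrative Go snippet. It is not ceph-csi code; the N and r values are simply the ones passed to scrypt.Key in the call quoted earlier.

```go
package main

import "fmt"

func main() {
	const (
		N = 32768 // CPU/memory cost parameter passed to scrypt.Key
		r = 8     // block-size parameter passed to scrypt.Key
	)
	elems := 32 * N * r // length of scrypt's internal []uint32 scratch buffer
	bytes := elems * 4  // each uint32 is 4 bytes
	fmt.Printf("%d uint32 values = %d bytes (~%d MiB) per scrypt.Key call\n",
		elems, bytes, bytes>>20)
	// Prints: 8388608 uint32 values = 33554432 bytes (~32 MiB) per scrypt.Key call
}
```

At roughly 32 MiB per derivation, four or five concurrent CreateVolume requests for encrypted PVCs are already enough to approach a 128 MiB limit, which matches the behaviour described above.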

I'm not sure how to interpret these really, but here is the memory.stat of the container right before it was OOM killed (memory limit set to 128 MiB):

cache 1486848                                                                                                                                                                                              
rss 106082304                                                                                                                                                                                              
rss_huge 69206016                                                                                                                                                                                          
shmem 0                                                                                                                                                                                                    
mapped_file 811008                                                                                                                                                                                         
dirty 0                                                                                                                                                                                                    
writeback 0                                                                                                                                                                                                
swap 0                                                                                                                                                                                                     
pgpgin 24156                                                                                                                                                                                               
pgpgout 11364                                                                                                                                                                                              
pgfault 24915                                                                                                                                                                                              
pgmajfault 0                                                                                                                                                                                               
inactive_anon 106618880
active_anon 0     
inactive_file 868352
active_file 405504
unevictable 0            
hierarchical_memory_limit 134217728
hierarchical_memsw_limit 134217728
total_cache 1486848
total_rss 106082304
total_rss_huge 69206016
total_shmem 0      
total_mapped_file 811008
total_dirty 0               
total_writeback 0  
total_swap 0               
total_pgpgin 24156      
total_pgpgout 11364
total_pgfault 24915                                                                                                                                                                                        
total_pgmajfault 0
total_inactive_anon 106618880
total_active_anon 0
total_inactive_file 868352
total_active_file 405504
total_unevictable 0

And this is after it restarted:

cache 5136384
rss 23924736
rss_huge 0
shmem 0
mapped_file 2973696
dirty 0
writeback 0
swap 0
pgpgin 9471
pgpgout 2426
pgfault 10395
pgmajfault 33
inactive_anon 23916544
active_anon 0
inactive_file 4026368
active_file 942080
unevictable 0
hierarchical_memory_limit 134217728
hierarchical_memsw_limit 134217728
total_cache 5136384
total_rss 23924736
total_rss_huge 0
total_shmem 0
total_mapped_file 2973696
total_dirty 0
total_writeback 0
total_swap 0
total_pgpgin 9471
total_pgpgout 2426
total_pgfault 10395
total_pgmajfault 33
total_inactive_anon 23916544
total_active_anon 0
total_inactive_file 4026368
total_active_file 942080
total_unevictable 0

One thing that caught my eye is the rss_huge.
Anything else I could check here to help?

lentzi90 (Contributor) commented Jan 9, 2023

Some more experiments also hint that the issue is the parallel scrypt.Key calls. With a memory limit of 256Mi, I can create 100 PVCs without issue as long as I do it one at a time with a 1-second sleep in between. On the other hand, if I create 5-10 PVCs in one go, I immediately get an OOM.

Any ideas for how to solve this? I'm not familiar with the code, so I'm unsure where to start looking. 🙁

nixpanic (Member) commented Jan 9, 2023 via email

lentzi90 (Contributor) commented:

Is anyone planning to work on this?
I was looking at it briefly last week, but this code is new to me and it's not immediately obvious how I would solve the issue. Then I saw that it is marked as highest priority and thought I might as well wait until someone who knows the code takes a look 😄

Madhu-1 (Collaborator, author) commented Jan 18, 2023

@lentzi90 please feel free to take it

Rakshith-R (Contributor) commented:

> Is anyone planning to work on this? I was looking at it briefly last week, but this code is new to me and it's not immediately obvious how I would solve the issue. Then I saw that it is marked as highest priority and thought I might as well wait until someone who knows the code takes a look 😄

Currently no one is working on this; you can take it up.

As @nixpanic suggested, we can surround the call to scrypt.Key with a semaphore to avoid this issue.
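
To make the suggestion concrete, here is a minimal sketch assuming golang.org/x/sync/semaphore; the package name, the deriveKey helper, and the limit of 2 are illustrative assumptions rather than the actual ceph-csi code:

```go
package encryption

import (
	"context"

	"golang.org/x/crypto/scrypt"
	"golang.org/x/sync/semaphore"
)

// Allow only a couple of scrypt derivations at a time; each one needs
// roughly 32 MiB of scratch memory with N=32768, r=8.
const maxConcurrentKDF = 2

var kdfSem = semaphore.NewWeighted(maxConcurrentKDF)

// deriveKey wraps the scrypt.Key call shown above so that at most
// maxConcurrentKDF derivations run in parallel; other callers block
// (or give up when ctx is cancelled) until a slot frees up.
func deriveKey(ctx context.Context, passphrase, salt string) ([]byte, error) {
	if err := kdfSem.Acquire(ctx, 1); err != nil {
		return nil, err
	}
	defer kdfSem.Release(1)

	return scrypt.Key([]byte(passphrase), []byte(salt), 32768, 8, 1, 32)
}
```

With such a cap, a burst of CreateVolume requests still makes progress, but the scrypt working set stays bounded at about maxConcurrentKDF × 32 MiB instead of growing with the number of in-flight requests.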

lentzi90 (Contributor) commented:

I picked this up today, added a semaphore as suggested, and proceeded to test it. Everything was looking good: with a 256Mi limit I had no issues creating 100 PVCs. Then I went on to check that the issue still existed when using the upstream canary image and... had difficulty reproducing it 😮

It seems to me that it was actually solved already. I checked the commit log and found this one that could explain it: 36fe145
From the release notes:

> ringhash: impose cap on max_ring_size to reduce possibility of OOMs (grpc/grpc-go#5801)

Can someone else confirm?

nixpanic (Member) commented Jan 23, 2023 via email

lentzi90 (Contributor) commented:

> As creating the encryption keys is also very CPU intensive, having a semaphore that limits the amount of concurrent requests would still be a nice thing to have.

I would say this should be a separate issue and perhaps not best solved with a semaphore. What about using CPU limits? OOM was a big issue since the container would not recover from it and just go OOM again. But I didn't see any issues with CPU usage in my tests (all could easily be tested on a laptop, it stayed responsive throughout), so unless there is some concrete example of how and why a semaphore is needed for that, I would avoid it.

It would be nice to get confirmation from someone else on the OOM issue though before closing this.

github-actions (bot) commented:

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions bot added the wontfix label on Feb 22, 2023
paolo-depa commented:

> I would say this should be a separate issue and perhaps not best solved with a semaphore. What about using CPU limits?

I had a hard time figuring out a proper CPU limit and finally resolved the OOM issue by making my pod's activities heavily CPU-bound, which helped with the OOMs but slowed all the other operations.
A CPU throttling mechanism restricted to the most CPU-intensive activities would be really appreciated...

trociny commented Feb 23, 2023

> I picked this up today, added a semaphore as suggested, and proceeded to test it. Everything was looking good: with a 256Mi limit I had no issues creating 100 PVCs. Then I went on to check that the issue still existed when using the upstream canary image and... had difficulty reproducing it 😮
>
> It seems to me that it was actually solved already. I checked the commit log and found this one that could explain it: 36fe145
> From the release notes:
>
> ringhash: impose cap on max_ring_size to reduce possibility of OOMs (grpc/grpc-go#5801)

Although this change may improve it, I think the idea of limiting the number of concurrent requests is still valid. Creating 100 PVCs simultaneously will spawn 100 cryptsetup processes, and the pod memory utilization may reach the limit.

lentzi90 (Contributor) commented:

@paolo-depa what version are you using? The original issue was reported for v3.7.1. Now v3.8.0 is out, and based on my earlier tests I would expect it to be fixed there.

paolo-depa commented Feb 23, 2023

> @paolo-depa what version are you using? The original issue was reported for v3.7.1. Now v3.8.0 is out, and based on my earlier tests I would expect it to be fixed there.

I just did some tests creating 100 pods using v3.8.0:

  1. Limiting CPU to 150m and memory to 1Gi, I saw 2 OOMKill events (one of them making the container respawn), but still had 100% of the pods active within 40 minutes.
  2. Raising the CPU limit to 1 and leaving memory untouched, I saw 30 OOMKill events (one of them making the container respawn), ending up with 87 working / 13 non-working pods after 15 minutes.

Adding a snippet from the message log of one of those events:

2023-02-23T16:48:06.349522+01:00 germ39 kernel: [ 5733.862631] cryptsetup invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=-997
2023-02-23T16:48:06.349544+01:00 germ39 kernel: [ 5733.862636] CPU: 1 PID: 27451 Comm: cryptsetup Kdump: loaded Not tainted 5.3.18-150300.59.93-default #1 SLE15-SP3
2023-02-23T16:48:06.349545+01:00 germ39 kernel: [ 5733.862636] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
2023-02-23T16:48:06.349546+01:00 germ39 kernel: [ 5733.862637] Call Trace:
2023-02-23T16:48:06.349548+01:00 germ39 kernel: [ 5733.862646] dump_stack+0x66/0x8b
2023-02-23T16:48:06.349549+01:00 germ39 kernel: [ 5733.862650] dump_header+0x4a/0x210
2023-02-23T16:48:06.349549+01:00 germ39 kernel: [ 5733.862652] oom_kill_process+0xec/0x130
2023-02-23T16:48:06.349570+01:00 germ39 kernel: [ 5733.862654] out_of_memory+0x105/0x510
2023-02-23T16:48:06.349571+01:00 germ39 kernel: [ 5733.862657] mem_cgroup_out_of_memory+0xb9/0xd0
2023-02-23T16:48:06.349572+01:00 germ39 kernel: [ 5733.862659] try_charge+0x7c4/0x800
2023-02-23T16:48:06.349573+01:00 germ39 kernel: [ 5733.862662] ? __alloc_pages_nodemask+0x15d/0x320
2023-02-23T16:48:06.349573+01:00 germ39 kernel: [ 5733.862663] mem_cgroup_try_charge+0x70/0x190
2023-02-23T16:48:06.349574+01:00 germ39 kernel: [ 5733.862665] mem_cgroup_try_charge_delay+0x1c/0x40
2023-02-23T16:48:06.349575+01:00 germ39 kernel: [ 5733.862668] __handle_mm_fault+0x96d/0x1260
2023-02-23T16:48:06.349576+01:00 germ39 kernel: [ 5733.862670] handle_mm_fault+0xc4/0x210
2023-02-23T16:48:06.349577+01:00 germ39 kernel: [ 5733.862671] __get_user_pages+0x1f7/0x780
2023-02-23T16:48:06.349578+01:00 germ39 kernel: [ 5733.862673] populate_vma_page_range+0x6d/0x80
2023-02-23T16:48:06.349578+01:00 germ39 kernel: [ 5733.862674] __mm_populate+0xa9/0x150
2023-02-23T16:48:06.349579+01:00 germ39 kernel: [ 5733.862676] vm_mmap_pgoff+0x113/0x130
2023-02-23T16:48:06.349579+01:00 germ39 kernel: [ 5733.862678] ksys_mmap_pgoff+0xfb/0x220
2023-02-23T16:48:06.349580+01:00 germ39 kernel: [ 5733.862681] do_syscall_64+0x5b/0x1e0
2023-02-23T16:48:06.349581+01:00 germ39 kernel: [ 5733.862684] entry_SYSCALL_64_after_hwframe+0x61/0xc6
2023-02-23T16:48:06.349582+01:00 germ39 kernel: [ 5733.862687] RIP: 0033:0x7efd51456957
2023-02-23T16:48:06.349583+01:00 germ39 kernel: [ 5733.862689] Code: 54 41 89 d4 55 48 89 fd 53 4c 89 cb 48 85 ff 74 52 49 89 d9 45 89 f8 45 89 f2 44 89 e2 4c 89 ee 48 89 ef b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 79 5b 5d 41 5c 41 5d 41 5e 41 5f c3 66 0f 1f
2023-02-23T16:48:06.349583+01:00 germ39 kernel: [ 5733.862689] RSP: 002b:00007fffa47e7958 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
2023-02-23T16:48:06.349584+01:00 germ39 kernel: [ 5733.862691] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007efd51456957
2023-02-23T16:48:06.349586+01:00 germ39 kernel: [ 5733.862691] RDX: 0000000000000003 RSI: 00000000227c1000 RDI: 0000000000000000
2023-02-23T16:48:06.349587+01:00 germ39 kernel: [ 5733.862692] RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000000000
2023-02-23T16:48:06.349587+01:00 germ39 kernel: [ 5733.862692] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000003
2023-02-23T16:48:06.349588+01:00 germ39 kernel: [ 5733.862693] R13: 00000000227c1000 R14: 0000000000000022 R15: 00000000ffffffff
2023-02-23T16:48:06.349589+01:00 germ39 kernel: [ 5733.862695] memory: usage 1048576kB, limit 1048576kB, failcnt 206296
2023-02-23T16:48:06.349590+01:00 germ39 kernel: [ 5733.862695] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
2023-02-23T16:48:06.349591+01:00 germ39 kernel: [ 5733.862696] kmem: usage 20064kB, limit 9007199254740988kB, failcnt 0
2023-02-23T16:48:06.349592+01:00 germ39 kernel: [ 5733.862696] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podd17563cd_9da0_4d86_a89a_a4cb0f8070b3.slice/cri-containerd-f5ec1120df909b8125ebf2a5c2172e38fa978122c021c4e82afaa77cf5595f99.scope:
2023-02-23T16:48:06.349593+01:00 germ39 kernel: [ 5733.862708] anon 1052880896
2023-02-23T16:48:06.349594+01:00 germ39 kernel: [ 5733.862708] file 147456
2023-02-23T16:48:06.349594+01:00 germ39 kernel: [ 5733.862708] kernel_stack 1253376
2023-02-23T16:48:06.349595+01:00 germ39 kernel: [ 5733.862708] slab 11677696
2023-02-23T16:48:06.349595+01:00 germ39 kernel: [ 5733.862708] sock 0
2023-02-23T16:48:06.349596+01:00 germ39 kernel: [ 5733.862708] shmem 0
2023-02-23T16:48:06.349597+01:00 germ39 kernel: [ 5733.862708] file_mapped 0
2023-02-23T16:48:06.349597+01:00 germ39 kernel: [ 5733.862708] file_dirty 135168
2023-02-23T16:48:06.349598+01:00 germ39 kernel: [ 5733.862708] file_writeback 270336
2023-02-23T16:48:06.349599+01:00 germ39 kernel: [ 5733.862708] anon_thp 557842432
2023-02-23T16:48:06.349599+01:00 germ39 kernel: [ 5733.862708] inactive_anon 49152
2023-02-23T16:48:06.349600+01:00 germ39 kernel: [ 5733.862708] active_anon 208896
2023-02-23T16:48:06.349601+01:00 germ39 kernel: [ 5733.862708] inactive_file 245760
2023-02-23T16:48:06.349601+01:00 germ39 kernel: [ 5733.862708] active_file 319488
2023-02-23T16:48:06.349602+01:00 germ39 kernel: [ 5733.862708] unevictable 1052946432
2023-02-23T16:48:06.349603+01:00 germ39 kernel: [ 5733.862708] slab_reclaimable 6782976
2023-02-23T16:48:06.349603+01:00 germ39 kernel: [ 5733.862708] slab_unreclaimable 4894720
2023-02-23T16:48:06.349604+01:00 germ39 kernel: [ 5733.862708] pgfault 1986534
2023-02-23T16:48:06.349604+01:00 germ39 kernel: [ 5733.862708] pgmajfault 56991
2023-02-23T16:48:06.349605+01:00 germ39 kernel: [ 5733.862708] workingset_refault 1815
2023-02-23T16:48:06.349605+01:00 germ39 kernel: [ 5733.862708] workingset_activate 924
2023-02-23T16:48:06.349606+01:00 germ39 kernel: [ 5733.862708] workingset_nodereclaim 0
2023-02-23T16:48:06.349607+01:00 germ39 kernel: [ 5733.862708] pgrefill 1219243
2023-02-23T16:48:06.349607+01:00 germ39 kernel: [ 5733.862708] pgscan 2018920
2023-02-23T16:48:06.349608+01:00 germ39 kernel: [ 5733.862708] pgsteal 835476
2023-02-23T16:48:06.349609+01:00 germ39 kernel: [ 5733.862708] pgactivate 132264
2023-02-23T16:48:06.349610+01:00 germ39 kernel: [ 5733.862708] pgdeactivate 1219243
2023-02-23T16:48:06.349611+01:00 germ39 kernel: [ 5733.862708] pglazyfree 0
2023-02-23T16:48:06.349611+01:00 germ39 kernel: [ 5733.862708] Tasks state (memory values in pages):
2023-02-23T16:48:06.349612+01:00 germ39 kernel: [ 5733.862709] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
2023-02-23T16:48:06.349613+01:00 germ39 kernel: [ 5733.862712] [ 3868] 0 3868 1228537 15026 3878912 400476 -997 cephcsi
2023-02-23T16:48:06.349613+01:00 germ39 kernel: [ 5733.862714] [ 23073] 0 23073 52104 24466 339968 0 -997 cryptsetup
2023-02-23T16:48:06.349614+01:00 germ39 kernel: [ 5733.862715] [ 23854] 0 23854 53748 26083 352256 0 -997 cryptsetup
2023-02-23T16:48:06.349614+01:00 germ39 kernel: [ 5733.862716] [ 25520] 0 25520 53768 26129 344064 0 -997 cryptsetup
2023-02-23T16:48:06.349615+01:00 germ39 kernel: [ 5733.862717] [ 25521] 0 25521 45548 17914 286720 0 -997 cryptsetup
2023-02-23T16:48:06.349616+01:00 germ39 kernel: [ 5733.862719] [ 26646] 0 26646 42008 14337 253952 0 -997 cryptsetup
2023-02-23T16:48:06.349617+01:00 germ39 kernel: [ 5733.862720] [ 27034] 0 27034 39031 11379 233472 0 -997 cryptsetup
2023-02-23T16:48:06.349617+01:00 germ39 kernel: [ 5733.862721] [ 27451] 0 27451 180188 138685 1253376 0 -997 cryptsetup
2023-02-23T16:48:06.349618+01:00 germ39 kernel: [ 5733.862722] [ 28073] 0 28073 38785 11166 229376 0 -997 cryptsetup
2023-02-23T16:48:06.349619+01:00 germ39 kernel: [ 5733.862724] [ 30712] 0 30712 38785 11175 237568 0 -997 cryptsetup
2023-02-23T16:48:06.349619+01:00 germ39 kernel: [ 5733.862725] [ 31927] 0 31927 25860 2780 212992 501 -997 rbd
2023-02-23T16:48:06.349620+01:00 germ39 kernel: [ 5733.862727] [ 32409] 0 32409 2230 262 65536 37 -997 blkid
2023-02-23T16:48:06.349620+01:00 germ39 kernel: [ 5733.862728] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=cri-containerd-f5ec1120df909b8125ebf2a5c2172e38fa978122c021c4e82afaa77cf5595f99.scope,mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podd17563cd_9da0_4d86_a89a_a4cb0f8070b3.slice/cri-containerd-f5ec1120df909b8125ebf2a5c2172e38fa978122c021c4e82afaa77cf5595f99.scope,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podd17563cd_9da0_4d86_a89a_a4cb0f8070b3.slice/cri-containerd-f5ec1120df909b8125ebf2a5c2172e38fa978122c021c4e82afaa77cf5595f99.scope,task=blkid,pid=32409,uid=0
2023-02-23T16:48:06.349621+01:00 germ39 kernel: [ 5733.862736] Memory cgroup out of memory: Killed process 32409 (blkid) total-vm:8920kB, anon-rss:0kB, file-rss:1048kB, shmem-rss:0kB

github-actions bot removed the wontfix label on Feb 23, 2023
Rakshith-R (Contributor) commented:

This OOMKill happens in the csi-rbdplugin nodeplugin pod, right? The nodeplugin is what calls cryptsetup.

This issue was originally opened to address OOMKills in the provisioner pod during volume creation. That seems to be resolved now.

I'd expect the CO (in this case the kubelet, or whatever component is responsible for issuing NodePublish/NodeStage calls) to have some kind of limit on simultaneous CSI calls, similar to the one for csi-provisioner: https://github.com/kubernetes-csi/external-provisioner/blob/cd81ed5d31835d1aabad74db869c1165df9e3666/cmd/csi-provisioner/csi-provisioner.go#L81

lentzi90 (Contributor) commented:

I pushed the patch I was working on and created a draft PR: #3693
Feel free to take it over, use it for testing, or do whatever you want with it. As @Rakshith-R mentioned, I think the original issue is already resolved, so it probably doesn't make sense to merge this PR, but maybe it can serve as inspiration for other issues.

shaas added a commit to shaas/ceph-csi that referenced this issue Mar 8, 2023
…h multiple PVC create request. ceph#3472

Signed-off-by: Stefan Haas <shaas@suse.com>
paolo-depa commented:

Running some tests with PR #3693 applied on top of v3.8.0, the results show high resilience even to the OOMKills occurring in the rbdplugin: having removed all memory and CPU limitations on the pods, I could create 100 PVCs flawlessly.

+1 to have it merged!

github-actions (bot) commented Apr 9, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions bot added the wontfix label on Apr 9, 2023
github-actions (bot) commented:

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Apr 16, 2023
mgfritch pushed a commit to mgfritch/ceph-csi that referenced this issue Feb 26, 2024
generateCipher is memory heavy, so to avoid OOM situations, a semaphore
is added to limit concurrency here.

Fixes: ceph#3472
Signed-off-by: Michael Fritch <mfritch@suse.com>
mgfritch added a commit to mgfritch/ceph-csi that referenced this issue Feb 26, 2024
generateCipher is memory heavy, so to avoid OOM situations, a semaphore
is added to limit concurrency here.

Fixes: ceph#3472
Signed-off-by: Michael Fritch <mfritch@suse.com>