
Cluster-mapping during failover not applied #4493

Closed
kmadac opened this issue Mar 13, 2024 · 4 comments · May be fixed by #4501

Comments

@kmadac

kmadac commented Mar 13, 2024

Describe the bug

I have an issue with cluster-mapping when using mirrored RBD volumes with ceph-csi in a disaster scenario.

In the test environment I'm trying to use mirrored RBD volumes on k8s with ceph-csi. I have a cluster-mapping.json in place where the primary pool and cluster ID are mapped to the secondary ceph cluster, and a config.json with the list of mons for both cephs. The issue is that during failover to the secondary site, when I manually create the PV/PVC the same way as on the primary side, the cluster-mapping is not applied during NodeStageVolume (at least from what I can see in the code), and ceph-csi still tries to access the unreachable primary cluster. This fails and leaves the application pods stuck indefinitely in ContainerCreating. When I manually create the PV on the secondary site with the correct volumeHandle, it works. Why is cluster-mapping.json needed then, if the volumeHandle still has to be changed manually in case of failover? Shouldn't it also be applied during the NodeStageVolume call?
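
For illustration, this is roughly the PV I create by hand on the secondary site; the name, size and secret references are just examples from my setup, the important part is the volumeHandle rewritten with the secondary cluster ID and pool ID:

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-9978d8bc-9053-4cde-bf11-baba5f2df774    # same name as the original PV
spec:
  capacity:
    storage: 8Gi                                     # example size, adjust to the original claim
  accessModes:
    - ReadWriteOnce
  storageClassName: csi-rbd-sc
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: rbd.csi.ceph.com
    # volumeHandle rewritten by hand: secondary cluster ID, and pool ID 5 instead of 3
    volumeHandle: 0001-0024-9fa7df9e-dd71-11ee-93b5-52540070c99e-0000000000000005-f1d89947-0fff-447b-a190-6fe68539253a
    volumeAttributes:
      clusterID: "9fa7df9e-dd71-11ee-93b5-52540070c99e"
      pool: "kubernetes"
    nodeStageSecretRef:
      name: csi-rbd-secret        # example secret name/namespace
      namespace: ceph-csi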

Environment details

  • Image/version of Ceph CSI driver : canary
  • Helm chart version : manifest deployment
  • Kernel version :
  • Mounter used for mounting PVC (for cephFS its fuse or kernel. for rbd its
    krbd or rbd-nbd) : rbd-nbd
  • Kubernetes cluster version : v1.28.7+k3s1
  • Ceph cluster version : 17.2.6

Steps to reproduce

  1. Setup details: deploy two Ceph clusters, primary and secondary
  2. Deploy a k8s cluster which has connectivity to both cephs
  3. Create an rbd 'kubernetes' pool on both cephs
  4. Set up rbd-mirror between both clusters for the 'kubernetes' pool
  5. Deploy ceph-csi on k8s cluster and integrate with primary ceph.
  6. Deploy app 'helm install dokuwiki oci://registry-1.docker.io/bitnamicharts/dokuwiki -n dokuwiki --set global.storageClass=csi-rbd-sc,service.type=NodePort'
  7. Wait till rbd image is synced
  8. Stop k8s cluster
  9. Demote kubernetes pool on primary cluster, promote kubernetes pool on secondary ceph cluster
  10. Start k8s cluster
  11. Change the ceph ID in the StorageClass (see the sketch after this list)
  12. Delete ceph-csi pods to initiate restart of csi
  13. Delete application pod
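
For step 11, the StorageClass change is just repointing it at the secondary cluster; a minimal sketch, with the secret names and namespace assumed from my setup:

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-rbd-sc
provisioner: rbd.csi.ceph.com
parameters:
  # switched from the primary cluster ID to the secondary one
  clusterID: 9fa7df9e-dd71-11ee-93b5-52540070c99e
  pool: kubernetes
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi
reclaimPolicy: Delete
allowVolumeExpansion: true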

csi-config-map

---
apiVersion: v1
kind: ConfigMap
data:
  config.json: |-
    [
      {
        "clusterID": "2459c716-dd81-11ee-a184-525400150bec",
        "monitors": [
          "192.168.121.11:6789",
          "192.168.121.122:6789",
          "192.168.121.97:6789"
        ]
      },
      {
        "clusterID": "9fa7df9e-dd71-11ee-93b5-52540070c99e",
        "monitors": [
          "192.168.121.98:6789",
          "192.168.121.8:6789",
          "192.168.121.136:6789"
        ]
      }
    ]
  cluster-mapping.json: |-
    [
      {
        "clusterIDMapping": {
          "2459c716-dd81-11ee-a184-525400150bec": "9fa7df9e-dd71-11ee-93b5-52540070c99e"
        },
        "RBDPoolIDMapping": [{
          "3": "5"
        }]
      }
    ]
metadata:
  name: ceph-csi-config

Actual results

The PV and PVC are in Bound state, but the application pod is stuck in ContainerCreating and the csi pods show the following errors:

Warning  FailedMount  4m29s (x39 over 3h55m)  kubelet  MountVolume.MountDevice failed for volume "pvc-9978d8bc-9053-4cde-bf11-baba5f2df774" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-2459c716-dd81-11ee-a184-525400150bec-0000000000000003-f1d89947-0fff-447b-a190-6fe68539253a already exists

There is also an error log in the csi-rbdplugin pod showing that it tries to connect to the primary ceph cluster, which is down:

error generating volume 0001-0024-2459c716-dd81-11ee-a184-525400150bec-0000000000000003-f1d89947-0fff-447b-a190-6fe68539253a: failed to establish the connection: failed to get connection: connecting failed: rados: ret=-110, Connection timed out
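
If I read the ceph-csi volume handle format correctly, the handle above still encodes the primary cluster, which is why the node plugin keeps trying to reach it:

0001                                   encoding version
0024                                   length of the cluster ID (36)
2459c716-dd81-11ee-a184-525400150bec   cluster ID of the primary cluster
0000000000000003                       pool ID 3 on the primary cluster (hex)
f1d89947-0fff-447b-a190-6fe68539253a   RBD image UUID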

Expected behavior

The remount is done successfully from the secondary cluster and the application starts.

@Madhu-1
Collaborator

Madhu-1 commented Mar 14, 2024

error generating volume 0001-0024-2459c716-dd81-11ee-a184-525400150bec-0000000000000003-f1d89947-0fff-447b-a190-6fe68539253a: failed to establish the connection: failed to get connection: connecting failed: rados: ret=-110, Connection timed out

@kmadac does 2459c716-dd81-11ee-a184-525400150bec in the configmap point to the monitor details of the cluster you are failing over to? If not, you need to update that as well.

@Madhu-1
Collaborator

Madhu-1 commented Mar 14, 2024

If you want the mapping to handle it, you can remove 2459c716-dd81-11ee-a184-525400150bec from config.json and see if that works as well.
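
That is, something like this, if I understand the suggestion right (only the secondary cluster entry left in config.json, monitors copied from the existing secondary entry):

  config.json: |-
    [
      {
        "clusterID": "9fa7df9e-dd71-11ee-93b5-52540070c99e",
        "monitors": [
          "192.168.121.98:6789",
          "192.168.121.8:6789",
          "192.168.121.136:6789"
        ]
      }
    ]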

@kmadac
Author

kmadac commented Mar 14, 2024

I can confirm that putting the secondary mon IP addresses under the primary ceph ID worked.

Here is the final csi config map:

---
apiVersion: v1
kind: ConfigMap
data:
  config.json: |-
    [
      {
        "clusterID": "2459c716-dd81-11ee-a184-525400150bec",
        "monitors": [
          "192.168.121.98:6789",
          "192.168.121.8:6789",
          "192.168.121.136:6789"
        ]
      },
      {
        "clusterID": "9fa7df9e-dd71-11ee-93b5-52540070c99e",
        "monitors": [
          "192.168.121.98:6789",
          "192.168.121.8:6789",
          "192.168.121.136:6789"
        ]
      }
    ]
  cluster-mapping.json: |-
    [
      {
        "clusterIDMapping": {
          "2459c716-dd81-11ee-a184-525400150bec": "9fa7df9e-dd71-11ee-93b5-52540070c99e"
        },
        "RBDPoolIDMapping": [{
          "3": "5"
        }]
      }
    ]
metadata:
  name: ceph-csi-config

Thank you very much, I'm closing the issue.
Just one question: is this documented somewhere? I read the documentation, but maybe I missed it.

@kmadac kmadac closed this as completed Mar 14, 2024
@Madhu-1
Collaborator

Madhu-1 commented Mar 15, 2024

We might have missed adding it to the documentation; please feel free to open a PR to add the missing details. Thank you :)
