
Daemonset crashloopback in openshift #404

Open

itmwiw opened this issue Apr 11, 2023 · 9 comments

itmwiw commented Apr 11, 2023

Hello,
I have an OpenShift cluster and I am trying to use the hetznercloud csi-driver. However, all of the DaemonSet's pods are in CrashLoopBackOff state. Here are the logs:

[pod/hcloud-csi-node-45xqq/hcloud-csi-driver] level=error ts=2023-04-11T14:33:12.085976239Z msg="failed to fetch server ID from metadata service" err="Get \"http://169.254.169.254/hetzner/v1/metadata/instance-id\": dial tcp 169.254.169.254:80: connect: connection refused"

I guess this is related to what is described in #143.
That issue was closed because version 1.6.0 tried the environment variable HCLOUD_SERVER_ID, or KUBE_NODE_NAME together with a call to the HCloudClient, before falling back to the MetadataClient.
However, v2.2.0 no longer does that, so I guess the issue is back.
Can you help me with this?
Regards,
Tarik

@apricote (Member)

Hey, this was changed in #269 so that we could remove access to the Hetzner Cloud API from the DaemonSet. We would prefer to keep the DaemonSet (the "node" binary) as small as possible, so adding back access to the API is not what we want.

@samcday Do you have an idea how we can solve this for OpenShift where access to the metadata service is blocked?

@apricote (Member)

Oh, I forgot to mention: the server ID and location, the two fields retrieved from the metadata service, are used in the response to NodeGetInfo:

csi-driver/driver/node.go

Lines 194 to 205 in cbb7750

func (s *NodeService) NodeGetInfo(context.Context, *proto.NodeGetInfoRequest) (*proto.NodeGetInfoResponse, error) {
    resp := &proto.NodeGetInfoResponse{
        NodeId:            s.serverID,
        MaxVolumesPerNode: MaxVolumesPerNode,
        AccessibleTopology: &proto.Topology{
            Segments: map[string]string{
                TopologySegmentLocation: s.serverLocation,
            },
        },
    }
    return resp, nil
}

@samcday (Contributor) commented Apr 12, 2023

Hm. Tricky one. My original hope was to use k8s Node metadata as source of truth for this, thus tying csi-driver to hccm. But of course that violates the CSI abstraction and won't work for other container orchestrators.

Ultimately, without assuming any access to a control plane / orchestrator API of any kind, the only ways for us to determine this information from a particular node are to fetch it from the metadata service or to fall back to statically provided information.

... Or we just add back the HCLOUD_TOKEN requirement for the node binary, so that it can fetch this info from the API. That would be a bummer from a purist technical point of view, but maybe it's the only way we can keep the CSI driver running reliably (and reasonably ergonomically!) across multiple orchestrators.

@samcday (Contributor) commented Apr 12, 2023

One other somewhat hacky idea: we could do the metadata API lookup in a small initContainer that uses hostNetwork: true and then pass that information along to the main (not host-networking) process.
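
A rough, purely illustrative sketch of that idea (not the chart's actual manifest; container names and images are placeholders). Since hostNetwork is a pod-level field in Kubernetes, the sketch sets it on the pod spec; the init container fetches the instance ID from the metadata service and hands it to the main container via a shared emptyDir:

# Hypothetical sketch of the init-container lookup, not the real deployment.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: hcloud-csi-node
spec:
  selector:
    matchLabels:
      app: hcloud-csi-node
  template:
    metadata:
      labels:
        app: hcloud-csi-node
    spec:
      # Note: hostNetwork applies to every container in the pod.
      hostNetwork: true
      initContainers:
        - name: fetch-metadata            # hypothetical name
          image: curlimages/curl          # illustrative image choice
          command:
            - sh
            - -c
            # Write the instance ID from the metadata service to the shared volume.
            - curl -sf http://169.254.169.254/hetzner/v1/metadata/instance-id > /metadata/instance-id
          volumeMounts:
            - name: metadata
              mountPath: /metadata
      containers:
        - name: hcloud-csi-driver
          image: hetznercloud/hcloud-csi-driver   # placeholder, tag omitted
          # The main process would read /metadata/instance-id instead of
          # querying the metadata service itself.
          volumeMounts:
            - name: metadata
              mountPath: /metadata
      volumes:
        - name: metadata
          emptyDir: {}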

@apricote (Member)

> One other somewhat hacky idea: we could do the metadata API lookup in a small initContainer that uses hostNetwork: true and then pass that information along to the main (not host-networking) process.

Perhaps this is something that can be done only for OpenShift through the Helm Chart?

@samcday (Contributor) commented Apr 12, 2023

> Perhaps this is something that can be done only for OpenShift through the Helm Chart?

Yes, that sounds good 👍 Or even more generally: just a thing that you can opt into through values.yaml: helm install csi-driver --set initMetadataLookup=true or somesuch.


That said, it might just be better to always do it that way and keep the number of different deployment modes to a minimum. With such an approach, the node binary could drop all notion of the HC API or the metadata service and require that all necessary metadata/topology info is injected through env. Some of this env would come from the downward API, and the rest from the proposed init container.
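
Purely as a sketch of the env-injection part (initMetadataLookup above is a hypothetical value name, not an existing chart option): KUBE_NODE_NAME can come from the downward API, while the server ID/location would have to be read from a file written by the proposed init container, since an init container cannot set another container's environment directly.

# Illustrative container env snippet using the downward API.
env:
  - name: KUBE_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName   # injects the Kubernetes node name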

@alrf commented Apr 19, 2023

I have the same issue in OpenShift.

@alrf commented May 5, 2023

I solved it in v2.3.2 by changing the feature gate here from

- --feature-gates=Topology=true

to Topology=false, and by adding hostNetwork: true to the DaemonSet on line 298 (see the sketch below).
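
A minimal sketch of the hostNetwork part of that workaround, assuming the DaemonSet is named hcloud-csi-node (matching the pod names in the logs above) and deployed to kube-system; running the node pods on the host network makes the metadata service at 169.254.169.254 reachable again:

# hostnetwork-patch.yaml -- strategic merge patch (name and namespace are assumptions)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: hcloud-csi-node
  namespace: kube-system
spec:
  template:
    spec:
      hostNetwork: true

Applied with, for example, kubectl -n kube-system patch daemonset hcloud-csi-node --patch-file hostnetwork-patch.yaml; the --feature-gates=Topology=false change still has to be made in the manifest itself.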

@github-actions bot commented Aug 4, 2023

This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.

github-actions bot added the Stale label on Aug 4, 2023
@jooola added the pinned label and removed the Stale label on Aug 4, 2023